From: SeongJae Park <sjpark(a)amazon.de>
When running a test program, 'run_one()' checks if the program has the
execution permission and fails if it doesn't. However, it's easy to
mistakenly missing the permission, as some common tools like 'diff'
don't support the permission change well[1]. Compared to that, making
mistakes in the test program's path would only rare, as those are
explicitly listed in 'TEST_PROGS'. Therefore, it might make more sense
to resolve the situation on our own and run the program.
For the reason, this commit makes the test program runner function to
still print the warning message but try parsing the interpreter of the
program and explicitly run it with the interpreter, in the case.
[1] https://lore.kernel.org/mm-commits/YRJisBs9AunccCD4@kroah.com/
Suggested-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
---
Changes from v1
(https://lore.kernel.org/linux-kselftest/20210810140459.23990-1-sj38.park@gm…)
- Parse and use the interpreter instead of changing the file
tools/testing/selftests/kselftest/runner.sh | 28 +++++++++++++--------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index cc9c846585f0..a9ba782d8ca0 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -33,9 +33,9 @@ tap_timeout()
{
# Make sure tests will time out if utility is available.
if [ -x /usr/bin/timeout ] ; then
- /usr/bin/timeout --foreground "$kselftest_timeout" "$1"
+ /usr/bin/timeout --foreground "$kselftest_timeout" $1
else
- "$1"
+ $1
fi
}
@@ -65,17 +65,25 @@ run_one()
TEST_HDR_MSG="selftests: $DIR: $BASENAME_TEST"
echo "# $TEST_HDR_MSG"
- if [ ! -x "$TEST" ]; then
- echo -n "# Warning: file $TEST is "
- if [ ! -e "$TEST" ]; then
- echo "missing!"
- else
- echo "not executable, correct this."
- fi
+ if [ ! -e "$TEST" ]; then
+ echo "# Warning: file $TEST is missing!"
echo "not ok $test_num $TEST_HDR_MSG"
else
+ cmd="./$BASENAME_TEST"
+ if [ ! -x "$TEST" ]; then
+ echo "# Warning: file $TEST is not executable"
+
+ if [ $(head -n 1 "$TEST" | cut -c -2) = "#!" ]
+ then
+ interpreter=$(head -n 1 "$TEST" | cut -c 3-)
+ cmd="$interpreter ./$BASENAME_TEST"
+ else
+ echo "not ok $test_num $TEST_HDR_MSG"
+ return
+ fi
+ fi
cd `dirname $TEST` > /dev/null
- ((((( tap_timeout ./$BASENAME_TEST 2>&1; echo $? >&3) |
+ ((((( tap_timeout "$cmd" 2>&1; echo $? >&3) |
tap_prefix >&4) 3>&1) |
(read xs; exit $xs)) 4>>"$logfile" &&
echo "ok $test_num $TEST_HDR_MSG") ||
--
2.17.1
The Felix VSC9959 switch in NXP LS1028A supports the tc-gate action
which enforced time-based access control per stream. A stream as seen by
this switch is identified by {MAC DA, VID}.
We use the standard forwarding selftest topology with 2 host interfaces
and 2 switch interfaces. The host ports must require timestamping non-IP
packets and supporting tc-etf offload, for isochron to work. The
isochron program monitors network sync status (ptp4l, phc2sys) and
deterministically transmits packets to the switch such that the tc-gate
action either (a) always accepts them based on its schedule, or
(b) always drops them.
I tried to keep as much of the logic that isn't specific to the NXP
LS1028A in a new tsn_lib.sh, for future reuse. This covers
synchronization using ptp4l and phc2sys, and isochron.
The cycle-time chosen for this selftest isn't particularly impressive
(and the focus is the functionality of the switch), but I didn't really
know what to do better, considering that it will mostly be run during
debugging sessions, various kernel bloatware would be enabled, like
lockdep, KASAN, etc, and we certainly can't run any races with those on.
I tried to look through the kselftest framework for other real time
applications and didn't really find any, so I'm not sure how better to
prepare the environment in case we want to go for a lower cycle time.
At the moment, the only thing the selftest is ensuring is that dynamic
frequency scaling is disabled on the CPU that isochron runs on. It would
probably be useful to have a blacklist of kernel config options (checked
through zcat /proc/config.gz) and some cyclictest scripts to run
beforehand, but I saw none of those.
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
---
v1->v2:
- fix an off-by-one bug introduced at the last minute regarding which
tc-mqprio queue was used for tc-etf and SO_TXTIME
- introduce debugging for packets incorrectly received / incorrectly
dropped based on "isochron report"
- make the tsn_lib.sh dependency on isochron and linuxptp optional via
REQUIRE_ISOCHRON and REQUIRE_LINUXPTP
- avoid errors when CONFIG_CPU_FREQ is disabled
- consistently use SCHED_FIFO instead of SCHED_RR for the isochron
receiver
.../selftests/drivers/net/ocelot/psfp.sh | 327 ++++++++++++++++++
.../selftests/net/forwarding/tsn_lib.sh | 235 +++++++++++++
2 files changed, 562 insertions(+)
create mode 100755 tools/testing/selftests/drivers/net/ocelot/psfp.sh
create mode 100644 tools/testing/selftests/net/forwarding/tsn_lib.sh
diff --git a/tools/testing/selftests/drivers/net/ocelot/psfp.sh b/tools/testing/selftests/drivers/net/ocelot/psfp.sh
new file mode 100755
index 000000000000..5a5cee92c665
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/ocelot/psfp.sh
@@ -0,0 +1,327 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright 2021-2022 NXP
+
+# Note: On LS1028A, in lack of enough user ports, this setup requires patching
+# the device tree to use the second CPU port as a user port
+
+WAIT_TIME=1
+NUM_NETIFS=4
+STABLE_MAC_ADDRS=yes
+NETIF_CREATE=no
+lib_dir=$(dirname $0)/../../../net/forwarding
+source $lib_dir/tc_common.sh
+source $lib_dir/lib.sh
+source $lib_dir/tsn_lib.sh
+
+UDS_ADDRESS_H1="/var/run/ptp4l_h1"
+UDS_ADDRESS_SWP1="/var/run/ptp4l_swp1"
+
+# Tunables
+NUM_PKTS=1000
+STREAM_VID=100
+STREAM_PRIO=6
+# Use a conservative cycle of 10 ms to allow the test to still pass when the
+# kernel has some extra overhead like lockdep etc
+CYCLE_TIME_NS=10000000
+# Create two Gate Control List entries, one OPEN and one CLOSE, of equal
+# durations
+GATE_DURATION_NS=$((${CYCLE_TIME_NS} / 2))
+# Give 2/3 of the cycle time to user space and 1/3 to the kernel
+FUDGE_FACTOR=$((${CYCLE_TIME_NS} / 3))
+# Shift the isochron base time by half the gate time, so that packets are
+# always received by swp1 close to the middle of the time slot, to minimize
+# inaccuracies due to network sync
+SHIFT_TIME_NS=$((${GATE_DURATION_NS} / 2))
+
+h1=${NETIFS[p1]}
+swp1=${NETIFS[p2]}
+swp2=${NETIFS[p3]}
+h2=${NETIFS[p4]}
+
+H1_IPV4="192.0.2.1"
+H2_IPV4="192.0.2.2"
+H1_IPV6="2001:db8:1::1"
+H2_IPV6="2001:db8:1::2"
+
+# Chain number exported by the ocelot driver for
+# Per-Stream Filtering and Policing filters
+PSFP()
+{
+ echo 30000
+}
+
+psfp_chain_create()
+{
+ local if_name=$1
+
+ tc qdisc add dev $if_name clsact
+
+ tc filter add dev $if_name ingress chain 0 pref 49152 flower \
+ skip_sw action goto chain $(PSFP)
+}
+
+psfp_chain_destroy()
+{
+ local if_name=$1
+
+ tc qdisc del dev $if_name clsact
+}
+
+psfp_filter_check()
+{
+ local expected=$1
+ local packets=""
+ local drops=""
+ local stats=""
+
+ stats=$(tc -j -s filter show dev ${swp1} ingress chain $(PSFP) pref 1)
+ packets=$(echo ${stats} | jq ".[1].options.actions[].stats.packets")
+ drops=$(echo ${stats} | jq ".[1].options.actions[].stats.drops")
+
+ if ! [ "${packets}" = "${expected}" ]; then
+ printf "Expected filter to match on %d packets but matched on %d instead\n" \
+ "${expected}" "${packets}"
+ fi
+
+ echo "Hardware filter reports ${drops} drops"
+}
+
+h1_create()
+{
+ simple_if_init $h1 $H1_IPV4/24 $H1_IPV6/64
+}
+
+h1_destroy()
+{
+ simple_if_fini $h1 $H1_IPV4/24 $H1_IPV6/64
+}
+
+h2_create()
+{
+ simple_if_init $h2 $H2_IPV4/24 $H2_IPV6/64
+}
+
+h2_destroy()
+{
+ simple_if_fini $h2 $H2_IPV4/24 $H2_IPV6/64
+}
+
+switch_create()
+{
+ local h2_mac_addr=$(mac_get $h2)
+
+ ip link set ${swp1} up
+ ip link set ${swp2} up
+
+ ip link add br0 type bridge vlan_filtering 1
+ ip link set ${swp1} master br0
+ ip link set ${swp2} master br0
+ ip link set br0 up
+
+ bridge vlan add dev ${swp2} vid ${STREAM_VID}
+ bridge vlan add dev ${swp1} vid ${STREAM_VID}
+ # PSFP on Ocelot requires the filter to also be added to the bridge
+ # FDB, and not be removed
+ bridge fdb add dev ${swp2} \
+ ${h2_mac_addr} vlan ${STREAM_VID} static master
+
+ psfp_chain_create ${swp1}
+
+ tc filter add dev ${swp1} ingress chain $(PSFP) pref 1 \
+ protocol 802.1Q flower skip_sw \
+ dst_mac ${h2_mac_addr} vlan_id ${STREAM_VID} \
+ action gate base-time 0.000000000 \
+ sched-entry OPEN ${GATE_DURATION_NS} -1 -1 \
+ sched-entry CLOSE ${GATE_DURATION_NS} -1 -1
+}
+
+switch_destroy()
+{
+ psfp_chain_destroy ${swp1}
+ ip link del br0
+}
+
+txtime_setup()
+{
+ local if_name=$1
+
+ tc qdisc add dev ${if_name} clsact
+ # Classify PTP on TC 7 and isochron on TC 6
+ tc filter add dev ${if_name} egress protocol 0x88f7 \
+ flower action skbedit priority 7
+ tc filter add dev ${if_name} egress protocol 802.1Q \
+ flower vlan_ethtype 0xdead action skbedit priority 6
+ tc qdisc add dev ${if_name} handle 100: parent root mqprio num_tc 8 \
+ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
+ map 0 1 2 3 4 5 6 7 \
+ hw 1
+ # Set up TC 6 for SO_TXTIME. tc-mqprio queues count from 1.
+ tc qdisc replace dev ${if_name} parent 100:$((${STREAM_PRIO} + 1)) etf \
+ clockid CLOCK_TAI offload delta ${FUDGE_FACTOR}
+}
+
+txtime_cleanup()
+{
+ local if_name=$1
+
+ tc qdisc del dev ${if_name} root
+ tc qdisc del dev ${if_name} clsact
+}
+
+setup_prepare()
+{
+ vrf_prepare
+
+ h1_create
+ h2_create
+ switch_create
+
+ txtime_setup ${h1}
+
+ # Set up swp1 as a master PHC for h1, synchronized to the local
+ # CLOCK_REALTIME.
+ phc2sys_start ${swp1} ${UDS_ADDRESS_SWP1}
+
+ # Assumption true for LS1028A: h1 and h2 use the same PHC. So by
+ # synchronizing h1 to swp1 via PTP, h2 is also implicitly synchronized
+ # to swp1 (and both to CLOCK_REALTIME).
+ ptp4l_start ${h1} true ${UDS_ADDRESS_H1}
+ ptp4l_start ${swp1} false ${UDS_ADDRESS_SWP1}
+
+ # Make sure there are no filter matches at the beginning of the test
+ psfp_filter_check 0
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ ptp4l_stop ${swp1}
+ ptp4l_stop ${h1}
+ phc2sys_stop
+ isochron_recv_stop
+
+ txtime_cleanup ${h1}
+
+ h2_destroy
+ h1_destroy
+ switch_destroy
+
+ vrf_cleanup
+}
+
+debug_incorrectly_dropped_packets()
+{
+ local isochron_dat=$1
+ local dropped_seqids
+ local seqid
+
+ echo "Packets incorrectly dropped:"
+
+ dropped_seqids=$(isochron report \
+ --input-file "${isochron_dat}" \
+ --printf-format "%u RX hw %T\n" \
+ --printf-args "qR" | \
+ grep 'RX hw 0.000000000' | \
+ awk '{print $1}')
+
+ for seqid in ${dropped_seqids}; do
+ isochron report \
+ --input-file "${isochron_dat}" \
+ --start ${seqid} --stop ${seqid} \
+ --printf-format "seqid %u scheduled for %T, HW TX timestamp %T\n" \
+ --printf-args "qST"
+ done
+}
+
+debug_incorrectly_received_packets()
+{
+ local isochron_dat=$1
+
+ echo "Packets incorrectly received:"
+
+ isochron report \
+ --input-file "${isochron_dat}" \
+ --printf-format "seqid %u scheduled for %T, HW TX timestamp %T, HW RX timestamp %T\n" \
+ --printf-args "qSTR" |
+ grep -v 'HW RX timestamp 0.000000000'
+}
+
+run_test()
+{
+ local base_time=$1
+ local expected=$2
+ local test_name=$3
+ local debug=$4
+ local isochron_dat="$(mktemp)"
+ local extra_args=""
+ local received
+
+ isochron_do \
+ "${h1}" \
+ "${h2}" \
+ "${UDS_ADDRESS_H1}" \
+ "" \
+ "${base_time}" \
+ "${CYCLE_TIME_NS}" \
+ "${SHIFT_TIME_NS}" \
+ "${NUM_PKTS}" \
+ "${STREAM_VID}" \
+ "${STREAM_PRIO}" \
+ "" \
+ "${isochron_dat}"
+
+ # Count all received packets by looking at the non-zero RX timestamps
+ received=$(isochron report \
+ --input-file "${isochron_dat}" \
+ --printf-format "%u\n" --printf-args "R" | \
+ grep -w -v '0' | wc -l)
+
+ if [ "${received}" = "${expected}" ]; then
+ RET=0
+ else
+ RET=1
+ echo "Expected isochron to receive ${expected} packets but received ${received}"
+ fi
+
+ log_test "${test_name}"
+
+ if [ "$RET" = "1" ]; then
+ ${debug} "${isochron_dat}"
+ fi
+
+ rm ${isochron_dat} 2> /dev/null
+}
+
+test_gate_in_band()
+{
+ # Send packets in-band with the OPEN gate entry
+ run_test 0.000000000 ${NUM_PKTS} "In band" \
+ debug_incorrectly_dropped_packets
+
+ psfp_filter_check ${NUM_PKTS}
+}
+
+test_gate_out_of_band()
+{
+ # Send packets in-band with the CLOSE gate entry
+ run_test 0.005000000 0 "Out of band" \
+ debug_incorrectly_received_packets
+
+ psfp_filter_check $((2 * ${NUM_PKTS}))
+}
+
+trap cleanup EXIT
+
+ALL_TESTS="
+ test_gate_in_band
+ test_gate_out_of_band
+"
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tsn_lib.sh b/tools/testing/selftests/net/forwarding/tsn_lib.sh
new file mode 100644
index 000000000000..60a1423e8116
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/tsn_lib.sh
@@ -0,0 +1,235 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright 2021-2022 NXP
+
+REQUIRE_ISOCHRON=${REQUIRE_ISOCHRON:=yes}
+REQUIRE_LINUXPTP=${REQUIRE_LINUXPTP:=yes}
+
+# Tunables
+UTC_TAI_OFFSET=37
+ISOCHRON_CPU=1
+
+if [[ "$REQUIRE_ISOCHRON" = "yes" ]]; then
+ # https://github.com/vladimiroltean/tsn-scripts
+ # WARNING: isochron versions pre-1.0 are unstable,
+ # always use the latest version
+ require_command isochron
+fi
+if [[ "$REQUIRE_LINUXPTP" = "yes" ]]; then
+ require_command phc2sys
+ require_command ptp4l
+fi
+
+phc2sys_start()
+{
+ local if_name=$1
+ local uds_address=$2
+ local extra_args=""
+
+ if ! [ -z "${uds_address}" ]; then
+ extra_args="${extra_args} -z ${uds_address}"
+ fi
+
+ phc2sys_log="$(mktemp)"
+
+ chrt -f 10 phc2sys -m \
+ -c ${if_name} \
+ -s CLOCK_REALTIME \
+ -O ${UTC_TAI_OFFSET} \
+ --step_threshold 0.00002 \
+ --first_step_threshold 0.00002 \
+ ${extra_args} \
+ > "${phc2sys_log}" 2>&1 &
+ phc2sys_pid=$!
+
+ echo "phc2sys logs to ${phc2sys_log} and has pid ${phc2sys_pid}"
+
+ sleep 1
+}
+
+phc2sys_stop()
+{
+ { kill ${phc2sys_pid} && wait ${phc2sys_pid}; } 2> /dev/null
+ rm "${phc2sys_log}" 2> /dev/null
+}
+
+ptp4l_start()
+{
+ local if_name=$1
+ local slave_only=$2
+ local uds_address=$3
+ local log="ptp4l_log_${if_name}"
+ local pid="ptp4l_pid_${if_name}"
+ local extra_args=""
+
+ if [ "${slave_only}" = true ]; then
+ extra_args="${extra_args} -s"
+ fi
+
+ # declare dynamic variables ptp4l_log_${if_name} and ptp4l_pid_${if_name}
+ # as global, so that they can be referenced later
+ declare -g "${log}=$(mktemp)"
+
+ chrt -f 10 ptp4l -m -2 -P \
+ -i ${if_name} \
+ --step_threshold 0.00002 \
+ --first_step_threshold 0.00002 \
+ --tx_timestamp_timeout 100 \
+ --uds_address="${uds_address}" \
+ ${extra_args} \
+ > "${!log}" 2>&1 &
+ declare -g "${pid}=$!"
+
+ echo "ptp4l for interface ${if_name} logs to ${!log} and has pid ${!pid}"
+
+ sleep 1
+}
+
+ptp4l_stop()
+{
+ local if_name=$1
+ local log="ptp4l_log_${if_name}"
+ local pid="ptp4l_pid_${if_name}"
+
+ { kill ${!pid} && wait ${!pid}; } 2> /dev/null
+ rm "${!log}" 2> /dev/null
+}
+
+cpufreq_max()
+{
+ local cpu=$1
+ local freq="cpu${cpu}_freq"
+ local governor="cpu${cpu}_governor"
+
+ # Kernel may be compiled with CONFIG_CPU_FREQ disabled
+ if ! [ -d /sys/bus/cpu/devices/cpu${cpu}/cpufreq ]; then
+ return
+ fi
+
+ # declare dynamic variables cpu${cpu}_freq and cpu${cpu}_governor as
+ # global, so they can be referenced later
+ declare -g "${freq}=$(cat /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_min_freq)"
+ declare -g "${governor}=$(cat /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_governor)"
+
+ cat /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_max_freq > \
+ /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_min_freq
+ echo -n "performance" > \
+ /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_governor
+}
+
+cpufreq_restore()
+{
+ local cpu=$1
+ local freq="cpu${cpu}_freq"
+ local governor="cpu${cpu}_governor"
+
+ if ! [ -d /sys/bus/cpu/devices/cpu${cpu}/cpufreq ]; then
+ return
+ fi
+
+ echo "${!freq}" > /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_min_freq
+ echo -n "${!governor}" > \
+ /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_governor
+}
+
+isochron_recv_start()
+{
+ local if_name=$1
+ local uds=$2
+ local extra_args=$3
+
+ if ! [ -z "${uds}" ]; then
+ extra_args="--unix-domain-socket ${uds}"
+ fi
+
+ isochron rcv \
+ --interface ${if_name} \
+ --sched-priority 98 \
+ --sched-fifo \
+ --utc-tai-offset ${UTC_TAI_OFFSET} \
+ --quiet \
+ ${extra_args} & \
+ isochron_pid=$!
+
+ sleep 1
+}
+
+isochron_recv_stop()
+{
+ { kill ${isochron_pid} && wait ${isochron_pid}; } 2> /dev/null
+}
+
+isochron_do()
+{
+ local sender_if_name=$1; shift
+ local receiver_if_name=$1; shift
+ local sender_uds=$1; shift
+ local receiver_uds=$1; shift
+ local base_time=$1; shift
+ local cycle_time=$1; shift
+ local shift_time=$1; shift
+ local num_pkts=$1; shift
+ local vid=$1; shift
+ local priority=$1; shift
+ local dst_ip=$1; shift
+ local isochron_dat=$1; shift
+ local extra_args=""
+ local receiver_extra_args=""
+ local vrf="$(master_name_get ${sender_if_name})"
+ local use_l2="true"
+
+ if ! [ -z "${dst_ip}" ]; then
+ use_l2="false"
+ fi
+
+ if ! [ -z "${vrf}" ]; then
+ dst_ip="${dst_ip}%${vrf}"
+ fi
+
+ if ! [ -z "${vid}" ]; then
+ vid="--vid=${vid}"
+ fi
+
+ if [ -z "${receiver_uds}" ]; then
+ extra_args="${extra_args} --omit-remote-sync"
+ fi
+
+ if ! [ -z "${shift_time}" ]; then
+ extra_args="${extra_args} --shift-time=${shift_time}"
+ fi
+
+ if [ "${use_l2}" = "true" ]; then
+ extra_args="${extra_args} --l2 --etype=0xdead ${vid}"
+ receiver_extra_args="--l2 --etype=0xdead"
+ else
+ extra_args="${extra_args} --l4 --ip-destination=${dst_ip}"
+ receiver_extra_args="--l4"
+ fi
+
+ cpufreq_max ${ISOCHRON_CPU}
+
+ isochron_recv_start "${h2}" "${receiver_uds}" "${receiver_extra_args}"
+
+ isochron send \
+ --interface ${sender_if_name} \
+ --unix-domain-socket ${sender_uds} \
+ --priority ${priority} \
+ --base-time ${base_time} \
+ --cycle-time ${cycle_time} \
+ --num-frames ${num_pkts} \
+ --frame-size 64 \
+ --txtime \
+ --utc-tai-offset ${UTC_TAI_OFFSET} \
+ --cpu-mask $((1 << ${ISOCHRON_CPU})) \
+ --sched-fifo \
+ --sched-priority 98 \
+ --client 127.0.0.1 \
+ --sync-threshold 5000 \
+ --output-file ${isochron_dat} \
+ ${extra_args} \
+ --quiet
+
+ isochron_recv_stop
+
+ cpufreq_restore ${ISOCHRON_CPU}
+}
--
2.25.1
Real-time setups try hard to ensure proper isolation between time
critical applications and e.g. network processing performed by the
network stack in softirq and RPS is used to move the softirq
activity away from the isolated core.
If the network configuration is dynamic, with netns and devices
routinely created at run-time, enforcing the correct RPS setting
on each newly created device allowing to transient bad configuration
became complex.
These series try to address the above, introducing a new
sysctl knob: rps_default_mask. The new sysctl entry allows
configuring a systemwide RPS mask, to be enforced since receive
queue creation time without any fourther per device configuration
required.
Additionally, a simple self-test is introduced to check the
rps_default_mask behavior.
v1 -> v2:
- fix sparse warning in patch 2/3
Paolo Abeni (3):
net/sysctl: factor-out netdev_rx_queue_set_rps_mask() helper
net/core: introduce default_rps_mask netns attribute
self-tests: introduce self-tests for RPS default mask
Documentation/admin-guide/sysctl/net.rst | 6 ++
include/linux/netdevice.h | 1 +
net/core/net-sysfs.c | 73 +++++++++++--------
net/core/sysctl_net_core.c | 58 +++++++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 3 +
.../testing/selftests/net/rps_default_mask.sh | 57 +++++++++++++++
7 files changed, 169 insertions(+), 30 deletions(-)
create mode 100755 tools/testing/selftests/net/rps_default_mask.sh
--
2.26.2
This series provides initial support for the ARMv9 Scalable Matrix
Extension (SME). SME takes the approach used for vectors in SVE and
extends this to provide architectural support for matrix operations. A
more detailed overview can be found in [1].
For the kernel SME can be thought of as a series of features which are
intended to be used together by applications but operate mostly
orthogonally:
- The ZA matrix register.
- Streaming mode, in which ZA can be accessed and a subset of SVE
features are available.
- A second vector length, used for streaming mode SVE and ZA and
controlled using a similar interface to that for SVE.
- TPIDR2, a new userspace controllable system register intended for use
by the C library for storing context related to the ZA ABI.
A substantial part of the series is dedicated to refactoring the
existing SVE support so that we don't need to duplicate code for
handling vector lengths and the SVE registers, this involves creating an
array of vector types and making the users take the vector type as a
parameter. I'm not 100% happy with this but wasn't able to come up with
anything better, duplicating code definitely felt like a bad idea so
this felt like the least bad thing. If this approach makes sense to
people it might make sense to split this off into a separate series
and/or merge it while the rest is pending review to try to make things a
little more digestable, the series is very large so it'd probably make
things easier to digest if some of the preparatory refactoring could be
merged before the rest is ready.
One feature of the architecture of particular note is that switching
to and from streaming mode may change the size of and invalidate the
contents of the SVE registers, and when in streaming mode the FFR is not
accessible. This complicates aspects of the ABI like signal handling
and ptrace.
This initial implementation is mainly intended to get the ABI in place,
there are several areas which will be worked on going forwards - some of
these will be blockers, others could be handled in followup serieses:
- SME is currently not supported for KVM guests, this will be done as a
followup series. A host system can use SME and run KVM guests but
SME is not available in the guests.
- The KVM host support is done in a very simplistic way, were anyone to
attempt to use it in production there would be performance impacts on
hosts with SME support. As part of this we also add enumeration of
fine grained traps.
- There is not currently ptrace or signal support TPIDR2, this will be
done as a followup series.
- No support is currently provided for scheduler control of SME or SME
applications, given the size of the SME register state the context
switch overhead may be noticable so this may be needed especially for
real time applications. Similar concerns already exist for larger
SVE vector lengths but are amplified for SME, particularly as the
vector length increases.
- There has been no work on optimising the performance of anything the
kernel does.
It is not expected that any systems will be encountered that support SME
but not SVE, SME is an ARMv9 feature and SVE is mandatory for ARMv9.
The code attempts to handle any such systems that are encountered but
this hasn't been tested extensively.
v14:
- Rebase onto v5.18-rc3.
v13:
- Preserve ZA in both parent and child on clone() and add a test case
for this.
- Fix EFI integration for FA64.
- Minor tweaks to the ABI document following Catlain's review.
- Add and make use of thread_get_cur_vl() helper.
- Fix some issues with SVE/FPSIMD register type moves in streaming SVE
ptrace.
- Typo fixes.
- Roll in separately posted series extending ptrace coverage in
kselftest for better integrated testing of the series.
v12:
- Fix some typos in the ABI document.
- Print a message when we skip a vector length in the signal tests.
- Add note of earliest toolchain versions with SME to manual encodings
for future reference now that's landed.
- Drop reference to PCS in sme.rst, it's not referenced and one of the
links was broken.
- Encode smstop and smstart as sysregs in the kernel.
- Don't redundantly flush the SVE register state when loading FPSIMD
state with SME enabled for the task, the architecture will do this
for us.
- Introduce and use task_get_cur_vl() to get the vector length for the
currently active SVE registers.
- Fix support for !FA64 mode in signal and syscall tests.
- Simplify instruction sequence for ssve_regs signal test.
- Actually include the ZA signal test in the patch set.
v11:
- Rebase onto v5.17-rc3.
- Provide a sme-inst.h to collect manual encodings in kselftest.
v10:
- Actually do the rebase of fixups from the previous version into
relevant patches.
v9:
- Remove defensive programming around IS_ENABLED() and FGT in KVM code.
- Fix naming of TPIDR2 FGT register bit.
- Add patches making handling of floating point register bits more
consistent (also sent as separate series).
- Drop now unused enumeration of fine grained traps.
v8:
- Rebase onto v5.17-rc1.
- Support interoperation with KVM, SME is disabled for KVM guests with
minimal handling for cleaning up SME state when entering and leaving
the guest.
- Document and implement that signal handlers are invoked with ZA and
streaming mode disabled.
- Use the RDSVL instruction introduced in EAC2 of the architecture to
obtain the streaming mode vector length during enumeration, ZA state
loading/saving and in test programs.
- Store a pointer to SVCR in fpsimd_last_state and use it in fpsimd_save()
for interoperation with KVM.
- Add a test case sme_trap_no_sm checking that we generate a SIGILL
when using an instruction that requires streaming mode without
enabling it.
- Add basic ZA context form validation to testcases helper library.
- Move signal tests over to validating streaming VL from ZA information.
- Pulled in patch removing ARRAY_SIZE() so that kselftest builds
cleanly and to avoid trivial conflicts.
v7:
- Rebase onto v5.16-rc3.
- Reduce indentation when supporting custom triggers for signal tests
as suggested by Catalin.
- Change to specifying a width for all CPU features rather than adding
single bit specific infrastructure.
- Don't require zeroing of non-shared SVE state during syscalls.
v6:
- Rebase onto v5.16-rc1.
- Return to disabling TIF_SVE on kernel entry even if we have SME
state, this avoids the need for KVM to handle the case where TIF_SVE
is set on guest entry.
- Add syscall-abi.h to SME updates to syscall-abi, mistakenly omitted
from commit.
v5:
- Rebase onto currently merged SVE and kselftest patches.
- Add support for the FA64 option, introduced in the recently published
EAC1 update to the specification.
- Pull in test program for the syscall ABI previously sent separately
with some revisions and add coverage for the SME ABI.
- Fix checking for options with 1 bit fields in ID_AA64SMFR0_EL1.
- Minor fixes and clarifications to the ABI documentation.
v4:
- Rebase onto merged patches.
- Remove an uneeded NULL check in vec_proc_do_default_vl().
- Include patch to factor out utility routines in kselftests written in
assembler.
- Specify -ffreestanding when building TPIDR2 test.
v3:
- Skip FFR rather than predicate registers in sve_flush_live().
- Don't assume a bool is all zeros in sve_flush_live() as per AAPCS.
- Don't redundantly specify a zero index when clearing FFR.
v2:
- Fix several issues with !SME and !SVE configurations.
- Preserve TPIDR2 when creating a new thread/process unless
CLONE_SETTLS is set.
- Report traps due to using features in an invalid mode as SIGILL.
- Spell out streaming mode behaviour in SVE ABI documentation more
directly.
- Document TPIDR2 in the ABI document.
- Use SMSTART and SMSTOP rather than read/modify/write sequences.
- Rework logic for exiting streaming mode on syscall.
- Don't needlessly initialise SVCR on access trap.
- Always restore SME VL for userspace if SME traps are disabled.
- Only yield to encourage preemption every 128 iterations in za-test,
otherwise do a getpid(), and validate SVCR after syscall.
- Leave streaming mode disabled except when reading the vector length
in za-test, and disable ZA after detecting a mismatch.
- Add SME support to vlset.
- Clarifications and typo fixes in comments.
- Move sme_alloc() forward declaration back a patch.
[1] https://community.arm.com/developer/ip-products/processors/b/processors-ip-…
Mark Brown (39):
kselftest/arm64: Fix comment for ptrace_sve_get_fpsimd_data()
kselftest/arm64: Remove assumption that tasks start FPSIMD only
kselftest/arm64: Validate setting via FPSIMD and read via SVE regsets
arm64/sme: Provide ABI documentation for SME
arm64/sme: System register and exception syndrome definitions
arm64/sme: Manually encode SME instructions
arm64/sme: Early CPU setup for SME
arm64/sme: Basic enumeration support
arm64/sme: Identify supported SME vector lengths at boot
arm64/sme: Implement sysctl to set the default vector length
arm64/sme: Implement vector length configuration prctl()s
arm64/sme: Implement support for TPIDR2
arm64/sme: Implement SVCR context switching
arm64/sme: Implement streaming SVE context switching
arm64/sme: Implement ZA context switching
arm64/sme: Implement traps and syscall handling for SME
arm64/sme: Disable ZA and streaming mode when handling signals
arm64/sme: Implement streaming SVE signal handling
arm64/sme: Implement ZA signal handling
arm64/sme: Implement ptrace support for streaming mode SVE registers
arm64/sme: Add ptrace support for ZA
arm64/sme: Disable streaming mode and ZA when flushing CPU state
arm64/sme: Save and restore streaming mode over EFI runtime calls
KVM: arm64: Hide SME system registers from guests
KVM: arm64: Trap SME usage in guest
KVM: arm64: Handle SME host state when running guests
arm64/sme: Provide Kconfig for SME
kselftest/arm64: Add manual encodings for SME instructions
kselftest/arm64: sme: Add SME support to vlset
kselftest/arm64: Add tests for TPIDR2
kselftest/arm64: Extend vector configuration API tests to cover SME
kselftest/arm64: sme: Provide streaming mode SVE stress test
kselftest/arm64: signal: Handle ZA signal context in core code
kselftest/arm64: Add stress test for SME ZA context switching
kselftest/arm64: signal: Add SME signal handling tests
kselftest/arm64: Add streaming SVE to SVE ptrace tests
kselftest/arm64: Add coverage for the ZA ptrace interface
kselftest/arm64: Add SME support to syscall ABI test
selftests/arm64: Add a testcase for handling of ZA on clone()
Documentation/arm64/elf_hwcaps.rst | 33 +
Documentation/arm64/index.rst | 1 +
Documentation/arm64/sme.rst | 428 +++++++++++++
Documentation/arm64/sve.rst | 70 ++-
arch/arm64/Kconfig | 11 +
arch/arm64/include/asm/cpu.h | 4 +
arch/arm64/include/asm/cpufeature.h | 24 +
arch/arm64/include/asm/el2_setup.h | 64 +-
arch/arm64/include/asm/esr.h | 13 +-
arch/arm64/include/asm/exception.h | 1 +
arch/arm64/include/asm/fpsimd.h | 123 +++-
arch/arm64/include/asm/fpsimdmacros.h | 87 +++
arch/arm64/include/asm/hwcap.h | 8 +
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/include/asm/kvm_host.h | 4 +
arch/arm64/include/asm/processor.h | 26 +-
arch/arm64/include/asm/sysreg.h | 67 ++
arch/arm64/include/asm/thread_info.h | 2 +
arch/arm64/include/uapi/asm/hwcap.h | 8 +
arch/arm64/include/uapi/asm/ptrace.h | 69 ++-
arch/arm64/include/uapi/asm/sigcontext.h | 55 +-
arch/arm64/kernel/cpufeature.c | 106 ++++
arch/arm64/kernel/cpuinfo.c | 13 +
arch/arm64/kernel/entry-common.c | 11 +
arch/arm64/kernel/entry-fpsimd.S | 36 ++
arch/arm64/kernel/fpsimd.c | 585 ++++++++++++++++--
arch/arm64/kernel/process.c | 44 +-
arch/arm64/kernel/ptrace.c | 358 +++++++++--
arch/arm64/kernel/signal.c | 188 +++++-
arch/arm64/kernel/syscall.c | 29 +-
arch/arm64/kernel/traps.c | 1 +
arch/arm64/kvm/fpsimd.c | 43 +-
arch/arm64/kvm/hyp/nvhe/switch.c | 30 +
arch/arm64/kvm/hyp/vhe/switch.c | 11 +-
arch/arm64/kvm/sys_regs.c | 9 +-
arch/arm64/tools/cpucaps | 2 +
include/uapi/linux/elf.h | 2 +
include/uapi/linux/prctl.h | 9 +
kernel/sys.c | 12 +
tools/testing/selftests/arm64/abi/.gitignore | 1 +
tools/testing/selftests/arm64/abi/Makefile | 9 +-
.../selftests/arm64/abi/syscall-abi-asm.S | 79 ++-
.../testing/selftests/arm64/abi/syscall-abi.c | 204 +++++-
.../testing/selftests/arm64/abi/syscall-abi.h | 15 +
tools/testing/selftests/arm64/abi/tpidr2.c | 298 +++++++++
tools/testing/selftests/arm64/fp/.gitignore | 5 +
tools/testing/selftests/arm64/fp/Makefile | 19 +-
tools/testing/selftests/arm64/fp/rdvl-sme.c | 14 +
tools/testing/selftests/arm64/fp/rdvl.S | 10 +
tools/testing/selftests/arm64/fp/rdvl.h | 1 +
tools/testing/selftests/arm64/fp/sme-inst.h | 51 ++
tools/testing/selftests/arm64/fp/ssve-stress | 59 ++
tools/testing/selftests/arm64/fp/sve-ptrace.c | 175 +++++-
tools/testing/selftests/arm64/fp/sve-test.S | 20 +
tools/testing/selftests/arm64/fp/vec-syscfg.c | 10 +
tools/testing/selftests/arm64/fp/vlset.c | 10 +-
.../testing/selftests/arm64/fp/za-fork-asm.S | 61 ++
tools/testing/selftests/arm64/fp/za-fork.c | 156 +++++
tools/testing/selftests/arm64/fp/za-ptrace.c | 356 +++++++++++
tools/testing/selftests/arm64/fp/za-stress | 59 ++
tools/testing/selftests/arm64/fp/za-test.S | 388 ++++++++++++
.../testing/selftests/arm64/signal/.gitignore | 3 +
.../selftests/arm64/signal/test_signals.h | 4 +
.../arm64/signal/test_signals_utils.c | 6 +
.../testcases/fake_sigreturn_sme_change_vl.c | 92 +++
.../arm64/signal/testcases/sme_trap_no_sm.c | 38 ++
.../signal/testcases/sme_trap_non_streaming.c | 45 ++
.../arm64/signal/testcases/sme_trap_za.c | 36 ++
.../selftests/arm64/signal/testcases/sme_vl.c | 68 ++
.../arm64/signal/testcases/ssve_regs.c | 135 ++++
.../arm64/signal/testcases/testcases.c | 36 ++
.../arm64/signal/testcases/testcases.h | 3 +-
.../arm64/signal/testcases/za_regs.c | 128 ++++
73 files changed, 4991 insertions(+), 191 deletions(-)
create mode 100644 Documentation/arm64/sme.rst
create mode 100644 tools/testing/selftests/arm64/abi/syscall-abi.h
create mode 100644 tools/testing/selftests/arm64/abi/tpidr2.c
create mode 100644 tools/testing/selftests/arm64/fp/rdvl-sme.c
create mode 100644 tools/testing/selftests/arm64/fp/sme-inst.h
create mode 100644 tools/testing/selftests/arm64/fp/ssve-stress
create mode 100644 tools/testing/selftests/arm64/fp/za-fork-asm.S
create mode 100644 tools/testing/selftests/arm64/fp/za-fork.c
create mode 100644 tools/testing/selftests/arm64/fp/za-ptrace.c
create mode 100644 tools/testing/selftests/arm64/fp/za-stress
create mode 100644 tools/testing/selftests/arm64/fp/za-test.S
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_no_sm.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_non_streaming.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_za.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_vl.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/ssve_regs.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/za_regs.c
base-commit: b2d229d4ddb17db541098b83524d901257e93845
--
2.30.2
The primary reason to invoke the oom reaper from the exit_mmap path used
to be a prevention of an excessive oom killing if the oom victim exit
races with the oom reaper (see [1] for more details). The invocation has
moved around since then because of the interaction with the munlock
logic but the underlying reason has remained the same (see [2]).
Munlock code is no longer a problem since [3] and there shouldn't be
any blocking operation before the memory is unmapped by exit_mmap so
the oom reaper invocation can be dropped. The unmapping part can be done
with the non-exclusive mmap_sem and the exclusive one is only required
when page tables are freed.
Remove the oom_reaper from exit_mmap which will make the code easier to
read. This is really unlikely to make any observable difference although
some microbenchmarks could benefit from one less branch that needs to be
evaluated even though it almost never is true.
[1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
[2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
[3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
---
Notes:
- Rebased over git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
mm-unstable branch per Andrew's request but applies cleany to Linus' ToT
- Conflicts with maple-tree patchset. Resolving these was discussed in
https://lore.kernel.org/all/20220519223438.qx35hbpfnnfnpouw@revolver/
include/linux/oom.h | 2 --
mm/mmap.c | 31 ++++++++++++-------------------
mm/oom_kill.c | 2 +-
3 files changed, 13 insertions(+), 22 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 02d1e7bbd8cd..6cdde62b078b 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -106,8 +106,6 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
return 0;
}
-bool __oom_reap_task_mm(struct mm_struct *mm);
-
long oom_badness(struct task_struct *p,
unsigned long totalpages);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2b9305ed0dda..b7918e6bb0db 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3110,30 +3110,13 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Manually reap the mm to free as much memory as possible.
- * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
- * this mm from further consideration. Taking mm->mmap_lock for
- * write after setting MMF_OOM_SKIP will guarantee that the oom
- * reaper will not run on this mm again after mmap_lock is
- * dropped.
- *
- * Nothing can be holding mm->mmap_lock here and the above call
- * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
- * __oom_reap_task_mm() will not block.
- */
- (void)__oom_reap_task_mm(mm);
- set_bit(MMF_OOM_SKIP, &mm->flags);
- }
-
- mmap_write_lock(mm);
+ mmap_read_lock(mm);
arch_exit_mmap(mm);
vma = mm->mmap;
if (!vma) {
/* Can happen if dup_mmap() received an OOM */
- mmap_write_unlock(mm);
+ mmap_read_unlock(mm);
return;
}
@@ -3143,6 +3126,16 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
+ mmap_read_unlock(mm);
+
+ /*
+ * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
+ * because the memory has been already freed. Do not bother checking
+ * mm_is_oom_victim because setting a bit unconditionally is cheaper.
+ */
+ set_bit(MMF_OOM_SKIP, &mm->flags);
+
+ mmap_write_lock(mm);
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8a70bca67c94..98dca2b42357 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -538,7 +538,7 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
static struct task_struct *oom_reaper_list;
static DEFINE_SPINLOCK(oom_reaper_lock);
-bool __oom_reap_task_mm(struct mm_struct *mm)
+static bool __oom_reap_task_mm(struct mm_struct *mm)
{
struct vm_area_struct *vma;
bool ret = true;
--
2.36.1.255.ge46751e96f-goog
Add bpf trampoline support for arm64. Most of the logic is the same as
x86.
Tested on raspberry pi 4b and qemu with KASLR disabled (avoid long jump),
result:
#9 /1 bpf_cookie/kprobe:OK
#9 /2 bpf_cookie/multi_kprobe_link_api:FAIL
#9 /3 bpf_cookie/multi_kprobe_attach_api:FAIL
#9 /4 bpf_cookie/uprobe:OK
#9 /5 bpf_cookie/tracepoint:OK
#9 /6 bpf_cookie/perf_event:OK
#9 /7 bpf_cookie/trampoline:OK
#9 /8 bpf_cookie/lsm:OK
#9 bpf_cookie:FAIL
#18 /1 bpf_tcp_ca/dctcp:OK
#18 /2 bpf_tcp_ca/cubic:OK
#18 /3 bpf_tcp_ca/invalid_license:OK
#18 /4 bpf_tcp_ca/dctcp_fallback:OK
#18 /5 bpf_tcp_ca/rel_setsockopt:OK
#18 bpf_tcp_ca:OK
#51 /1 dummy_st_ops/dummy_st_ops_attach:OK
#51 /2 dummy_st_ops/dummy_init_ret_value:OK
#51 /3 dummy_st_ops/dummy_init_ptr_arg:OK
#51 /4 dummy_st_ops/dummy_multiple_args:OK
#51 dummy_st_ops:OK
#55 fentry_fexit:OK
#56 fentry_test:OK
#57 /1 fexit_bpf2bpf/target_no_callees:OK
#57 /2 fexit_bpf2bpf/target_yes_callees:OK
#57 /3 fexit_bpf2bpf/func_replace:OK
#57 /4 fexit_bpf2bpf/func_replace_verify:OK
#57 /5 fexit_bpf2bpf/func_sockmap_update:OK
#57 /6 fexit_bpf2bpf/func_replace_return_code:OK
#57 /7 fexit_bpf2bpf/func_map_prog_compatibility:OK
#57 /8 fexit_bpf2bpf/func_replace_multi:OK
#57 /9 fexit_bpf2bpf/fmod_ret_freplace:OK
#57 fexit_bpf2bpf:OK
#58 fexit_sleep:OK
#59 fexit_stress:OK
#60 fexit_test:OK
#67 get_func_args_test:OK
#68 get_func_ip_test:OK
#104 modify_return:OK
#237 xdp_bpf2bpf:OK
bpf_cookie/multi_kprobe_link_api and bpf_cookie/multi_kprobe_attach_api
failed due to lack of multi_kprobe on arm64.
v5:
- As Alexei suggested, remove is_valid_bpf_tramp_flags()
v4: https://lore.kernel.org/bpf/20220517071838.3366093-1-xukuohai@huawei.com/
- Run the test cases on raspberry pi 4b
- Rebase and add cookie to trampoline
- As Steve suggested, move trace_direct_tramp() back to entry-ftrace.S to
avoid messing up generic code with architecture specific code
- As Jakub suggested, merge patch 4 and patch 5 of v3 to provide full function
in one patch
- As Mark suggested, add a comment for the use of aarch64_insn_patch_text_nosync()
- Do not generate trampoline for long jump to avoid triggering ftrace_bug
- Round stack size to multiples of 16B to avoid SPAlignmentFault
- Use callee saved register x20 to reduce the use of mov_i64
- Add missing BTI J instructions
- Trivial spelling and code style fixes
v3: https://lore.kernel.org/bpf/20220424154028.1698685-1-xukuohai@huawei.com/
- Append test results for bpf_tcp_ca, dummy_st_ops, fexit_bpf2bpf,
xdp_bpf2bpf
- Support to poke bpf progs
- Fix return value of arch_prepare_bpf_trampoline() to the total number
of bytes instead of number of instructions
- Do not check whether CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled in
arch_prepare_bpf_trampoline, since the trampoline may be hooked to a bpf
prog
- Restrict bpf_arch_text_poke() to poke bpf text only, as kernel functions
are poked by ftrace
- Rewrite trace_direct_tramp() in inline assembly in trace_selftest.c
to avoid messing entry-ftrace.S
- isolate arch_ftrace_set_direct_caller() with macro
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS to avoid compile error
when this macro is disabled
- Some trivial code sytle fixes
v2: https://lore.kernel.org/bpf/20220414162220.1985095-1-xukuohai@huawei.com/
- Add Song's ACK
- Change the multi-line comment in is_valid_bpf_tramp_flags() into net
style (patch 3)
- Fix a deadloop issue in ftrace selftest (patch 2)
- Replace pt_regs->x0 with pt_regs->orig_x0 in patch 1 commit message
- Replace "bpf trampoline" with "custom trampoline" in patch 1, as
ftrace direct call is not only used by bpf trampoline.
v1: https://lore.kernel.org/bpf/20220413054959.1053668-1-xukuohai@huawei.com/
Xu Kuohai (6):
arm64: ftrace: Add ftrace direct call support
ftrace: Fix deadloop caused by direct call in ftrace selftest
bpf: Remove is_valid_bpf_tramp_flags()
bpf, arm64: Impelment bpf_arch_text_poke() for arm64
bpf, arm64: bpf trampoline for arm64
selftests/bpf: Fix trivial typo in fentry_fexit.c
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/ftrace.h | 22 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/entry-ftrace.S | 28 +-
arch/arm64/net/bpf_jit.h | 1 +
arch/arm64/net/bpf_jit_comp.c | 523 +++++++++++++++++-
arch/x86/net/bpf_jit_comp.c | 20 -
kernel/bpf/bpf_struct_ops.c | 3 +
kernel/bpf/trampoline.c | 3 +
kernel/trace/trace_selftest.c | 2 +
.../selftests/bpf/prog_tests/fentry_fexit.c | 4 +-
11 files changed, 570 insertions(+), 39 deletions(-)
--
2.30.2
From: Vincent Cheng <vincent.cheng.xh(a)renesas.com>
This series adds adjust phase to the PTP Hardware Clock device interface.
Some PTP hardware clocks have a write phase mode that has
a built-in hardware filtering capability. The write phase mode
utilizes a phase offset control word instead of a frequency offset
control word. Add adjust phase function to take advantage of this
capability.
Changes since v1:
- As suggested by Richard Cochran:
1. ops->adjphase is new so need to check for non-null function pointer.
2. Kernel coding style uses lower_case_underscores.
3. Use existing PTP clock API for delayed worker.
Vincent Cheng (3):
ptp: Add adjphase function to support phase offset control.
ptp: Add adjust_phase to ptp_clock_caps capability.
ptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.
drivers/ptp/ptp_chardev.c | 1 +
drivers/ptp/ptp_clock.c | 3 ++
drivers/ptp/ptp_clockmatrix.c | 92 +++++++++++++++++++++++++++++++++++
drivers/ptp/ptp_clockmatrix.h | 8 ++-
include/linux/ptp_clock_kernel.h | 6 ++-
include/uapi/linux/ptp_clock.h | 4 +-
tools/testing/selftests/ptp/testptp.c | 6 ++-
7 files changed, 114 insertions(+), 6 deletions(-)
--
2.7.4
The default file permissions on a memfd include execute bits, which
means that such a memfd can be filled with a executable and passed to
the exec() family of functions. This is undesirable on systems where all
code is verified and all filesystems are intended to be mounted noexec,
since an attacker may be able to use a memfd to load unverified code and
execute it.
Additionally, execution via memfd is a common way to avoid scrutiny for
malicious code, since it allows execution of a program without a file
ever appearing on disk. This attack vector is not totally mitigated with
this new flag, since the default memfd file permissions must remain
executable to avoid breaking existing legitimate uses, but it should be
possible to use other security mechanisms to prevent memfd_create calls
without MFD_NOEXEC on systems where it is known that executable memfds
are not necessary.
This patch series adds a new MFD_NOEXEC flag for memfd_create(), which
allows creation of non-executable memfds, and as part of the
implementation of this new flag, it also adds a new F_SEAL_EXEC seal,
which will prevent modification of any of the execute bits of a sealed
memfd.
I am not sure if this is the best way to implement the desired behavior
(for example, the F_SEAL_EXEC seal is really more of an implementation
detail and feels a bit clunky to expose), so suggestions are welcome
for alternate approaches.
Daniel Verkamp (4):
mm/memfd: add F_SEAL_EXEC
mm/memfd: add MFD_NOEXEC flag to memfd_create
selftests/memfd: add tests for F_SEAL_EXEC
selftests/memfd: add tests for MFD_NOEXEC
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/memfd.h | 1 +
mm/memfd.c | 12 ++-
mm/shmem.c | 6 ++
tools/testing/selftests/memfd/memfd_test.c | 114 +++++++++++++++++++++
5 files changed, 133 insertions(+), 1 deletion(-)
--
2.35.1.1094.g7c7d902a7c-goog