Hi,

This fix, originally intended for XFRM/IPsec, has been recommended by Steffen Klassert for submission to the net tree.
The patch addresses a minor issue related to the IPv4 source address of ICMP error messages, which originated from an old 2011 commit:
415b3334a21a ("icmp: Fix regression in nexthop resolution during replies.")
The omission of a "Fixes" tag in the following commit is deliberate, to prevent potential test failures and subsequent regression issues that may arise from backporting this patch to all stable kernels. This is a minor fix, not a security fix. Along with a selftest, I am submitting this to the net-next tree.
v2->v3: fix the test script; the whitespace in IFS got mangled.
v1->v2: add kernel selftest script
Antony Antony (2):
  xfrm: fix source address in icmp error generation from IPsec gateway
  selftests/net: add ICMP unreachable over IPsec tunnel

 net/ipv4/icmp.c                           |   1 -
 tools/testing/selftests/net/Makefile      |   1 +
 tools/testing/selftests/net/xfrm_state.sh | 624 ++++++++++++++++++++++
 3 files changed, 625 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/net/xfrm_state.sh

-- 
2.30.2
When enabling support for XFRM lookup using the reverse ICMP payload, we identified an issue where the source address of IPv4 ICMP error messages, e.g. "Destination Host Unreachable", is incorrect. The error message received by the sender appears to originate from a non-existent/unreachable host. IPv6 seems to behave correctly: it responds with an existing source address of the gateway.

Here is an example of an incorrect source address in an ICMP error response. When sending a ping to an unreachable host, the sender receives an ICMP unreachable response with a fake source address, rather than the address of the host that generated the ICMP Unreachable message. This is confusing and incorrect. A follow-up commit adds this example as a test case.
Example:
ping -W 9 -w 5 -c 1 10.1.4.3
PING 10.1.4.3 (10.1.4.3) 56(84) bytes of data.
From 10.1.4.3 icmp_seq=1 Destination Host Unreachable

Notice: the ICMP error carries the source address of the unreachable host itself!
This issue can be traced back to commit 415b3334a21a ("icmp: Fix regression in nexthop resolution during replies.") which introduced a change that copied the source address from the ICMP payload.
This commit forces the use of the source address of the gateway/host, so the source address of the ICMP error message is correctly set from the host.
After fixing:
ping -W 5 -c 1 10.1.4.3
PING 10.1.4.3 (10.1.4.3) 56(84) bytes of data.
From 10.1.3.2 icmp_seq=1 Destination Host Unreachable
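One rough way to check for this symptom automatically is to compare the "From" address in ping's output with the address that was pinged. The sketch below assumes the iputils ping output format shown above; icmp_error_src and icmp_src_ok are hypothetical helper names, not part of the selftest.

```shell
# Hypothetical helpers; they assume iputils ping prints ICMP errors as
# "From <addr> icmp_seq=N <message>", as in the output shown above.

# Print the source address of the first ICMP error found on stdin.
icmp_error_src() {
	sed -n 's/^From \([^ ]*\) icmp_seq=.*/\1/p' | head -n 1
}

# Succeed only if an ICMP error was seen and it does not claim to come
# from the unreachable target itself (the symptom described above).
icmp_src_ok() {
	local target="$1" src
	src=$(icmp_error_src)
	[ -n "${src}" ] && [ "${src}" != "${target}" ]
}
```

For example, "ip netns exec host1 ping -W 5 -c 1 10.1.4.3 | icmp_src_ok 10.1.4.3" fails before the fix and succeeds after it, given the outputs above.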
Here is a snippet to reproduce the issue.
export AB="10.1"
for i in 1 2 3 4 5; do
	h="host${i}"
	ip netns add ${h}
	ip -netns ${h} link set lo up
	ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1
	if [ $i -lt 5 ]; then
		ip -netns ${h} link add eth0 type veth peer name eth10${i}
		ip -netns ${h} addr add "${AB}.${i}.1/24" dev eth0
		ip -netns ${h} link set up dev eth0
	fi
done

for i in 1 2 3 4 5; do
	h="host${i}"
	p=$((i - 1))
	ph="host${p}"
	# connect to previous host
	if [ $i -gt 1 ]; then
		ip -netns ${ph} link set eth10${p} netns ${h}
		ip -netns ${h} link set eth10${p} name eth1
		ip -netns ${h} link set up dev eth1
		ip -netns ${h} addr add "${AB}.${p}.2/24" dev eth1
	fi
	# add forward routes
	for k in $(seq ${i} $((5 - 1))); do
		ip -netns ${h} route 2>/dev/null | (grep "${AB}.${k}.0" 2>/dev/null) || \
		ip -netns ${h} route add "${AB}.${k}.0/24" via "${AB}.${i}.2" 2>/dev/null
	done

	# add reverse routes
	for k in $(seq 1 $((i - 2))); do
		ip -netns ${h} route 2>/dev/null | grep "${AB}.${k}.0" 2>/dev/null || \
		ip -netns ${h} route add "${AB}.${k}.0/24" via "${AB}.${p}.1" 2>/dev/null
	done
done
ip netns exec host1 ping -q -W 2 -w 1 -c 1 10.1.4.2 >/dev/null 2>&1 && echo "success 10.1.4.2 reachable" || echo "ERROR"
ip netns exec host1 ping -W 9 -w 5 -c 1 10.1.4.3 || echo "note the source address of the unreachable message from the gateway"
ip -netns host1 route flush cache

ip netns exec host3 nft add table inet filter
ip netns exec host3 nft add chain inet filter FORWARD '{ type filter hook forward priority filter; policy drop ; }'
ip netns exec host3 nft add rule inet filter FORWARD counter ip protocol icmp drop
ip netns exec host3 nft add rule inet filter FORWARD counter ip protocol esp accept
ip netns exec host3 nft add rule inet filter FORWARD counter drop
ip -netns host2 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir out \
	flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 1 mode tunnel

ip -netns host2 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir in \
	tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 2 mode tunnel

ip -netns host2 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir fwd \
	flag icmp tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 2 mode tunnel

ip -netns host2 xfrm state add src 10.1.2.1 dst 10.1.3.2 proto esp spi 1 \
	reqid 1 replay-window 1 mode tunnel aead 'rfc4106(gcm(aes))' \
	0x1111111111111111111111111111111111111111 96 \
	sel src 10.1.1.0/24 dst 10.1.4.0/24

ip -netns host2 xfrm state add src 10.1.3.2 dst 10.1.2.1 proto esp spi 2 \
	flag icmp reqid 2 replay-window 10 mode tunnel aead 'rfc4106(gcm(aes))' \
	0x2222222222222222222222222222222222222222 96

ip -netns host4 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir out \
	flag icmp tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 1 mode tunnel

ip -netns host4 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir in \
	tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 2 mode tunnel

ip -netns host4 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir fwd \
	flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 2 mode tunnel

ip -netns host4 xfrm state add src 10.1.3.2 dst 10.1.2.1 proto esp spi 2 \
	reqid 1 replay-window 1 mode tunnel aead 'rfc4106(gcm(aes))' \
	0x2222222222222222222222222222222222222222 96

ip -netns host4 xfrm state add src 10.1.2.1 dst 10.1.3.2 proto esp spi 1 \
	reqid 2 replay-window 20 flag icmp mode tunnel aead 'rfc4106(gcm(aes))' \
	0x1111111111111111111111111111111111111111 96 \
	sel src 10.1.1.0/24 dst 10.1.4.0/24
ip netns exec host1 ping -W 5 -c 1 10.1.4.2 >/dev/null 2>&1 && echo ""
ip netns exec host1 ping -W 5 -c 1 10.1.4.3 || echo "note source address of gateway 10.1.3.2"
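The reproducer above leaves the namespaces host1 to host5 behind. A minimal teardown sketch, assuming no other namespaces use those names; cleanup_repro and run are hypothetical helpers, and DRY_RUN=1 only prints the commands (no root needed in that mode).

```shell
# Print the command (DRY_RUN=1) or run it.
run() {
	if [ "${DRY_RUN:-0}" -eq 1 ]; then
		local IFS=' '
		echo "$*"
	else
		"$@"
	fi
}

cleanup_repro() {
	local i
	for i in 1 2 3 4 5; do
		# Once nothing else holds a reference to the namespace, deleting it
		# also tears down the veth devices, nft tables and xfrm state in it.
		run ip netns del "host${i}" || true
	done
}
```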
Again, before the fix:
ping -W 5 -c 1 10.1.4.3
From 10.1.4.3 icmp_seq=1 Destination Host Unreachable

After the fix:
From 10.1.3.2 icmp_seq=1 Destination Host Unreachable
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Acked-by: Tobias Brunner <tobias@strongswan.org>
---
 net/ipv4/icmp.c | 1 -
 1 file changed, 1 deletion(-)
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 207482d30dc7..317ad5165408 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -558,7 +558,6 @@ static struct rtable *icmp_route_lookup(struct net *net,
 	rt2 = dst_rtable(dst2);
 	if (!IS_ERR(dst2)) {
 		dst_release(&rt->dst);
-		memcpy(fl4, &fl4_dec, sizeof(*fl4));
 		rt = rt2;
 	} else if (PTR_ERR(dst2) == -EPERM) {
 		if (rt)
Add IPsec tunnel (aka xfrm state) tests with ICMP flags enabled: IPv4 and IPv6 unreachable tests over xfrm/IPsec tunnels, using xfrm SAs with "flag icmp" set.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v2->v3: fix the IFS whitespace. It got mangled.
---
 tools/testing/selftests/net/Makefile      |   1 +
 tools/testing/selftests/net/xfrm_state.sh | 624 ++++++++++++++++++++++
 2 files changed, 625 insertions(+)
 create mode 100755 tools/testing/selftests/net/xfrm_state.sh
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 5befca249452..7d96b3e411b7 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -53,6 +53,7 @@ TEST_PROGS += bind_bhash.sh
 TEST_PROGS += ip_local_port_range.sh
 TEST_PROGS += rps_default_mask.sh
 TEST_PROGS += big_tcp.sh
+TEST_PROGS += xfrm_state.sh
 TEST_PROGS_EXTENDED := toeplitz_client.sh toeplitz.sh
 TEST_GEN_FILES = socket nettest
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
diff --git a/tools/testing/selftests/net/xfrm_state.sh b/tools/testing/selftests/net/xfrm_state.sh
new file mode 100755
index 000000000000..26eac013abcf
--- /dev/null
+++ b/tools/testing/selftests/net/xfrm_state.sh
@@ -0,0 +1,624 @@
+#!/bin/bash -u
+# SPDX-License-Identifier: GPL-2.0
+#
+# Checks for xfrm/ESP/IPsec tunnel.
+# - The unreachable tests are for icmp error handling.
+# As specified in IETF RFC 4301 section 6.
+#
+# See "test=" below for the implemented tests.
+#
+# Network topology default
+# 10.1.c.d or IPv6 fc00:c::d/64
+# 1.1     1.2 2.1     2.2 3.1     3.2 4.1     4.2 5.1     5.2 6.1     6.2
+# eth0   eth1 eth0   eth1 eth0   eth1 eth0   eth1 eth0   eth1 eth0   eth1
+# a -------- r1 -------- s1 -------- r2 -------- s2 -------- r3 -------- b
+# a, b = Alice and Bob hosts without IPsec.
+# r1, r2, r3 routers without IPsec
+# s1, s2, IPsec gateways/routers that setup tunnel(s).
+#
+# Network topology: x for IPsec gateway that generate ICMP response.
+# 10.1.c.d or IPv6 fc00:c::d/64
+# 1.1     1.2 2.1     2.2 3.1     3.2 4.1     4.2 5.1     5.2
+# eth0   eth1 eth0   eth1 eth0   eth1 eth0   eth1 eth0   eth1
+# a -------- r1 -------- s1 -------- r2 -------- s2 -------- b
+#
+
+source lib.sh
+
+PAUSE_ON_FAIL=no
+VERBOSE=${VERBOSE:-0}
+TRACING=0
+
+#	Name		Description
+tests="
+	unreachable_ipv4	IPv4 unreachable from router r3
+	unreachable_ipv4	IPv6 unreachable from router r3
+	unreachable_gw_ipv4	IPv4 unreachable from IPsec gateway s2
+	unreachable_gw_ipv6	IPv6 unreachable from IPsec gateway s2
+	mtu_ipv4_r2	IPv4 MTU exceeded from ESP router r2
+	mtu_ipv6_r2	IPv6 MTU exceeded from ESP router r2
+	mtu_ipv4_r3	IPv4 MTU exceeded router r3
+	mtu_ipv6_r3	IPv6 MTU exceeded router r3"
+
+ns_set="a r1 s1 r2 s2 r3 b" # Network topology default
+imax=7 # number of namespaces in the test
+
+prefix4="10.1"
+prefix6="fc00"
+
+run_cmd() {
+	cmd="$*"
+
+	if [ "$VERBOSE" -gt 0 ]; then
+		printf " COMMAND: $cmd\n"
+	fi
+
+	out="$($cmd 2>&1)"
+	rc=$?
+	if [ "$VERBOSE" -gt 1 -a -n "$out" ]; then
+		echo " $out"
+		echo
+	fi
+	return $rc
+}
+
+run_test() {
+	(
+	tname="$1"
+	tdesc="$2"
+
+
+	unset IFS
+
+	fail="yes"
+
+	# Since cleanup() relies on variables modified by this sub shell, it
+	# has to run in this context.
+	trap cleanup EXIT
+
+	if [ "$VERBOSE" -gt 0 ]; then
+		printf "\n#####################################################################\n\n"
+	fi
+
+	# if errexit was not set, set it and unset after test eval
+	errexit=0
+	if [[ $- =~ "e" ]]; then
+		errexit=1
+	else
+		set -e
+	fi
+
+	eval test_${tname}
+	ret=$?
+	fail="no"
+	[ $errexit -eq 0 ] && set +e # hack until exception is fixed
+
+	if [ $ret -eq 0 ]; then
+		printf "TEST: %-60s [ PASS ]\n" "${tdesc}"
+	elif [ $ret -eq 1 ]; then
+		printf "TEST: %-60s [FAIL]\n" "${tdesc}"
+		if [ "$VERBOSE" -eq 0 -o -n "${out}" -o -n "${out}" ]; then
+			echo "#####################################################################"
+			[ -n "${cmd}" ] && echo -e "${cmd}"
+			[ -n "${out}" ] && echo -e "${out}"
+			echo "#####################################################################"
+		fi
+		if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
+			echo
+			echo "Pausing. Hit enter to continue"
+			read a
+		fi
+		err_flush
+		exit 1
+	elif [ $ret -eq $ksft_skip ]; then
+		printf "TEST: %-60s [SKIP]\n" "${tdesc}"
+		err_flush
+	fi
+
+	return $ret
+	)
+	ret=$?
+	case $ret in
+		0)
+			all_skipped=false
+			[ $exitcode -eq $ksft_skip ] && exitcode=0
+		;;
+		$ksft_skip)
+			[ $all_skipped = true ] && exitcode=$ksft_skip
+		;;
+		*)
+			all_skipped=false
+			exitcode=1
+		;;
+	esac
+
+	return $ret
+}
+
+# Find the auto-generated name for this namespace
+nsname() {
+	eval echo ns_$1
+}
+
+nscmd() {
+	eval echo "ip netns exec $1"
+}
+
+setup_namespace() {
+	setup_ns NS_A
+	ns_a="ip netns exec ${NS_A}"
+}
+
+setup_namespaces() {
+	local namespaces="";
+
+	NS_R1=""
+	NS_R2=""
+	NS_R3=""
+	for ns in ${ns_set}; do
+		n=$(nsname ${ns})
+		n=$(echo $n | tr '[:lower:]' '[:upper:]')
+		namespaces="$namespaces ${n}"
+	done
+
+	setup_ns $namespaces
+
+	ns_active= #ordered list of namespaces for this test.
+
+	[ -n NS_A ] && ns_a="ip netns exec ${NS_A}" && ns_active="${ns_active} $NS_A"
+	[ -n NS_R1 ] && ns_r1="ip netns exec ${NS_R1}" && ns_active="${ns_active} $NS_R1"
+	[ -n NS_S1 ] && ns_s1="ip netns exec ${NS_S1}" && ns_active="${ns_active} $NS_S1"
+	[ -n NS_R2 ] && ns_r2="ip netns exec ${NS_R2}" && ns_active="${ns_active} $NS_R2"
+	[ -n NS_S2 ] && ns_s2="ip netns exec ${NS_S2}" && ns_active="${ns_active} $NS_S2"
+	[ -n NS_R3 ] && ns_r3="ip netns exec ${NS_R3}" && ns_active="${ns_active} $NS_R3"
+	[ -n NS_B ] && ns_b="ip netns exec ${NS_B}" && ns_active="${ns_active} $NS_B"
+}
+
+setup_addr_add() {
+	local ns_cmd=$(nscmd $1)
+	local ip0="$2"
+	local ip1="$3"
+
+	if [ -n "${ip0}" ]; then
+		run_cmd ${ns_cmd} ip addr add ${ip0} dev eth0
+		run_cmd ${ns_cmd} ip link set up eth0
+	fi
+	if [ -n "${ip1}" ]; then
+		run_cmd ${ns_cmd} ip addr add ${ip1} dev eth1
+		run_cmd ${ns_cmd} ip link set up eth1
+	fi
+	run_cmd ${ns_cmd} sysctl -q net/ipv4/ip_forward=1
+	run_cmd ${ns_cmd} sysctl -q net/ipv6/conf/all/forwarding=1
+
+	# Disable DAD, so that we don't have to wait to use the
+	# configured IPv6 addresses
+	run_cmd ${ns_cmd} sysctl -q net/ipv6/conf/default/accept_dad=0
+}
+
+route_add() {
+	local ns_cmd=$(nscmd $1)
+	local nhf=$2
+	local nhr=$3
+	local i=$4
+
+	if [ -n "${nhf}" ]; then
+		# add forward routes
+		for j in $(seq $((i + 1)) $imax); do
+			local route="${prefix}${s}${j}${S}0/${prefix_len}"
+			run_cmd ${ns_cmd} ip route replace "${route} via ${nhf}"
+		done
+	fi
+
+	if [ -n "${nhr}" ]; then
+		# add reverse routes
+		for j in $(seq 1 $((i - 2))); do
+			local route="${prefix}${s}${j}${S}0/${prefix_len}"
+			run_cmd ${ns_cmd} ip route replace "${route} via ${nhr}"
+		done
+	fi
+}
+
+veth_add() {
+	local ns_cmd=$(nscmd $1)
+	local tn="veth${2}1"
+	local ln=${3:-eth0}
+	run_cmd ${ns_cmd} ip link add ${ln} type veth peer name ${tn}
+}
+
+setup_nft_add_icmp_filter() {
+	local ns_cmd=${ns_r2}
+
+	run_cmd ${ns_cmd} nft add table inet filter
+	run_cmd ${ns_cmd} nft add chain inet filter FORWARD \
+		{ type filter hook forward priority filter; policy drop ; }
+	run_cmd ${ns_cmd} nft add rule inet filter FORWARD counter ip protocol \
+		icmp counter log drop
+	run_cmd ${ns_cmd} nft add rule inet filter FORWARD counter ip protocol esp \
+		counter log accept
+}
+
+setup_nft_add_icmpv6_filter() {
+	local ns_cmd=${ns_r2}
+
+	run_cmd ${ns_cmd} nft add table inet filter
+	run_cmd ${ns_cmd} nft add chain inet filter FORWARD { type filter \
+		hook forward priority filter; policy drop ; }
+	run_cmd ${ns_cmd} nft add rule inet filter FORWARD ip6 nexthdr \
+		ipv6-icmp icmpv6 type echo-request counter log drop
+	run_cmd ${ns_cmd} nft add rule inet filter FORWARD ip6 nexthdr esp \
+		counter log accept
+	run_cmd ${ns_cmd} nft add rule inet filter FORWARD ip6 nexthdr \
+		ipv6-icmp icmpv6 type {nd-neighbor-solicit,nd-neighbor-advert,\
+		nd-router-solicit,nd-router-advert} counter log accept
+}
+
+veth_mv() {
+	local ns=$1
+	local nsp=$2
+	local rn=${4:-eth1}
+	local tn="veth${3}1"
+
+	run_cmd "$(nscmd ${nsp})" ip link set ${tn} netns ${ns}
+	run_cmd "$(nscmd ${ns})" ip link set ${tn} name ${rn}
+}
+
+vm_set() {
+	s1_src=${src}
+	s1_dst=${dst}
+	s1_src_net=${src_net}
+	s1_dst_net=${dst_net}
+}
+
+setup_vm_set_v4() {
+	src="10.1.3.1"
+	dst="10.1.4.2"
+	src_net="10.1.1.0/24"
+	dst_net="10.1.6.0/24"
+
+	prefix=${prefix4}
+	prefix_len=24
+	s="."
+	S="."
+
+	vm_set
+}
+
+setup_vm_set_v4x() {
+	ns_set="a r1 s1 r2 s2 b" # Network topology: x
+	imax=6
+	prefix=${prefix4}
+	s="."
+	S="."
+	src="10.1.3.1"
+	dst="10.1.4.2"
+	src_net="10.1.1.0/24"
+	dst_net="10.1.5.0/24"
+	prefix_len=24
+
+	vm_set
+}
+
+setup_vm_set_v6() {
+	imax=7
+	prefix=${prefix6}
+	s=":"
+	S="::"
+	src="fc00:3::1"
+	dst="fc00:4::2"
+	src_net="fc00:1::0/64"
+	dst_net="fc00:6::0/64"
+	prefix_len=64
+
+	vm_set
+}
+
+setup_vm_set_v6x() {
+	ns_set="a r1 s1 r2 s2 b" # Network topology: x
+	imax=6
+	prefix=${prefix6}
+	s=":"
+	S="::"
+	src="fc00:3::1"
+	dst="fc00:4::2"
+	src_net="fc00:1::0/64"
+	dst_net="fc00:5::0/64"
+	prefix_len=64
+
+	vm_set
+}
+
+setup_veths() {
+	i=1
+	for ns in ${ns_active}; do
+		[ ${i} = ${imax} ] && continue
+		veth_add ${ns} ${i}
+		i=$((i + 1))
+	done
+
+	j=1
+	for ns in ${ns_active}; do
+		if [ ${j} -eq 1 ]; then
+			p=${ns};
+			pj=${j}
+			j=$((j + 1))
+			continue
+		fi
+		veth_mv ${ns} "${p}" ${pj}
+		p=${ns}
+		pj=${j}
+		j=$((j + 1))
+	done
+}
+
+setup_routes() {
+	ip1=""
+	i=1
+	for ns in ${ns_active}; do
+		# 10.1.C.1/24
+		ip0="${prefix}${s}${i}${S}1/${prefix_len}"
+		[ "${ns}" = b ] && ip0=""
+		setup_addr_add ${ns} "${ip0}" "${ip1}"
+		# 10.1.C.2/24
+		ip1="${prefix}${s}${i}${S}2/${prefix_len}"
+		i=$((i + 1))
+	done
+
+	i=1
+	nhr=""
+	for ns in ${ns_active}; do
+		nhf="${prefix}${s}${i}${S}2"
+		[ "${ns}" = b ] && nhf=""
+		route_add ${ns} "${nhf}" "${nhr}" ${i}
+		nhr="${prefix}${s}${i}${S}1"
+		i=$((i + 1))
+	done
+}
+
+setup_xfrm() {
+
+	run_cmd ${ns_s1} ip xfrm policy add src ${s1_src_net} dst ${s1_dst_net} dir out \
+		tmpl src ${s1_src} dst ${s1_dst} proto esp reqid 1 mode tunnel
+
+	# no "input" policies. we are only doing forwarding.
+	# run_cmd ${ns_s1} ip xfrm policy add src ${s1_dst_net} dst ${s1_src_net} dir in \
+	#	flag icmp tmpl src ${s1_dst} dst ${s1_src} proto esp reqid 2 mode tunnel
+
+	run_cmd ${ns_s1} ip xfrm policy add src ${s1_dst_net} dst ${s1_src_net} dir fwd \
+		flag icmp tmpl src ${s1_dst} dst ${s1_src} proto esp reqid 2 mode tunnel
+
+	run_cmd ${ns_s1} ip xfrm state add src ${s1_src} dst ${s1_dst} proto esp spi 1 \
+		reqid 1 mode tunnel aead 'rfc4106(gcm(aes))' \
+		0x1111111111111111111111111111111111111111 96 \
+		sel src ${s1_src_net} dst ${s1_dst_net}
+
+	run_cmd ${ns_s1} ip xfrm state add src ${s1_dst} dst ${s1_src} proto esp spi 2 \
+		reqid 2 flag icmp replay-window 8 mode tunnel aead 'rfc4106(gcm(aes))' \
+		0x2222222222222222222222222222222222222222 96 \
+		sel src ${s1_dst_net} dst ${s1_src_net}
+
+	run_cmd ${ns_s2} ip xfrm policy add src ${s1_dst_net} dst ${s1_src_net} dir out \
+		flag icmp tmpl src ${s1_dst} dst ${s1_src} proto esp reqid 2 mode tunnel
+
+	run_cmd ${ns_s2} ip xfrm policy add src ${s1_src_net} dst ${s1_dst_net} dir fwd \
+		tmpl src ${s1_src} dst ${s1_dst} proto esp reqid 1 mode tunnel
+
+	run_cmd ${ns_s2} ip xfrm state add src ${s1_dst} dst ${s1_src} proto esp spi 2 \
+		reqid 2 mode tunnel aead 'rfc4106(gcm(aes))' \
+		0x2222222222222222222222222222222222222222 96 \
+		sel src ${s1_dst_net} dst ${s1_src_net}
+
+	run_cmd ${ns_s2} ip xfrm state add src ${s1_src} dst ${s1_dst} proto esp spi 1 \
+		reqid 1 flag icmp replay-window 8 mode tunnel aead 'rfc4106(gcm(aes))' \
+		0x1111111111111111111111111111111111111111 96 \
+		sel src ${s1_src_net} dst ${s1_dst_net}
+}
+
+setup() {
+	[ "$(id -u)" -ne 0 ] && echo " need to run as root" && return $ksft_skip
+
+	for arg do
+		eval setup_${arg} || { echo " ${arg} not supported"; return 1; }
+	done
+}
+
+trace() {
+	[ $TRACING -eq 0 ] && return
+
+	for arg do
+		[ "${ns_cmd}" = "" ] && ns_cmd="${arg}" && continue
+		ns_cmd=
+	done
+	sleep 1
+}
+
+cleanup() {
+	if [ "${fail}" = "yes" -a -n "${desc}" ]; then
+		printf "TEST: %-60s [ FAIL ]\n" "${desc}"
+		[ -n "${cmd}" ] && echo -e "${cmd}\n"
+		[ -n "${out}" ] && echo -e "${out}\n"
+	fi
+
+	cleanup_all_ns
+}
+
+mtu() {
+	ns_cmd="${1}"
+	dev="${2}"
+	mtu="${3}"
+
+	${ns_cmd} ip link set dev ${dev} mtu ${mtu}
+}
+
+mtu_parse() {
+	input="${1}"
+
+	next=0
+	for i in ${input}; do
+		[ ${next} -eq 1 -a "${i}" = "lock" ] && next=2 && continue
+		[ ${next} -eq 1 ] && echo "${i}" && return
+		[ ${next} -eq 2 ] && echo "lock ${i}" && return
+		[ "${i}" = "mtu" ] && next=1
+	done
+}
+
+link_get() {
+	ns_cmd="${1}"
+	name="${2}"
+
+	${ns_cmd} ip link show dev "${name}"
+}
+
+link_get_mtu() {
+	ns_cmd="${1}"
+	name="${2}"
+
+	mtu_parse "$(link_get "${ns_cmd}" ${name})"
+}
+
+test_unreachable_ipv6() {
+	setup vm_set_v6 namespaces veths routes xfrm nft_add_icmpv6_filter || return $ksft_skip
+	run_cmd ${ns_a} ping -W 5 -w 4 -c 1 fc00:6::2
+	run_cmd ${ns_a} ping -W 5 -w 4 -c 1 fc00:6::3 || true
+	rc=0
+	echo -e "$out" | grep -q -E 'From fc00:5::2 icmp_seq.* Destination' || rc=1
+	return ${rc}
+}
+
+test_unreachable_gw_ipv6() {
+	setup vm_set_v6x namespaces veths routes xfrm nft_add_icmpv6_filter || return $ksft_skip
+	run_cmd ${ns_a} ping -W 5 -w 4 -c 1 fc00:5::2
+	run_cmd ${ns_a} ping -W 5 -w 4 -c 1 fc00:5::3 || true
+	rc=0
+	echo -e "$out" | grep -q -E 'From fc00:4::2 icmp_seq.* Destination' || rc=1
+	return ${rc}
+}
+
+test_unreachable_gw_ipv4() {
+	setup vm_set_v4x namespaces veths routes xfrm nft_add_icmp_filter || return $ksft_skip
+	run_cmd ${ns_a} ping -W 5 -w 4 -c 1 10.1.5.2
+	run_cmd ${ns_a} ping -W 5 -w 4 -c 1 10.1.5.3 || true
+	rc=0
+	echo -e "$out" | grep -q -E 'From 10.1.4.2 icmp_seq.* Destination' || rc=1
+	return ${rc}
+}
+
+test_unreachable_ipv4() {
+	setup vm_set_v4 namespaces veths routes xfrm nft_add_icmp_filter || return $ksft_skip
+	run_cmd ${ns_a} ping -W 5 -w 4 -c 1 10.1.6.2
+	run_cmd ${ns_a} ping -W 5 -w 4 -c 1 10.1.6.3 || true
+	rc=0
+	echo -e "$out" | grep -q -E 'From 10.1.5.2 icmp_seq.* Destination' || rc=1
+	return ${rc}
+}
+
+test_mtu_ipv4_r2() {
+	setup vm_set_v4 namespaces veths routes xfrm nft_add_icmp_filter || return $ksft_skip
+	run_cmd ${ns_a} ping -W 5 -w 4 -c 1 10.1.6.2
+	run_cmd ${ns_r2} ip route replace 10.1.3.0/24 dev eth1 src 10.1.3.2 mtu 1300
+	run_cmd ${ns_r2} ip route replace 10.1.4.0/24 dev eth0 src 10.1.4.1 mtu 1300
+	run_cmd ${ns_a} ping -M do -s 1300 -W 5 -w 4 -c 1 10.1.6.2 || true
+	rc=0
+	# note the error should be s1 not from r2
+	echo -e "$out" | grep -q -E "From 10.1.2.2 icmp_seq=.* Frag needed and DF set" || rc=1
+	return ${rc}
+}
+
+test_mtu_ipv6_r2() {
+	setup vm_set_v6 namespaces veths routes xfrm nft_add_icmpv6_filter || return $ksft_skip
+	run_cmd ${ns_a} ping -W 5 -w 4 -c 1 fc00:6::2
+	run_cmd ${ns_r2} ip -6 route replace fc00:3::/64 dev eth1 metric 256 src fc00:3::2 mtu 1300
+	run_cmd ${ns_r2} ip -6 route replace fc00:4::/64 dev eth0 metric 256 src fc00:4::1 mtu 1300
+	run_cmd ${ns_a} ping -M do -s 1300 -W 5 -w 4 -c 1 fc00:6::2 || true
+	rc=0
+	# note the error should be s1 not from r2
+	echo -e "$out" | grep -q -E "From fc00:2::2 icmp_seq=.* Packet too big: mtu=1230" || rc=1
+	return ${rc}
+}
+
+test_mtu_ipv4_r3() {
+	setup vm_set_v4 namespaces veths routes xfrm nft_add_icmp_filter || return $ksft_skip
+	run_cmd ${ns_a} ping -W 5 -w 4 -c 1 10.1.6.2
+	run_cmd ${ns_r3} ip route replace 10.1.6.0/24 dev eth0 src 10.1.6.1 mtu 1300
+	run_cmd ${ns_a} ping -M do -s 1350 -W 5 -w 4 -c 1 10.1.6.2 || true
+	rc=0
+	echo -e "$out" | grep -q -E "From 10.1.5.2 icmp_seq=.* Frag needed and DF set (mtu = 1300)" || rc=1
+	return ${rc}
+}
+
+test_mtu_ipv6_r3() {
+	setup vm_set_v6 namespaces veths routes xfrm nft_add_icmpv6_filter || return $ksft_skip
+	run_cmd ${ns_a} ping -W 5 -w 4 -c 1 fc00:6::2
+	run_cmd ${ns_r3} ip -6 route replace fc00:6::/64 dev eth1 metric 256 src fc00:6::1 mtu 1300
+	run_cmd ${ns_a} ping -M do -s 1300 -W 5 -w 4 -c 1 fc00:6::2 || true
+	rc=0
+	# note the error should be s1 not from r2
+	echo -e "$out" | grep -q -E "From fc00:5::2 icmp_seq=.* Packet too big: mtu=1300" || rc=1
+	return ${rc}
+}
+
+################################################################################
+#
+usage() {
+	echo
+	echo "$0 [OPTIONS] [TEST]..."
+	echo "If no TEST argument is given, all tests will be run."
+	echo
+	echo -e "\t-p Pause on fail"
+	echo -e "\t-v Verbose output. Show commands; -vv Show output also"
+	echo "Available tests${tests}"
+	exit 1
+}
+
+################################################################################
+#
+exitcode=0
+desc=0
+all_skipped=true
+out=
+cmd=
+
+while getopts :pv o
+do
+	case $o in
+	p) PAUSE_ON_FAIL=yes;;
+	v) VERBOSE=$(( VERBOSE + 1 ));;
+	*) usage;;
+	esac
+done
+shift $(($OPTIND-1))
+
+IFS="	
+"
+
+for arg do
+	# Check first that all requested tests are available before running any
+	command -v > /dev/null "test_${arg}" || { echo "=== Test ${arg} not found"; usage; }
+done
+
+trap cleanup EXIT
+
+name=""
+desc=""
+fail="no"
+
+# end cleanup
+cleanup
+
+for t in ${tests}; do
+	[ "${name}" = "" ] && name="${t}" && continue
+	[ "${desc}" = "" ] && desc="${t}"
+
+	run_this=1
+	for arg do
+		[ "${arg}" != "${arg#--*}" ] && continue
+		[ "${arg}" = "${name}" ] && run_this=1 && break
+		run_this=0
+	done
+	if [ $run_this -eq 1 ]; then
+		run_test "${name}" "${desc}"
+	fi
+	name=""
+	desc=""
+done
+
+exit ${exitcode}
On Mon, 6 May 2024 10:05:54 +0200 Antony Antony wrote:
> Add IPsec tunnel, aka xfrm state, tests with ICMP flags enabled. IPv4 and IPv6, unreachable tests over xfrm/IPsec tunnels, xfrm SA with "flag icmp" set.
Doesn't seem to work:
# selftests: net: xfrm_state.sh
# ./xfrm_state.sh: line 91: test_: command not found
# TEST: unreachable_ipv4IPv6 unreachable from router r3 [ FAIL ]
# ./xfrm_state.sh: line 91: test_: command not found
# TEST: unreachable_gw_ipv6IPv6 unreachable from IPsec gateway s2 [ FAIL ]
# ./xfrm_state.sh: line 91: test_: command not found
# TEST: mtu_ipv6_r2IPv6 MTU exceeded from ESP router r2 [ FAIL ]
# ./xfrm_state.sh: line 91: test_: command not found
# TEST: mtu_ipv6_r3IPv6 MTU exceeded router r3 [ FAIL ]
not ok 1 selftests: net: xfrm_state.sh # exit=1
Hi Jakub,
On Mon, May 06, 2024 at 06:28:30AM -0700, Jakub Kicinski via Devel wrote:
> On Mon, 6 May 2024 10:05:54 +0200 Antony Antony wrote:
> > Add IPsec tunnel, aka xfrm state, tests with ICMP flags enabled. IPv4 and IPv6, unreachable tests over xfrm/IPsec tunnels, xfrm SA with "flag icmp" set.
>
> Doesn't seem to work:
Thanks, I am looking into it. I noticed two issues.
> # selftests: net: xfrm_state.sh
> # ./xfrm_state.sh: line 91: test_: command not found
> # TEST: unreachable_ipv4IPv6 unreachable from router r3 [ FAIL ]
This appears to be an error from the v2 run, which was sent yesterday. The v3 patch should have superseded it.
The branch net-dev-testing/net-next-2024-05-06--12-00 contains the v2 patch. I wonder whether net-dev testing recognized the v3 patch.
The output of

git diff net-next-2024-05-06--12-00 net-next-2024-05-06--03-00 ./tools/testing/selftests/net/xfrm_state.sh

is missing the expected one-line IFS change.
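For reference, a small standalone illustration of why the IFS whitespace matters for the tests list. It assumes that the name and description of each entry are separated by a tab, which is what the v2->v3 note about mangled IFS whitespace suggests. With only a newline in IFS, each whole line becomes a single word, so run_test would evaluate "test_" followed by the rest of the line, which is consistent with the "test_: command not found" errors. split_tests is a hypothetical helper.

```shell
# Two entries in the "<TAB>name<TAB>description" layout assumed above.
tests=$'\n\tunreachable_ipv4\tIPv4 unreachable from router r3\n\tunreachable_gw_ipv4\tIPv4 unreachable from IPsec gateway s2'

# Print each word that "for t in ${tests}" yields under the given IFS.
split_tests() {
	local IFS="$1" t n=0
	for t in ${tests}; do
		n=$((n + 1))
		printf '%d:[%s]\n' "${n}" "${t}"
	done
}

split_tests $'\t\n'  # 4 words: name, description, name, description
split_tests $'\n'    # 2 words: each one a whole line, leading tab included
```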
> # ./xfrm_state.sh: line 91: test_: command not found
> # TEST: unreachable_gw_ipv6IPv6 unreachable from IPsec gateway s2 [ FAIL ]
> # ./xfrm_state.sh: line 91: test_: command not found
> # TEST: mtu_ipv6_r2IPv6 MTU exceeded from ESP router r2 [ FAIL ]
> # ./xfrm_state.sh: line 91: test_: command not found
> # TEST: mtu_ipv6_r3IPv6 MTU exceeded router r3 [ FAIL ]
> not ok 1 selftests: net: xfrm_state.sh # exit=1
I suspect there is another issue with tools/testing/selftests/net/config: it does not appear to support the nftables match for ESP, which this script assumes.
# ip netns exec ns_r2-39oUmE nft add rule inet filter FORWARD counter ip protocol esp counter log accept
#
# Error: Could not process rule: No such file or directory
# add rule inet filter FORWARD counter ip protocol esp counter log accept
# ^^^^^^
I am also learning vng. I will send a v4 with the change to config; then I hope the test runner will pick up the latest patch.
-antony
On Mon, 6 May 2024 17:37:54 +0200 Antony Antony wrote:
> This appears to be an error from the v2 run, which was sent yesterday. The v3 patch should have superseded it.
>
> The branch net-dev-testing/net-next-2024-05-06--12-00 contains the v2 patch. I wonder whether net-dev testing recognized the v3 patch.
You're right! I guess the pw-bot didn't discard v2 because of the capitalization change in the subject of the cover letter.
> git diff net-next-2024-05-06--12-00 net-next-2024-05-06--03-00 ./tools/testing/selftests/net/xfrm_state.sh
> is missing the expected one-line IFS change.
>
> > # ./xfrm_state.sh: line 91: test_: command not found
> > # TEST: unreachable_gw_ipv6IPv6 unreachable from IPsec gateway s2 [ FAIL ]
> > # ./xfrm_state.sh: line 91: test_: command not found
> > # TEST: mtu_ipv6_r2IPv6 MTU exceeded from ESP router r2 [ FAIL ]
> > # ./xfrm_state.sh: line 91: test_: command not found
> > # TEST: mtu_ipv6_r3IPv6 MTU exceeded router r3 [ FAIL ]
> > not ok 1 selftests: net: xfrm_state.sh # exit=1
>
> I suspect there is another issue with tools/testing/selftests/net/config: it does not appear to support the nftables match for ESP, which this script assumes.
>
> # ip netns exec ns_r2-39oUmE nft add rule inet filter FORWARD counter ip protocol esp counter log accept
> #
> # Error: Could not process rule: No such file or directory
> # add rule inet filter FORWARD counter ip protocol esp counter log accept
> # ^^^^^^
>
> I am also learning vng. I will send a v4 with the change to config; then I hope the test runner will pick up the latest patch.
👍️
Hi Antony,
2024-05-06, 10:05:54 +0200, Antony Antony wrote:
> diff --git a/tools/testing/selftests/net/xfrm_state.sh b/tools/testing/selftests/net/xfrm_state.sh
> new file mode 100755
> index 000000000000..26eac013abcf
> --- /dev/null
> +++ b/tools/testing/selftests/net/xfrm_state.sh
[...]
> +run_test() {
> +	(
> +	tname="$1"
> +	tdesc="$2"
> +
> +
> +	unset IFS
> +
> +	fail="yes"
> +
> +	# Since cleanup() relies on variables modified by this sub shell, it
> +	# has to run in this context.
> +	trap cleanup EXIT
> +
> +	if [ "$VERBOSE" -gt 0 ]; then
> +		printf "\n#####################################################################\n\n"
> +	fi
> +
> +	# if errexit was not set, set it and unset after test eval
> +	errexit=0
> +	if [[ $- =~ "e" ]]; then
> +		errexit=1
> +	else
> +		set -e
> +	fi
> +
> +	eval test_${tname}
> +	ret=$?
> +	fail="no"
> +	[ $errexit -eq 0 ] && set +e # hack until exception is fixed
What needs to be fixed?
> +setup_namespace() {
Is this one actually used? I can't find a reference to "namespace" (singular) in this script.
> +	setup_ns NS_A
> +	ns_a="ip netns exec ${NS_A}"
> +}
> +veth_add() {
> +	local ns_cmd=$(nscmd $1)
> +	local tn="veth${2}1"
> +	local ln=${3:-eth0}
> +	run_cmd ${ns_cmd} ip link add ${ln} type veth peer name ${tn}
Why not just create the peer directly in the correct namespace and with the correct name? That would avoid the mess of moving/renaming with veth_mv, and the really hard-to-read loop in setup_veths.
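A minimal sketch of that suggestion, assuming iproute2's "peer name ... netns ..." option for veth and the script's convention that eth0 faces the next hop and eth1 the previous one. setup_veths_direct and run are hypothetical names; the namespaces are passed in topology order, and DRY_RUN=1 only prints the commands.

```shell
# Print the command (DRY_RUN=1) or run it; "$@" keeps arguments intact
# even when the caller has changed IFS, as xfrm_state.sh does.
run() {
	if [ "${DRY_RUN:-0}" -eq 1 ]; then
		local IFS=' '
		echo "$*"
	else
		"$@"
	fi
}

# Hypothetical replacement for veth_add/veth_mv/setup_veths: create each
# pair with both ends already in their final namespace and with their
# final names, so no move or rename step is needed.
# Usage: setup_veths_direct NS1 NS2 ... (in topology order)
setup_veths_direct() {
	local prev="" ns
	for ns in "$@"; do
		if [ -n "${prev}" ]; then
			# eth0 of the previous hop <-> eth1 of this hop
			run ip -netns "${prev}" link add eth0 type veth \
				peer name eth1 netns "${ns}" || return 1
		fi
		prev="${ns}"
	done
}
```

For example, "DRY_RUN=1 setup_veths_direct ns_a ns_r1 ns_s1" prints the two "ip link add" commands for the a-r1 and r1-s1 links.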
> +}
[...]
> +setup_vm_set_v4x() {
> +	ns_set="a r1 s1 r2 s2 b" # Network topology: x
> +	imax=6
It would be more robust to set ns_set, imax, and all other parameters in every setup, so that the right topology is always used even if the test order changes. Currently I'm not sure which topology is used in which test, except the ones that use setup_vm_set_v4x and setup_vm_set_v6x.
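A sketch of what that could look like: one hypothetical helper, set_topology, that assigns every parameter on each call, using the values from the four setup_vm_set_* functions in the patch. The existing vm_set would still be called afterwards to copy src/dst into the s1_* variables.

```shell
# Hypothetical: assign every topology parameter on each call so that a
# test never inherits ns_set/imax/prefix from a previously run test.
# Usage: set_topology 4|6 [x]
set_topology() {
	local family="$1" variant="${2:-}"

	if [ "${variant}" = x ]; then
		ns_set="a r1 s1 r2 s2 b"	# topology x: no r3
		imax=6
	else
		ns_set="a r1 s1 r2 s2 r3 b"	# default topology
		imax=7
	fi

	if [ "${family}" = 4 ]; then
		prefix="10.1"; s="."; S="."; prefix_len=24
	else
		prefix="fc00"; s=":"; S="::"; prefix_len=64
	fi

	# Bob's subnet is the last link: 6 in the default topology, 5 in x.
	local last=$((imax - 1))
	src="${prefix}${s}3${S}1"
	dst="${prefix}${s}4${S}2"
	src_net="${prefix}${s}1${S}0/${prefix_len}"
	dst_net="${prefix}${s}${last}${S}0/${prefix_len}"
}
```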
> +	prefix=${prefix4}
> +	s="."
> +	S="."
> +	src="10.1.3.1"
> +	dst="10.1.4.2"
> +	src_net="10.1.1.0/24"
> +	dst_net="10.1.5.0/24"
> +	prefix_len=24
> +
> +	vm_set
> +}
[...]
> +setup_veths() {
> +	i=1
> +	for ns in ${ns_active}; do
> +		[ ${i} = ${imax} ] && continue
IIUC imax should be the last, so s/continue/break/ ?
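A quick standalone way to see what that loop does: once i equals imax, continue also skips the increment, so every remaining iteration is skipped as well and the result is always the same as with break; break simply states the intent. veths_created is a hypothetical helper that only records where veth_add would be called.

```shell
# Mimic the first loop of setup_veths, recording where veth_add would run.
# mode is "continue" (as in the patch) or "break" (as suggested).
veths_created() {
	local mode="$1" imax="$2" i=1 created="" ns
	shift 2
	for ns in "$@"; do
		if [ "${i}" = "${imax}" ]; then
			[ "${mode}" = break ] && break
			continue
		fi
		created="${created} ${ns}"
		i=$((i + 1))
	done
	echo "${created# }"
}

veths_created continue 7 a r1 s1 r2 s2 r3 b
veths_created break 7 a r1 s1 r2 s2 r3 b
```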
> +		veth_add ${ns} ${i}
> +		i=$((i + 1))
> +	done
> +
> +	j=1
> +	for ns in ${ns_active}; do
> +		if [ ${j} -eq 1 ]; then
> +			p=${ns};
> +			pj=${j}
> +			j=$((j + 1))
> +			continue
> +		fi
> +		veth_mv ${ns} "${p}" ${pj}
> +		p=${ns}
> +		pj=${j}
> +		j=$((j + 1))
> +	done
> +}
> +
> +setup_routes() {
> +	ip1=""
> +	i=1
> +	for ns in ${ns_active}; do
> +		# 10.1.C.1/24
> +		ip0="${prefix}${s}${i}${S}1/${prefix_len}"
> +		[ "${ns}" = b ] && ip0=""
> +		setup_addr_add ${ns} "${ip0}" "${ip1}"
> +		# 10.1.C.2/24
> +		ip1="${prefix}${s}${i}${S}2/${prefix_len}"
> +		i=$((i + 1))
This loop is really hard to follow :/ It would probably be easier to read if setup_addr_add only installed exactly one address (instead of conditionally adding maybe 2), and if the loop here checked whether each address needs to be added ("${ns}" != b, i -ne 1).
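A sketch of that restructuring, assuming the script's prefix/s/S/prefix_len globals and namespaces passed in topology order; addr_add, setup_addrs and run are hypothetical names, and the link-up and sysctl steps of setup_addr_add are left out. It decides by position in the list rather than by comparing the name with b, so it does not depend on how setup_ns names the namespaces (the nft error above shows a generated name such as ns_r2-39oUmE).

```shell
# Print the command (DRY_RUN=1) or run it.
run() {
	if [ "${DRY_RUN:-0}" -eq 1 ]; then
		local IFS=' '
		echo "$*"
	else
		"$@"
	fi
}

# Hypothetical: install exactly one address per call.
addr_add() {
	local ns="$1" dev="$2" addr="$3"
	run ip -netns "${ns}" addr add "${addr}" dev "${dev}"
}

# Namespace i gets <prefix>.i.1 on eth0 (towards the next hop) and
# <prefix>.(i-1).2 on eth1 (towards the previous hop).
setup_addrs() {
	local n=$# i=1 ns
	for ns in "$@"; do
		if [ "${i}" -lt "${n}" ]; then	# the last host (b) has no eth0
			addr_add "${ns}" eth0 "${prefix}${s}${i}${S}1/${prefix_len}"
		fi
		if [ "${i}" -gt 1 ]; then	# the first host (a) has no eth1
			addr_add "${ns}" eth1 "${prefix}${s}$((i - 1))${S}2/${prefix_len}"
		fi
		i=$((i + 1))
	done
}
```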
> +	done
> +
> +	i=1
> +	nhr=""
> +	for ns in ${ns_active}; do
> +		nhf="${prefix}${s}${i}${S}2"
> +		[ "${ns}" = b ] && nhf=""
> +		route_add ${ns} "${nhf}" "${nhr}" ${i}
> +		nhr="${prefix}${s}${i}${S}1"
> +		i=$((i + 1))
I'd suggest the same here, split route_add into route_add_{forward,reverse} and only call the right one (or both) for the current iteration.
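Along the same lines, a sketch with hypothetical route_add_forward/route_add_reverse helpers, assuming the same globals and namespaces in topology order. Unlike the patch, the forward loop here stops at the last existing subnet (n - 1) rather than at imax.

```shell
# Print the command (DRY_RUN=1) or run it.
run() {
	if [ "${DRY_RUN:-0}" -eq 1 ]; then
		local IFS=' '
		echo "$*"
	else
		"$@"
	fi
}

# Routes to the subnets beyond the next hop, via the next hop's eth1.
route_add_forward() {
	local ns="$1" i="$2" last="$3" j
	for j in $(seq $((i + 1)) "${last}"); do
		run ip -netns "${ns}" route replace \
			"${prefix}${s}${j}${S}0/${prefix_len}" via "${prefix}${s}${i}${S}2"
	done
}

# Routes to the subnets behind the previous hop, via the previous hop's eth0.
route_add_reverse() {
	local ns="$1" i="$2" j
	for j in $(seq 1 $((i - 2))); do
		run ip -netns "${ns}" route replace \
			"${prefix}${s}${j}${S}0/${prefix_len}" via "${prefix}${s}$((i - 1))${S}1"
	done
}

setup_routes_split() {
	local n=$# i=1 ns
	for ns in "$@"; do
		# subnets are numbered 1 .. n-1; subnet i is on eth0, i-1 on eth1
		if [ "${i}" -lt $((n - 1)) ]; then
			route_add_forward "${ns}" "${i}" $((n - 1))
		fi
		if [ "${i}" -gt 2 ]; then
			route_add_reverse "${ns}" "${i}"
		fi
		i=$((i + 1))
	done
}
```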
> +	done
> +}
[...]
> +setup() {
> +	[ "$(id -u)" -ne 0 ] && echo " need to run as root" && return $ksft_skip
> +
> +	for arg do
> +		eval setup_${arg} || { echo " ${arg} not supported"; return 1; }
> +	done
> +}
> +
> +trace() {
Unused?
> +	[ $TRACING -eq 0 ] && return
Then you can also get rid of that variable at the top.
[...]
+mtu() {
- ns_cmd="${1}"
- dev="${2}"
- mtu="${3}"
- ${ns_cmd} ip link set dev ${dev} mtu ${mtu}
+}
+mtu_parse() {
+	input="${1}"
+	next=0
+	for i in ${input}; do
+		[ ${next} -eq 1 -a "${i}" = "lock" ] && next=2 && continue
+		[ ${next} -eq 1 ] && echo "${i}" && return
+		[ ${next} -eq 2 ] && echo "lock ${i}" && return
+		[ "${i}" = "mtu" ] && next=1
+	done
+}
+link_get() {
+	ns_cmd="${1}"
+	name="${2}"
+	${ns_cmd} ip link show dev "${name}"
+}
+link_get_mtu() {
+	ns_cmd="${1}"
+	name="${2}"
+	mtu_parse "$(link_get "${ns_cmd}" ${name})"
+}
All those also seem completely unused by this script. Please don't just c/p from other selftests without checking.
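Leftovers like these could be caught before posting with a rough heuristic. It lists functions that are defined in a script but whose name appears on no other line. This is a sketch, not an existing kselftest tool. It misses indirect calls such as `eval setup_${arg}`, so `setup_*` functions that are only reached that way may be reported as unused. A variable that shares a function's name (as `mtu` does here) can also make a dead function look used.

```shell
#!/bin/sh
# Rough heuristic: print functions defined in a shell script whose name
# appears on no line other than its own definition.
unused_functions() {
	script="${1}"
	# Function names from lines of the form "name() {".
	sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)() *{.*/\1/p' "${script}" |
	while read -r fn; do
		# -w: whole-word matches only; -c: count matching lines.
		n=$(grep -c -w -- "${fn}" "${script}")
		if [ "${n}" -le 1 ]; then
			echo "${fn}"
		fi
	done
}
```

Usage: `unused_functions tools/testing/selftests/net/xfrm_state.sh`. Treat the output as candidates to double-check, not as a definitive list.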
2024-05-06, 09:58:26 +0200, Antony Antony wrote:
Hi, This fix, originally intended for XFRM/IPsec, has been recommended by Steffen Klassert to submit to the net tree.
The patch addresses a minor issue related to the IPv4 source address of ICMP error messages, which originated from an old 2011 commit:
415b3334a21a ("icmp: Fix regression in nexthop resolution during replies.")
The omission of a "Fixes" tag in the following commit is deliberate to prevent potential test failures and subsequent regression issues that may arise from backporting this patch to all stable kernels.
What kind of regression do you expect? If there's a risk of regression, I'm not sure net-next is that much "better" than net or stable. If a user complains about the new behavior breaking their setup, my understanding is that you would likely have to revert the patch anyway, or at least add some way to toggle the behavior.
Hi Sabrina,
On Mon, May 06, 2024 at 03:36:15PM +0200, Sabrina Dubroca via Devel wrote:
What kind of regression do you expect? If there's a risk of
For example, old test scripts with a hardcoded source IP address assume that the "Unreachable" response keeps the previous behavior. Such scripts may report a regression when this patch is backported. Consequently, there may be discussions about whether this patch broke 10-year-old test scripts, which may be hard to fix.
regression, I'm not sure net-next is that much "better" than net or stable. If a user complains about the new behavior breaking their setup, my understanding is that you would likely have to revert the patch anyway, or at least add some way to toggle the behavior.
My hope is that if this patch is applied to net-next without a "Fixes" tag, users will fix their test scripts properly. Additionally, another piece of the puzzle for a complete fix is the "forwarding of ICMP Error messages" patch in kernel 6.8, which is a new feature and was applied via ipsec-next.
-antony
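A test script can check the source address of the "Destination Host Unreachable" reply against an expected gateway address instead of hardcoding the old value. The sketch below shows one way to do that. It extracts the source address from iputils `ping` output lines of the form shown in the cover letter (`From 10.1.4.3 icmp_seq=1 Destination Host Unreachable`). `unreach_src` and `expect_unreach_src` are hypothetical helpers, not part of the posted selftest.

```shell
#!/bin/sh
# Print the source address of the first "Destination Host Unreachable"
# line in iputils ping output ("From <addr> icmp_seq=N ...").
unreach_src() {
	printf '%s\n' "${1}" |
		awk '/Destination Host Unreachable/ { print $2; exit }'
}

# Hypothetical check: succeed only if the ICMP error came from the
# expected address (e.g. the gateway), not the unreachable host itself.
expect_unreach_src() {
	out="${1}"
	want="${2}"
	got=$(unreach_src "${out}")
	if [ "${got}" = "${want}" ]; then
		echo "PASS: unreachable from ${got}"
		return 0
	fi
	echo "FAIL: unreachable from '${got}', expected ${want}"
	return 1
}
```

A test would then call something like `expect_unreach_src "$(ping -W 9 -w 5 -c 1 10.1.4.3)" "${gw_addr}"`. Here `gw_addr` is a hypothetical variable holding the address of the gateway that generates the error.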
2024-05-06, 17:57:23 +0200, Antony Antony wrote:
Hi Sabrina,
On Mon, May 06, 2024 at 03:36:15PM +0200, Sabrina Dubroca via Devel wrote:
What kind of regression do you expect? If there's a risk of
For example, old test scripts with a hardcoded source IP address assume that the "Unreachable" response keeps the previous behavior. Such scripts may report a regression when this patch is backported. Consequently, there may be discussions about whether this patch broke 10-year-old test scripts, which may be hard to fix.
Ok, that seems like an acceptable level of "regression" to me. Thanks for explaining.
regression, I'm not sure net-next is that much "better" than net or stable. If a user complains about the new behavior breaking their setup, my understanding is that you would likely have to revert the patch anyway, or at least add some way to toggle the behavior.
My hope is that if this patch is applied to net-next without a "Fixes" tag, users will fix their test scripts properly.
I don't think the lack of a fixes tag will make people fix broken test scripts, but maybe I'm too pessimistic.