The series is a host of cleanups to the openvswitch selftest suite which should be ready to run under the netdev selftest runners using vng. For now, the testing has been done with RW directories, but additional testing will be done to try and keep it all as RO to be more friendly.
There is one more test case I plan which will print the debug log details when a test case fails so that a developer can get a clear picture why the test case failed. That will be done for the proper submission as another patch in this series.
Additionally, the timeout setting was just an arbitrary number that I picked, but needs more testing to tune it properly (since 5 minutes may be a bit too long).
Tested on fedora 38 using virtme-ng with the following commandline:
../virtme-ng/vng -v --run . --user root --cpus 4 \ --rwdir=/home/aconole/git/linux/tools/testing/selftests/net/openvswitch/ \ -- \ make -C tools/testing/selftests/net/openvswitch \ TARGETS=openvswitch TEST_PROGS=openvswitch.sh run_tests
Aaron Conole (7): selftests: openvswitch: add test case error directories to clean list selftests: openvswitch: be more verbose with selftest debugging selftests: openvswitch: use non-graceful kills when needed selftests: openvswitch: delete previously allocated netns selftests: openvswitch: make arping test a bit 'slower' selftests: openvswitch: insert module when running the tests selftests: openvswitch: add config and timeout settings
.../selftests/net/openvswitch/Makefile | 12 ++++- .../testing/selftests/net/openvswitch/config | 50 +++++++++++++++++++ .../selftests/net/openvswitch/openvswitch.sh | 33 +++++++++--- .../selftests/net/openvswitch/settings | 1 + 4 files changed, 89 insertions(+), 7 deletions(-) create mode 100644 tools/testing/selftests/net/openvswitch/config create mode 100644 tools/testing/selftests/net/openvswitch/settings
Normally, the openvswitch selftests don't keep error files around, but if debugging, there is an option to keep these files. The 'clean' target should be informed that they exist to ensure they are deleted properly.
Signed-off-by: Aaron Conole aconole@redhat.com --- tools/testing/selftests/net/openvswitch/Makefile | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/openvswitch/Makefile b/tools/testing/selftests/net/openvswitch/Makefile index 2f1508abc826..e6c4a89cc14a 100644 --- a/tools/testing/selftests/net/openvswitch/Makefile +++ b/tools/testing/selftests/net/openvswitch/Makefile @@ -8,6 +8,16 @@ TEST_PROGS := openvswitch.sh
TEST_FILES := ovs-dpctl.py
-EXTRA_CLEAN := test_netlink_checks +OVS_TESTS := \ + test_arp_ping \ + test_ct_connect_v4 \ + test_connect_v4 \ + test_nat_connect_v4 \ + test_nat_related_v4 \ + test_netlink_checks \ + test_upcall_interfaces \ + test_drop_reason + +EXTRA_CLEAN := $(OVS_TESTS)
include ../../lib.mk
The openvswitch selftest is difficult to debug for anyone that isn't directly familiar with the openvswitch module and the specifics of the test cases. Many times when something fails, the debug log will be sparsely populated and it takes some time to understand where a failure occured.
Increase the amount of details logged to the debug log by trapping all 'info' logs, and all 'ovs_sbx' commands.
Signed-off-by: Aaron Conole aconole@redhat.com --- tools/testing/selftests/net/openvswitch/openvswitch.sh | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index 87b80bee6df4..a5dbde482ba4 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -23,7 +23,9 @@ tests=" drop_reason drop: test drop reasons are emitted"
info() { - [ $VERBOSE = 0 ] || echo $* + [ "${ovs_dir}" != "" ] && + echo "`date +"[%m-%d %H:%M:%S]"` $*" >> ${ovs_dir}/debug.log + [ $VERBOSE = 0 ] || echo $* }
ovs_base=`pwd` @@ -65,7 +67,8 @@ ovs_setenv() {
ovs_sbx() { if test "X$2" != X; then - (ovs_setenv $1; shift; "$@" >> ${ovs_dir}/debug.log) + (ovs_setenv $1; shift; + info "run cmd: $@"; "$@" >> ${ovs_dir}/debug.log) else ovs_setenv $1 fi @@ -139,9 +142,10 @@ ovs_add_flow () { info "Adding flow to DP: sbx:$1 br:$2 flow:$3 act:$4" ovs_sbx "$1" python3 $ovs_base/ovs-dpctl.py add-flow "$2" "$3" "$4" if [ $? -ne 0 ]; then - echo "Flow [ $3 : $4 ] failed" >> ${ovs_dir}/debug.log + info "Flow [ $3 : $4 ] failed" return 1 fi + info "Flow added." return 0 }
Normally a spawned process under OVS is given a SIGTERM when the test ends as part of cleanup. However, in case the process is still lingering for some reason, we also send a SIGKILL to force it down faster.
Signed-off-by: Aaron Conole aconole@redhat.com --- tools/testing/selftests/net/openvswitch/openvswitch.sh | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index a5dbde482ba4..678a72ad47c1 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -91,7 +91,8 @@ ovs_add_if () { python3 $ovs_base/ovs-dpctl.py add-if \ -u "$2" "$3" >$ovs_dir/$3.out 2>$ovs_dir/$3.err & pid=$! - on_exit "ovs_sbx $1 kill -TERM $pid 2>/dev/null" + on_exit "ovs_sbx $1 kill --timeout 1000 TERM \ + --timeout 1000 KILL $pid 2>/dev/null" fi }
@@ -108,7 +109,8 @@ ovs_netns_spawn_daemon() { info "spawning cmd: $*" ip netns exec $netns $* >> $ovs_dir/stdout 2>> $ovs_dir/stderr & pid=$! - ovs_sbx "$sbx" on_exit "kill -TERM $pid 2>/dev/null" + ovs_sbx "$sbx" on_exit "kill --timeout 1000 TERM \ + --timeout 1000 KILL $pid 2>/dev/null" }
ovs_add_netns_and_veths () {
On 2/16/24 16:28, Aaron Conole wrote:
Normally a spawned process under OVS is given a SIGTERM when the test ends as part of cleanup. However, in case the process is still lingering for some reason, we also send a SIGKILL to force it down faster.
Signed-off-by: Aaron Conole aconole@redhat.com
tools/testing/selftests/net/openvswitch/openvswitch.sh | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index a5dbde482ba4..678a72ad47c1 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -91,7 +91,8 @@ ovs_add_if () { python3 $ovs_base/ovs-dpctl.py add-if \ -u "$2" "$3" >$ovs_dir/$3.out 2>$ovs_dir/$3.err & pid=$!
on_exit "ovs_sbx $1 kill -TERM $pid 2>/dev/null"
on_exit "ovs_sbx $1 kill --timeout 1000 TERM \
fi }--timeout 1000 KILL $pid 2>/dev/null"
AFAIK, this will immediately send TERM, then wait 1s, send TERM again, wait 1s then send KILL. Is that what you intended? To avoid the double TERM you could:
kill --timeout 1000 KILL --signal TERM $pid
-- Adrián Moreno
@@ -108,7 +109,8 @@ ovs_netns_spawn_daemon() { info "spawning cmd: $*" ip netns exec $netns $* >> $ovs_dir/stdout 2>> $ovs_dir/stderr & pid=$!
- ovs_sbx "$sbx" on_exit "kill -TERM $pid 2>/dev/null"
- ovs_sbx "$sbx" on_exit "kill --timeout 1000 TERM \
}--timeout 1000 KILL $pid 2>/dev/null"
ovs_add_netns_and_veths () {
Adrian Moreno amorenoz@redhat.com writes:
On 2/16/24 16:28, Aaron Conole wrote:
Normally a spawned process under OVS is given a SIGTERM when the test ends as part of cleanup. However, in case the process is still lingering for some reason, we also send a SIGKILL to force it down faster. Signed-off-by: Aaron Conole aconole@redhat.com
tools/testing/selftests/net/openvswitch/openvswitch.sh | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index a5dbde482ba4..678a72ad47c1 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -91,7 +91,8 @@ ovs_add_if () { python3 $ovs_base/ovs-dpctl.py add-if \ -u "$2" "$3" >$ovs_dir/$3.out 2>$ovs_dir/$3.err & pid=$!
on_exit "ovs_sbx $1 kill -TERM $pid 2>/dev/null"
on_exit "ovs_sbx $1 kill --timeout 1000 TERM \
fi }--timeout 1000 KILL $pid 2>/dev/null"
AFAIK, this will immediately send TERM, then wait 1s, send TERM again, wait 1s then send KILL. Is that what you intended? To avoid the double TERM you could:
Okay, I'll adjust it.
-- Adrián Moreno
@@ -108,7 +109,8 @@ ovs_netns_spawn_daemon() { info "spawning cmd: $*" ip netns exec $netns $* >> $ovs_dir/stdout 2>> $ovs_dir/stderr & pid=$!
- ovs_sbx "$sbx" on_exit "kill -TERM $pid 2>/dev/null"
- ovs_sbx "$sbx" on_exit "kill --timeout 1000 TERM \
} ovs_add_netns_and_veths () {--timeout 1000 KILL $pid 2>/dev/null"
Many openvswitch test cases reused netns and interface names. This works fine as long as the test case cleans up gracefully. However, if there is some kind of ungraceful termination (such as an external signal) the netns or interfaces can be left lingering. This happens when the selftest timeout gets exceeded, while running under very slow debugging conditions.
The solution here is to cleanup the netns on executing the next test.
Signed-off-by: Aaron Conole aconole@redhat.com --- tools/testing/selftests/net/openvswitch/openvswitch.sh | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index 678a72ad47c1..8dc315585710 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -115,6 +115,10 @@ ovs_netns_spawn_daemon() {
ovs_add_netns_and_veths () { info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}" + ntns_e=`ip netns list | grep $3` + [ "$ntns_e" != "" ] && ip netns del "$3" + if4_e=`ip link show $4 2>/dev/null` + [ "$if4_e" != "" ] && ip link del "$4" ovs_sbx "$1" ip netns add "$3" || return 1 on_exit "ovs_sbx $1 ip netns del $3" ovs_sbx "$1" ip link add "$4" type veth peer name "$5" || return 1
On Fri, 2024-02-16 at 10:28 -0500, Aaron Conole wrote:
Many openvswitch test cases reused netns and interface names. This works fine as long as the test case cleans up gracefully. However, if there is some kind of ungraceful termination (such as an external signal) the netns or interfaces can be left lingering.
It looks the openvswitch.sh test script is already trying quite hard to delete the allocated resources on ungraceful termination via "trap...".
That is usually enough for other self-tests, could you please detail when it fails here?
This happens when the selftest timeout gets exceeded, while running under very slow debugging conditions.
'timeout' should send SIG_TERM, and the script already handle that gracefully?
The solution here is to cleanup the netns on executing the next test.
I suggest avoiding this, it could end up killing innocent alias netns.
You could consider using the 'setup_ns' helper from the tools/testing/selftests/net/lib.sh library to always generate unique netns names.
Signed-off-by: Aaron Conole aconole@redhat.com
tools/testing/selftests/net/openvswitch/openvswitch.sh | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index 678a72ad47c1..8dc315585710 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -115,6 +115,10 @@ ovs_netns_spawn_daemon() { ovs_add_netns_and_veths () { info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
- ntns_e=`ip netns list | grep $3`
- [ "$ntns_e" != "" ] && ip netns del "$3"
- if4_e=`ip link show $4 2>/dev/null`
Minor unrelated note: $() is preferable to `` for sub-shells, as it's more friendly to nesting, string expansing, quotes, etc.
Cheers,
Paolo
Paolo Abeni pabeni@redhat.com writes:
On Fri, 2024-02-16 at 10:28 -0500, Aaron Conole wrote:
Many openvswitch test cases reused netns and interface names. This works fine as long as the test case cleans up gracefully. However, if there is some kind of ungraceful termination (such as an external signal) the netns or interfaces can be left lingering.
It looks the openvswitch.sh test script is already trying quite hard to delete the allocated resources on ungraceful termination via "trap...".
That is usually enough for other self-tests, could you please detail when it fails here?
I thought it should work - but at least what I observed is that when the vng spawned VM was running the tests, it would TERM portions of the subshell, but not the running openvswitch.sh script. That left these namespaces and interfaces lingering.
This happens when the selftest timeout gets exceeded, while running under very slow debugging conditions.
'timeout' should send SIG_TERM, and the script already handle that gracefully?
At least, I didn't observe that to be the case when it got terminated. I'll remove the timeout setting and try to reproduce it.
The solution here is to cleanup the netns on executing the next test.
I suggest avoiding this, it could end up killing innocent alias netns.
You could consider using the 'setup_ns' helper from the tools/testing/selftests/net/lib.sh library to always generate unique netns names.
Okay - I will look into that.
Signed-off-by: Aaron Conole aconole@redhat.com
tools/testing/selftests/net/openvswitch/openvswitch.sh | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index 678a72ad47c1..8dc315585710 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -115,6 +115,10 @@ ovs_netns_spawn_daemon() { ovs_add_netns_and_veths () { info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
- ntns_e=`ip netns list | grep $3`
- [ "$ntns_e" != "" ] && ip netns del "$3"
- if4_e=`ip link show $4 2>/dev/null`
Minor unrelated note: $() is preferable to `` for sub-shells, as it's more friendly to nesting, string expansing, quotes, etc.
Okay - I'll prefer it in future. I didn't know how much I should be worrying about non-POSIX shells (I seem to remember that `` is accepted in more shells).
Cheers,
Paolo
The arping test transmits a single packet and immediately tries to pull the log for upcall details. This works fine in practice on most systems but can fail under a slower VM where it can take a while for the log data to be written. By adding addtional transmits we give the system time to write, and also increase the opportunity to not miss processing the upcall queue.
Signed-off-by: Aaron Conole aconole@redhat.com --- tools/testing/selftests/net/openvswitch/openvswitch.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index 8dc315585710..a2c106104fb8 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -598,7 +598,7 @@ test_upcall_interfaces() {
sleep 1 info "sending arping" - ip netns exec upc arping -I l0 172.31.110.20 -c 1 \ + ip netns exec upc arping -I l0 172.31.110.20 -c 3 \ >$ovs_dir/arping.stdout 2>$ovs_dir/arping.stderr
grep -E "MISS upcall[0/yes]: .*arp(sip=172.31.110.1,tip=172.31.110.20,op=1,sha=" $ovs_dir/left0.out >/dev/null 2>&1 || return 1
The openvswitch tests do not attempt to insert the openvswitch module automatically. Now the test will auto load the module and try to unload it at the end. The test harness includes the option to not load the module, which is helpful when developing changes and loading the module from a different location.
Signed-off-by: Aaron Conole aconole@redhat.com --- .../testing/selftests/net/openvswitch/openvswitch.sh | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index a2c106104fb8..e7c9b4fc5591 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -672,12 +672,14 @@ run_test() { exitcode=0 desc=0 all_skipped=true +LOAD_MOD=yes
while getopts :pvt o do case $o in p) PAUSE_ON_FAIL=yes;; v) VERBOSE=1;; + n) LOAD_MOD=no;; t) if which tcpdump > /dev/null 2>&1; then TRACING=1 else @@ -697,6 +699,10 @@ for arg do command -v > /dev/null "test_${arg}" || { echo "=== Test ${arg} not found"; usage; } done
+if [ "$LOAD_MOD" == "yes" ]; then + modprobe openvswitch +fi + name="" desc="" for t in ${tests}; do @@ -716,4 +722,9 @@ for t in ${tests}; do desc="" done
+ +if [ "$LOAD_MOD" == "yes" ]; then + modprobe -r openvswitch +fi + exit ${exitcode}
In order for the ovs selftests to run, we need to introduce a sample configuration. This will get run under the kselftest runner for net, but given that environment can be limited process wise, the test can take longer than on bare metal or high availability VMs. So, also introduce a settings file which includes a timeout value that should be sufficent for such environments.
Signed-off-by: Aaron Conole aconole@redhat.com --- .../testing/selftests/net/openvswitch/config | 50 +++++++++++++++++++ .../selftests/net/openvswitch/settings | 1 + 2 files changed, 51 insertions(+) create mode 100644 tools/testing/selftests/net/openvswitch/config create mode 100644 tools/testing/selftests/net/openvswitch/settings
diff --git a/tools/testing/selftests/net/openvswitch/config b/tools/testing/selftests/net/openvswitch/config new file mode 100644 index 000000000000..24cff330f3c0 --- /dev/null +++ b/tools/testing/selftests/net/openvswitch/config @@ -0,0 +1,50 @@ +CONFIG_OPENVSWITCH=m +CONFIG_OPENVSWITCH_GRE=m +CONFIG_OPENVSWITCH_VXLAN=m +CONFIG_OPENVSWITCH_GENEVE=m +CONFIG_NF_CONNTRACK_OVS=y +CONFIG_NF_NAT_OVS=y +CONFIG_NF_NAT=m +CONFIG_NF_CONNTRACK=m +CONFIG_GENEVE=y +CONFIG_VXLAN=y +CONFIG_IPV6=y +CONFIG_TUN=y +CONFIG_TAP=y +CONFIG_USER_NS=y +CONFIG_NET_NS=y +CONFIG_SYSFS=y +CONFIG_PROC_SYSCTL=y +CONFIG_IPV6_MULTIPLE_TABLES=y +CONFIG_VETH=y +CONFIG_DUMMY=y +CONFIG_NETFILTER=y +CONFIG_NETFILTER_ADVANCED=y +CONFIG_IP6_NF_IPTABLES=m +CONFIG_IP_NF_IPTABLES=m +CONFIG_IP6_NF_NAT=m +CONFIG_IP_NF_NAT=m +CONFIG_IPV6_GRE=m +CONFIG_TRACEPOINTS=y +CONFIG_NET_DROP_MONITOR=m +CONFIG_NF_TABLES=m +CONFIG_NF_TABLES_IPV6=y +CONFIG_NF_TABLES_IPV4=y +CONFIG_KALLSYMS=y +CONFIG_BRIDGE=m +CONFIG_VLAN_8021Q=m +CONFIG_BRIDGE_VLAN_FILTERING=y +CONFIG_NET_L3_MASTER_DEV=y +CONFIG_IPV6_MULTIPLE_TABLES=y +CONFIG_NET_VRF=m +CONFIG_BPF_SYSCALL=y +CONFIG_CGROUP_BPF=y +CONFIG_NET_ACT_CT=m +CONFIG_NET_ACT_MIRRED=m +CONFIG_NET_ACT_MPLS=m +CONFIG_NET_ACT_VLAN=m +CONFIG_NET_CLS_FLOWER=m +CONFIG_NET_CLS_MATCHALL=m +CONFIG_NET_SCH_INGRESS=m +CONFIG_NET_ACT_GACT=m +CONFIG_NAMESPACES=y diff --git a/tools/testing/selftests/net/openvswitch/settings b/tools/testing/selftests/net/openvswitch/settings new file mode 100644 index 000000000000..694d70710ff0 --- /dev/null +++ b/tools/testing/selftests/net/openvswitch/settings @@ -0,0 +1 @@ +timeout=300
On Fri, 16 Feb 2024 10:28:39 -0500 Aaron Conole wrote:
The series is a host of cleanups to the openvswitch selftest suite which should be ready to run under the netdev selftest runners using vng. For now, the testing has been done with RW directories, but additional testing will be done to try and keep it all as RO to be more friendly.
Would it be an option to make the output go into a dir in /tmp/ instead of in place, in the tree?
mktemp -p /tmp/ -d ovs-test.XXXXXXXXX
or such?
Jakub Kicinski kuba@kernel.org writes:
On Fri, 16 Feb 2024 10:28:39 -0500 Aaron Conole wrote:
The series is a host of cleanups to the openvswitch selftest suite which should be ready to run under the netdev selftest runners using vng. For now, the testing has been done with RW directories, but additional testing will be done to try and keep it all as RO to be more friendly.
Would it be an option to make the output go into a dir in /tmp/ instead of in place, in the tree?
mktemp -p /tmp/ -d ovs-test.XXXXXXXXX
That's probably the best approach. I'll switch to it.
or such?
linux-kselftest-mirror@lists.linaro.org