There are a couple of places where the test script "sleep"s to wait for some external condition to be met.
This is error prone, specially in slow systems (identified in CI by "KSFT_MACHINE_SLOW=yes").
To fix this, add a "ovs_wait" function that tries to execute a command a few times until it succeeds. The timeout used is set to 5s for "normal" systems and doubled if a slow CI machine is detected.
This should make the following work:
$ vng --build \ --config tools/testing/selftests/net/config \ --config kernel/configs/debug.config
$ vng --run . --user root -- "make -C tools/testing/selftests/ \ KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests"
Signed-off-by: Adrian Moreno amorenoz@redhat.com --- .../selftests/net/openvswitch/openvswitch.sh | 45 +++++++++++++++---- .../selftests/net/openvswitch/ovs-dpctl.py | 1 + 2 files changed, 38 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index bc71dbc18b21..cc0bfae2bafa 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -11,6 +11,11 @@ ksft_skip=4 PAUSE_ON_FAIL=no VERBOSE=0 TRACING=0 +WAIT_TIMEOUT=5 + +if test "X$KSFT_MACHINE_SLOW" == "Xyes"; then + WAIT_TIMEOUT=10 +fi
tests=" arp_ping eth-arp: Basic arp ping between two NS @@ -29,6 +34,30 @@ info() { [ $VERBOSE = 0 ] || echo $* }
+ovs_wait() { + info "waiting $WAIT_TIMEOUT s for: $@" + + if "$@" ; then + info "wait succeeded immediately" + return 0 + fi + + # A quick re-check helps speed up small races in fast systems. + # However, fractional sleeps might not necessarily work. + local start=0 + sleep 0.1 || { sleep 1; start=1; } + + for (( i=start; i<WAIT_TIMEOUT; i++ )); do + if "$@" ; then + info "wait succeeded after $i seconds" + return 0 + fi + sleep 1 + done + info "wait failed after $i seconds" + return 1 +} + ovs_base=`pwd` sbxs= sbx_add () { @@ -278,20 +307,19 @@ test_psample() {
# Record psample data. ovs_spawn_daemon "test_psample" python3 $ovs_base/ovs-dpctl.py psample-events + ovs_wait grep -q "listening for psample events" ${ovs_dir}/stdout
# Send a single ping. - sleep 1 ovs_sbx "test_psample" ip netns exec client ping -I c1 172.31.110.20 -c 1 || return 1 - sleep 1
# We should have received one userspace action upcall and 2 psample packets. - grep -E "userspace action command" $ovs_dir/s0.out >/dev/null 2>&1 || return 1 + ovs_wait grep -q "userspace action command" $ovs_dir/s0.out || return 1
# client -> server samples should only contain the first 14 bytes of the packet. - grep -E "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \ - $ovs_dir/stdout >/dev/null 2>&1 || return 1 - grep -E "rate:4294967295,group:2,cookie:eeff0c" \ - $ovs_dir/stdout >/dev/null 2>&1 || return 1 + ovs_wait grep -qE "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \ + $ovs_dir/stdout || return 1 + + ovs_wait grep -q "rate:4294967295,group:2,cookie:eeff0c" $ovs_dir/stdout || return 1
return 0 } @@ -711,7 +739,8 @@ test_upcall_interfaces() { ovs_add_netns_and_veths "test_upcall_interfaces" ui0 upc left0 l0 \ 172.31.110.1/24 -u || return 1
- sleep 1 + ovs_wait grep -q "listening on upcall packet handler" ${ovs_dir}/left0.out + info "sending arping" ip netns exec upc arping -I l0 172.31.110.20 -c 1 \ >$ovs_dir/arping.stdout 2>$ovs_dir/arping.stderr diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index 1e15b0818074..8a0396bfaf99 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -2520,6 +2520,7 @@ class PsampleEvent(EventSocket): marshal_class = psample_msg
def read_samples(self): + print("listening for psample events", flush=True) while True: try: for msg in self.get():
On Wed, Jul 10, 2024 at 11:04:59AM GMT, Adrian Moreno wrote:
There are a couple of places where the test script "sleep"s to wait for some external condition to be met.
This is error prone, specially in slow systems (identified in CI by "KSFT_MACHINE_SLOW=yes").
To fix this, add a "ovs_wait" function that tries to execute a command a few times until it succeeds. The timeout used is set to 5s for "normal" systems and doubled if a slow CI machine is detected.
This should make the following work:
$ vng --build \ --config tools/testing/selftests/net/config \ --config kernel/configs/debug.config
$ vng --run . --user root -- "make -C tools/testing/selftests/ \ KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests"
Signed-off-by: Adrian Moreno amorenoz@redhat.com
.../selftests/net/openvswitch/openvswitch.sh | 45 +++++++++++++++---- .../selftests/net/openvswitch/ovs-dpctl.py | 1 + 2 files changed, 38 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index bc71dbc18b21..cc0bfae2bafa 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -11,6 +11,11 @@ ksft_skip=4 PAUSE_ON_FAIL=no VERBOSE=0 TRACING=0 +WAIT_TIMEOUT=5
+if test "X$KSFT_MACHINE_SLOW" == "Xyes"; then
- WAIT_TIMEOUT=10
+fi
tests=" arp_ping eth-arp: Basic arp ping between two NS @@ -29,6 +34,30 @@ info() { [ $VERBOSE = 0 ] || echo $* }
+ovs_wait() {
- info "waiting $WAIT_TIMEOUT s for: $@"
- if "$@" ; then
info "wait succeeded immediately"
return 0
- fi
- # A quick re-check helps speed up small races in fast systems.
- # However, fractional sleeps might not necessarily work.
- local start=0
- sleep 0.1 || { sleep 1; start=1; }
- for (( i=start; i<WAIT_TIMEOUT; i++ )); do
if "$@" ; then
info "wait succeeded after $i seconds"
return 0
fi
sleep 1
- done
- info "wait failed after $i seconds"
- return 1
+}
ovs_base=`pwd` sbxs= sbx_add () { @@ -278,20 +307,19 @@ test_psample() {
# Record psample data. ovs_spawn_daemon "test_psample" python3 $ovs_base/ovs-dpctl.py psample-events
ovs_wait grep -q "listening for psample events" ${ovs_dir}/stdout
# Send a single ping.
sleep 1 ovs_sbx "test_psample" ip netns exec client ping -I c1 172.31.110.20 -c 1 || return 1
sleep 1
# We should have received one userspace action upcall and 2 psample packets.
grep -E "userspace action command" $ovs_dir/s0.out >/dev/null 2>&1 || return 1
ovs_wait grep -q "userspace action command" $ovs_dir/s0.out || return 1
# client -> server samples should only contain the first 14 bytes of the packet.
- grep -E "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \
$ovs_dir/stdout >/dev/null 2>&1 || return 1
- grep -E "rate:4294967295,group:2,cookie:eeff0c" \
$ovs_dir/stdout >/dev/null 2>&1 || return 1
ovs_wait grep -qE "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \
$ovs_dir/stdout || return 1
ovs_wait grep -q "rate:4294967295,group:2,cookie:eeff0c" $ovs_dir/stdout || return 1
return 0
} @@ -711,7 +739,8 @@ test_upcall_interfaces() { ovs_add_netns_and_veths "test_upcall_interfaces" ui0 upc left0 l0 \ 172.31.110.1/24 -u || return 1
- sleep 1
- ovs_wait grep -q "listening on upcall packet handler" ${ovs_dir}/left0.out
- info "sending arping" ip netns exec upc arping -I l0 172.31.110.20 -c 1 \ >$ovs_dir/arping.stdout 2>$ovs_dir/arping.stderr
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index 1e15b0818074..8a0396bfaf99 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -2520,6 +2520,7 @@ class PsampleEvent(EventSocket): marshal_class = psample_msg
def read_samples(self):
print("listening for psample events", flush=True) while True: try: for msg in self.get():
-- 2.45.2
This patch is supposed to fix openvswitch selftests on "-dbg" machines. However, as Simon points out, all recent rounds are failing [1]. I don't see this patch being included in the batches and I was wondering why.
Also I see a (presumably unrelated) build error netdev/build_32bit. Is there anything I can do?
[1] https://netdev.bots.linux.dev/contest.html?executor=vmksft-net-dbg&test=...
Thanks. Adrián
On Thu, 11 Jul 2024 09:41:08 +0000 Adrián Moreno wrote:
This patch is supposed to fix openvswitch selftests on "-dbg" machines. However, as Simon points out, all recent rounds are failing [1]. I don't see this patch being included in the batches and I was wondering why.
Also I see a (presumably unrelated) build error netdev/build_32bit. Is there anything I can do?
Hopefully fixed now, we'll see if it gets into the next run.
On 7/10/24 11:04, Adrian Moreno wrote:
There are a couple of places where the test script "sleep"s to wait for some external condition to be met.
This is error prone, specially in slow systems (identified in CI by "KSFT_MACHINE_SLOW=yes").
To fix this, add a "ovs_wait" function that tries to execute a command a few times until it succeeds. The timeout used is set to 5s for "normal" systems and doubled if a slow CI machine is detected.
This should make the following work:
$ vng --build \ --config tools/testing/selftests/net/config \ --config kernel/configs/debug.config
$ vng --run . --user root -- "make -C tools/testing/selftests/ \ KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests"
Signed-off-by: Adrian Moreno amorenoz@redhat.com
.../selftests/net/openvswitch/openvswitch.sh | 45 +++++++++++++++---- .../selftests/net/openvswitch/ovs-dpctl.py | 1 + 2 files changed, 38 insertions(+), 8 deletions(-)
Seem like we don't have a signal from CI for some reason yet, but I tested this locally and it seem to work fine. Either way it's a better way of doing things than sleep'n'hope.
Reviewed-by: Ilya Maximets i.maximets@ovn.org
Hello:
This patch was applied to netdev/net-next.git (main) by Jakub Kicinski kuba@kernel.org:
On Wed, 10 Jul 2024 11:04:59 +0200 you wrote:
There are a couple of places where the test script "sleep"s to wait for some external condition to be met.
This is error prone, specially in slow systems (identified in CI by "KSFT_MACHINE_SLOW=yes").
To fix this, add a "ovs_wait" function that tries to execute a command a few times until it succeeds. The timeout used is set to 5s for "normal" systems and doubled if a slow CI machine is detected.
[...]
Here is the summary with links: - [net-next,v2] selftests: openvswitch: retry instead of sleep https://git.kernel.org/netdev/net-next/c/5e724cb688a2
You are awesome, thank you!
linux-kselftest-mirror@lists.linaro.org