On 7/20/23 18:25, Mirsad Todorovac wrote:
On 7/20/23 18:07, Mirsad Todorovac wrote:
On 7/20/23 11:43, Petr Machata wrote:
Mirsad Todorovac mirsad.todorovac@alu.unizg.hr writes:
Using the same config for 6.5-rc2 on Ubuntu 22.04 LTS and 22.10, the execution stop at the exact same line on both boxes (os I reckon it is more than an accident):
# selftests: net/forwarding: sch_ets.sh # TEST: ping vlan 10 [ OK ] # TEST: ping vlan 11 [ OK ] # TEST: ping vlan 12 [ OK ] # Running in priomap mode # Testing ets bands 3 strict 3, streams 0 1 # TEST: band 0 [ OK ] # INFO: Expected ratio >95% Measured ratio 100.00 # TEST: band 1 [ OK ] # INFO: Expected ratio <5% Measured ratio 0 # Testing ets bands 3 strict 3, streams 1 2 # TEST: band 1 [ OK ] # INFO: Expected ratio >95% Measured ratio 100.00 # TEST: band 2 [ OK ] # INFO: Expected ratio <5% Measured ratio 0 # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1 # TEST: band 0 [ OK ] # INFO: Expected ratio >95% Measured ratio 100.00 # TEST: band 1 [ OK ] # INFO: Expected ratio <5% Measured ratio 0 # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2 # TEST: bands 1:2 [ OK ] # INFO: Expected ratio 2.00 Measured ratio 1.99 # Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2 # TEST: bands 0:1 [ OK ] # INFO: Expected ratio 1.00 Measured ratio .99 # TEST: bands 0:2 [ OK ] # INFO: Expected ratio 1.00 Measured ratio 1.00 # Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2 # TEST: bands 0:1 [ OK ] # INFO: Expected ratio 1.42 Measured ratio 1.42 # TEST: bands 0:2 [ OK ] # INFO: Expected ratio 3.33 Measured ratio 3.33 # Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2 # TEST: bands 0:1 [ OK ] # INFO: Expected ratio 1.60 Measured ratio 1.59 # TEST: bands 0:2 [ OK ] # INFO: Expected ratio 3.33 Measured ratio 3.33 # Testing ets bands 2 quanta 5000 2500, streams 0 1 # TEST: bands 0:1 [ OK ] # INFO: Expected ratio 2.00 Measured ratio 1.99 # Running in classifier mode # Testing ets bands 3 strict 3, streams 0 1 # TEST: band 0 [ OK ] # INFO: Expected ratio >95% Measured ratio 100.00 # TEST: band 1 [ OK ] # INFO: Expected ratio <5% Measured ratio 0 # Testing ets bands 3 strict 3, streams 1 2 # TEST: band 1 [ OK ] # INFO: Expected ratio >95% Measured ratio 100.00 # TEST: band 2 [ OK ] # INFO: Expected ratio <5% Measured ratio 0 # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
I tried to run 'set -x' enabled version standalone, but that one finished correctly (?).
It could be something previous scripts left, but right now I don't have a clue. I can attempt to rerun all tests with sch_ets.sh bash 'set -x' enabled later today.
If you run it standalone without set -x, does it finish as well?
Added that. Yes, standlone run finishes correctly, with or without 'set -x':
root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding# ./sch_ets.sh TEST: ping vlan 10 [ OK ] TEST: ping vlan 11 [ OK ] TEST: ping vlan 12 [ OK ] Running in priomap mode Testing ets bands 3 strict 3, streams 0 1 TEST: band 0 [ OK ] INFO: Expected ratio >95% Measured ratio 100.00 TEST: band 1 [ OK ] INFO: Expected ratio <5% Measured ratio 0 killing MZ killed MZ killing MZ killed MZ Testing ets bands 3 strict 3, streams 1 2 TEST: band 1 [ OK ] INFO: Expected ratio >95% Measured ratio 100.00 TEST: band 2 [ OK ] INFO: Expected ratio <5% Measured ratio 0 killing MZ killed MZ killing MZ killed MZ Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1 TEST: band 0 [ OK ] INFO: Expected ratio >95% Measured ratio 100.00 TEST: band 1 [ OK ] INFO: Expected ratio <5% Measured ratio 0 killing MZ killed MZ killing MZ killed MZ Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2 TEST: bands 1:2 [ OK ] INFO: Expected ratio 2.00 Measured ratio 1.99 killing MZ killed MZ killing MZ killed MZ Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2 TEST: bands 0:1 [ OK ] INFO: Expected ratio 1.00 Measured ratio 1.00 TEST: bands 0:2 [ OK ] INFO: Expected ratio 1.00 Measured ratio .99 killing MZ killed MZ killing MZ killed MZ killing MZ killed MZ Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2 TEST: bands 0:1 [ OK ] INFO: Expected ratio 1.42 Measured ratio 1.42 TEST: bands 0:2 [ OK ] INFO: Expected ratio 3.33 Measured ratio 3.33 killing MZ killed MZ killing MZ killed MZ killing MZ killed MZ Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2 TEST: bands 0:1 [ OK ] INFO: Expected ratio 1.60 Measured ratio 1.59 TEST: bands 0:2 [ OK ] INFO: Expected ratio 3.33 Measured ratio 3.33 killing MZ killed MZ killing MZ killed MZ killing MZ killed MZ Testing ets bands 2 quanta 5000 2500, streams 0 1 TEST: bands 0:1 [ OK ] INFO: Expected ratio 2.00 Measured ratio 1.99 killing MZ killed MZ killing MZ killed MZ Running in classifier mode Testing ets bands 3 strict 3, streams 0 1 TEST: band 0 [ OK ] INFO: Expected ratio >95% Measured ratio 100.00 TEST: band 1 [ OK ] INFO: Expected ratio <5% Measured ratio 0 killing MZ killed MZ killing MZ killed MZ Testing ets bands 3 strict 3, streams 1 2 TEST: band 1 [ OK ] INFO: Expected ratio >95% Measured ratio 100.00 TEST: band 2 [ OK ] INFO: Expected ratio <5% Measured ratio 0 killing MZ killed MZ killing MZ killed MZ Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1 TEST: band 0 [ OK ] INFO: Expected ratio >95% Measured ratio 100.00 TEST: band 1 [ OK ] INFO: Expected ratio <5% Measured ratio 0 killing MZ killed MZ killing MZ killed MZ Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2 TEST: bands 1:2 [ OK ] INFO: Expected ratio 2.00 Measured ratio 1.99 killing MZ killed MZ killing MZ killed MZ Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2 TEST: bands 0:1 [ OK ] INFO: Expected ratio 1.00 Measured ratio .99 TEST: bands 0:2 [ OK ] INFO: Expected ratio 1.00 Measured ratio .99 killing MZ killed MZ killing MZ killed MZ killing MZ killed MZ Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2 TEST: bands 0:1 [ OK ] INFO: Expected ratio 1.42 Measured ratio 1.42 TEST: bands 0:2 [ OK ] INFO: Expected ratio 3.33 Measured ratio 3.33 killing MZ killed MZ killing MZ killed MZ killing MZ killed MZ Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2 TEST: bands 0:1 [ OK ] INFO: Expected ratio 1.60 Measured ratio 1.60 TEST: bands 0:2 [ OK ] INFO: Expected ratio 3.33 Measured ratio 3.33 killing MZ killed MZ killing MZ killed MZ killing MZ killed MZ Testing ets bands 2 quanta 5000 2500, streams 0 1 TEST: bands 0:1 [ OK ] INFO: Expected ratio 2.00 Measured ratio 2.00 killing MZ killed MZ killing MZ killed MZ root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding#
That would imply that the reproducer needs to include the previous tests as well.
This is entirely possible, as timeouts and CTRL+C events do not seem to be caught and the cleanup is not done ...
sch_ets_core.sh: trap cleanup EXIT
Some tests time out even after settings:timeout=240, so IMHO this should be taken into account.
Best regards, Mirsad Todorovac
It looks like the test is stuck in ets_test_mixed in classifier_mode. A way to run just this test would be:
TESTS="classifier_mode ets_test_mixed" ./sch_ets.sh
Looking at the code, the only place that I can see that waits on anything is the "{ kill %% && wait %%; } 2>/dev/null" line in stop_traffic() (in lib.sh). Maybe something like this would let us see if that's the case:
modified tools/testing/selftests/net/forwarding/lib.sh @@ -1468,8 +1470,10 @@ start_tcp_traffic() stop_traffic() { + echo killing MZ # Suppress noise from killing mausezahn. { kill %% && wait %%; } 2>/dev/null + echo killed MZ }
FYI, this is the [incomplete] list of the tests that time out even after being assigned long timeout of 240s instead of default.
marvin@defiant:~/linux/kernel/linux_torvalds$ grep TIMEOUT ../kselftest-6.5-rc2-net-forwarding-8.log not ok 13 selftests: net/forwarding: custom_multipath_hash.sh # TIMEOUT 240 seconds not ok 18 selftests: net/forwarding: gre_custom_multipath_hash.sh # TIMEOUT 240 seconds not ok 19 selftests: net/forwarding: gre_inner_v4_multipath.sh # TIMEOUT 240 seconds not ok 21 selftests: net/forwarding: gre_multipath_nh_res.sh # TIMEOUT 240 seconds not ok 22 selftests: net/forwarding: gre_multipath_nh.sh # TIMEOUT 240 seconds not ok 27 selftests: net/forwarding: ip6gre_custom_multipath_hash.sh # TIMEOUT 240 seconds not ok 34 selftests: net/forwarding: ip6gre_inner_v4_multipath.sh # TIMEOUT 240 seconds not ok 58 selftests: net/forwarding: no_forwarding.sh # TIMEOUT 240 seconds not ok 67 selftests: net/forwarding: router_mpath_nh_res.sh # TIMEOUT 240 seconds not ok 68 selftests: net/forwarding: router_mpath_nh.sh # TIMEOUT 240 seconds not ok 70 selftests: net/forwarding: router_multipath.sh # TIMEOUT 240 seconds marvin@defiant:~/linux/kernel/linux_torvalds$
The test seem to be stuck despite adding SIGTERM cleanup to all tests that timed out:
# selftests: net/forwarding: sch_ets.sh # TEST: ping vlan 10 [ OK ] # TEST: ping vlan 11 [ OK ] # TEST: ping vlan 12 [ OK ] # Running in priomap mode # Testing ets bands 3 strict 3, streams 0 1 # TEST: band 0 [ OK ] # INFO: Expected ratio >95% Measured ratio 100.00 # TEST: band 1 [ OK ] # INFO: Expected ratio <5% Measured ratio 0 # Testing ets bands 3 strict 3, streams 1 2 # TEST: band 1 [ OK ] # INFO: Expected ratio >95% Measured ratio 100.00 # TEST: band 2 [ OK ] # INFO: Expected ratio <5% Measured ratio 0 # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1 # TEST: band 0 [ OK ] # INFO: Expected ratio >95% Measured ratio 100.00 # TEST: band 1 [ OK ] # INFO: Expected ratio <5% Measured ratio 0 # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2 # TEST: bands 1:2 [ OK ] # INFO: Expected ratio 2.00 Measured ratio 2.00 # Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2 # TEST: bands 0:1 [ OK ] # INFO: Expected ratio 1.00 Measured ratio .99 # TEST: bands 0:2 [ OK ] # INFO: Expected ratio 1.00 Measured ratio .99 # Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2 # TEST: bands 0:1 [ OK ] # INFO: Expected ratio 1.42 Measured ratio 1.42 # TEST: bands 0:2 [ OK ] # INFO: Expected ratio 3.33 Measured ratio 3.33 # Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2 # TEST: bands 0:1 [ OK ] # INFO: Expected ratio 1.60 Measured ratio 1.59 # TEST: bands 0:2 [ OK ] # INFO: Expected ratio 3.33 Measured ratio 3.33 # Testing ets bands 2 quanta 5000 2500, streams 0 1 # TEST: bands 0:1 [ OK ] # INFO: Expected ratio 2.00 Measured ratio 1.99 # Running in classifier mode # Testing ets bands 3 strict 3, streams 0 1 # TEST: band 0 [ OK ] # INFO: Expected ratio >95% Measured ratio 100.00 # TEST: band 1 [ OK ] # INFO: Expected ratio <5% Measured ratio 0 # Testing ets bands 3 strict 3, streams 1 2 # TEST: band 1 [ OK ] # INFO: Expected ratio >95% Measured ratio 100.00 # TEST: band 2 [ OK ] # INFO: Expected ratio <5% Measured ratio 0 # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
What is happening is a stuck mousezahn loop:
root 558266 2732 0 01:17 pts/2 00:00:23 mausezahn veth0.10 -p 8000 -A 192.0.2.1 -B 192.0.2.2 -c 0 -a own -b ba:33:37:81:dc:5 root 558273 2732 0 01:17 pts/2 00:00:23 mausezahn veth0.11 -p 8000 -A 192.0.2.17 -B 192.0.2.18 -c 0 -a own -b ba:33:37:81:dc
$ sudo strace -p 558266 ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device) ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
I am not sure if this will ever complete, or is it a runaway loop?
Probably it is stuck, for nothing was written to the test log for 71 minutes.
I hope we are coming closer to the cause.
Previously I didn't figure out what hanged.
FWIW, I will post the patches for the scripts that timed out for your review.
Kind regards, Mirsad