This series should fix the recent instabilities seen by MPTCP and NIPA CIs where the 'mptcp_connect.sh' tests fail regularly when running the 'disconnect' subtests with "plain" TCP sockets, e.g.
  # INFO: disconnect
  # 63 ns1 MPTCP -> ns1 (10.0.1.1:20001 ) MPTCP (duration 996ms) [ OK ]
  # 64 ns1 MPTCP -> ns1 (10.0.1.1:20002 ) TCP (duration 851ms) [ OK ]
  # 65 ns1 TCP -> ns1 (10.0.1.1:20003 ) MPTCP Unexpected revents: POLLERR/POLLNVAL(19)
  # (duration 896ms) [FAIL] file received by server does not match (in, out):
  # -rw-r--r-- 1 root root 11112852 Aug 19 09:16 /tmp/tmp.hlJe5DoMoq.disconnect
  # Trailing bytes are:
  # /{ga 6@=#.8:-rw------- 1 root root 10085368 Aug 19 09:16 /tmp/tmp.blClunilxx
  # Trailing bytes are:
  # /{ga 6@=#.8:66 ns1 MPTCP -> ns1 (dead:beef:1::1:20004) MPTCP (duration 987ms) [ OK ]
  # 67 ns1 MPTCP -> ns1 (dead:beef:1::1:20005) TCP (duration 911ms) [ OK ]
  # 68 ns1 TCP -> ns1 (dead:beef:1::1:20006) MPTCP (duration 980ms) [ OK ]
  # [FAIL] Tests of the full disconnection have failed
These issues became visible after some behavioural changes in TCP, where re-connections attempted too quickly after a shutdown() are now more easily rejected. Patch 3 modifies the selftests to wait, but this resolution revealed an issue in MPTCP, which is fixed by patch 1 (a fix for the v5.9 kernel).
Patches 2 and 4 improve some errors reported by the selftests, and patch 5 helps with the debugging of such issues.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Note: The last two patches are not strictly fixes, but they are useful in case similar issues happen again. That's why they have been added here in this series for -net. If that's an issue, please drop them, and I can re-send them later on.

---
Matthieu Baerts (NGI0) (5):
      mptcp: propagate shutdown to subflows when possible
      selftests: mptcp: connect: catch IO errors on listen side
      selftests: mptcp: avoid spurious errors on TCP disconnect
      selftests: mptcp: print trailing bytes with od
      selftests: mptcp: connect: print pcap prefix

 net/mptcp/protocol.c                               | 16 ++++++++++++++++
 tools/testing/selftests/net/mptcp/mptcp_connect.c  | 11 ++++++-----
 tools/testing/selftests/net/mptcp/mptcp_connect.sh |  6 +++++-
 tools/testing/selftests/net/mptcp/mptcp_lib.sh     |  2 +-
 4 files changed, 28 insertions(+), 7 deletions(-)
---
base-commit: 2690cb089502b80b905f2abdafd1bf2d54e1abef
change-id: 20250912-net-mptcp-fix-sft-connect-f095ad7a6e36
When the MPTCP DATA FIN has been ACKed, there is no more MPTCP related metadata to exchange, and all subflows can be safely shut down.

Before this patch, the subflows were actually terminated at 'close()' time. That's certainly fine most of the time, but not when userspace calls 'shutdown()' on a connection without close()ing it. When doing so, the subflows were staying in LAST_ACK state on one side -- and consequently in FIN_WAIT2 on the other side -- until the 'close()' of the MPTCP socket.

Now, when the DATA FIN has been ACKed, all subflows are shut down. A consequence of this is that the TCP 'FIN' flag can be set earlier now, but the end result is the same. This affects the packetdrill tests looking at the end of the MPTCP connections, but for a good reason.
Note that tcp_shutdown() will check the subflow state, so no need to do that again before calling it.
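For illustration, the userspace pattern described above (a shutdown() that is not followed by a close()) could look like the sketch below. This is not code from the kernel or from the selftests: stream_socket() and half_close_demo() are hypothetical names, and the fallback to plain TCP is an assumption made so that the sketch also runs on kernels without MPTCP support.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/* Hypothetical helper: prefer MPTCP, fall back to plain TCP if unavailable. */
static int stream_socket(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

	if (fd < 0)
		fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	return fd;
}

/* Returns 0 if the peer sees EOF while the sender's fd is still open. */
static int half_close_demo(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int lfd, cfd, sfd, ret = -1;
	char c;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the kernel pick a free port */

	lfd = stream_socket();
	cfd = stream_socket();
	if (lfd < 0 || cfd < 0)
		goto out;
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) ||
	    listen(lfd, 1) ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len))
		goto out;
	assert(addr.sin_port != 0);
	if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)))
		goto out;
	sfd = accept(lfd, NULL, NULL);
	if (sfd < 0)
		goto out;

	/* Half-close: the write side is shut down, but cfd is NOT closed. */
	if (shutdown(cfd, SHUT_WR) == 0 && read(sfd, &c, 1) == 0)
		ret = 0;
	close(sfd);
out:
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return ret;
}
```

Between the shutdown() and the final close(), 'cfd' stays open: this is the window in which, before this patch, the subflows of an MPTCP socket remained in LAST_ACK / FIN_WAIT2.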
Fixes: 3721b9b64676 ("mptcp: Track received DATA_FIN sequence number and add related helpers")
Cc: stable@vger.kernel.org
Fixes: 16a9a9da1723 ("mptcp: Add helper to process acks of DATA_FIN")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e6fd97b21e9eeae816dc355834ba54c21bd2cc94..5e497a83e9675b5217052299463c1dfc58de947c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -371,6 +371,20 @@ static void mptcp_close_wake_up(struct sock *sk)
 		sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
 }
 
+static void mptcp_shutdown_subflows(struct mptcp_sock *msk)
+{
+	struct mptcp_subflow_context *subflow;
+
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		bool slow;
+
+		slow = lock_sock_fast(ssk);
+		tcp_shutdown(ssk, SEND_SHUTDOWN);
+		unlock_sock_fast(ssk, slow);
+	}
+}
+
 /* called under the msk socket lock */
 static bool mptcp_pending_data_fin_ack(struct sock *sk)
 {
@@ -395,6 +409,7 @@ static void mptcp_check_data_fin_ack(struct sock *sk)
 			break;
 		case TCP_CLOSING:
 		case TCP_LAST_ACK:
+			mptcp_shutdown_subflows(msk);
 			mptcp_set_state(sk, TCP_CLOSE);
 			break;
 		}
@@ -563,6 +578,7 @@ static bool mptcp_check_data_fin(struct sock *sk)
 			mptcp_set_state(sk, TCP_CLOSING);
 			break;
 		case TCP_FIN_WAIT2:
+			mptcp_shutdown_subflows(msk);
 			mptcp_set_state(sk, TCP_CLOSE);
 			break;
 		default:
IO errors were correctly printed to stderr, and propagated up to the main loop for the server side, but the returned value was ignored. As a consequence, the program for the listener side was no longer exiting with an error code in case of IO issues.
Because of that, some issues might not have been seen. But very likely, most issues either had an effect on the client side, or the file transfer was not the expected one, e.g. the connection got reset before the end. Still, it is better to fix this.
The main consequence of this issue is the error that was reported by the selftests: the received and sent files were different, and the MIB counters were not printed. Also, when such errors happened during the 'disconnect' tests, the program tried to continue until the timeout.
Now when an IO error is detected, the program exits directly with an error.
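The control flow described above (stop repeating and return the error as soon as one transfer fails) can be sketched in isolation as follows. This is only an illustration under stated assumptions: serve_repeated(), transfer_fn, struct fake_xfer and fake_transfer() are hypothetical names that do not exist in mptcp_connect.c, where the transfer is done by copyfd_io() and the counter is cfg_repeat.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one transfer: returns 0, or an error code. */
typedef int (*transfer_fn)(void *ctx);

/* Run up to 'repeat' transfers, stopping at the first one that fails. */
static int serve_repeated(transfer_fn transfer, void *ctx, int repeat)
{
	int err;

	assert(transfer != NULL);
	do {
		err = transfer(ctx);
	} while (!err && --repeat > 0);

	/* The caller can now exit with this value instead of always 0. */
	return err;
}

/* Hypothetical test double: fails with code 3 on call number 'fail_at'. */
struct fake_xfer {
	int calls;
	int fail_at;
};

static int fake_transfer(void *ctx)
{
	struct fake_xfer *f = ctx;

	return ++f->calls == f->fail_at ? 3 : 0;
}
```

With a 'fail_at' of 2 and a 'repeat' of 3, only two transfers are attempted and the error code of the second one is returned.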
Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_connect.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index 4f07ac9fa207cb08a934582b98d688d0b9512f97..1408698df0999730d9a9a4ad6a6362a4ab122d54 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -1093,6 +1093,7 @@ int main_loop_s(int listensock)
 	struct pollfd polls;
 	socklen_t salen;
 	int remotesock;
+	int err = 0;
 	int fd = 0;
 
 again:
@@ -1125,7 +1126,7 @@ int main_loop_s(int listensock)
 		SOCK_TEST_TCPULP(remotesock, 0);
 
 		memset(&winfo, 0, sizeof(winfo));
-		copyfd_io(fd, remotesock, 1, true, &winfo);
+		err = copyfd_io(fd, remotesock, 1, true, &winfo);
 	} else {
 		perror("accept");
 		return 1;
@@ -1134,10 +1135,10 @@ int main_loop_s(int listensock)
 	if (cfg_input)
 		close(fd);
 
-	if (--cfg_repeat > 0)
+	if (!err && --cfg_repeat > 0)
 		goto again;
 
-	return 0;
+	return err;
 }
 
 static void init_rng(void)
The disconnect test-case with 'plain' TCP sockets generates spurious errors, e.g.

  07 ns1 TCP -> ns1 (dead:beef:1::1:10006) MPTCP
  read: Connection reset by peer
  read: Connection reset by peer
  (duration 155ms) [FAIL] client exit code 3, server 3

  netns ns1-FloSdv (listener) socket stat for 10006:
  TcpActiveOpens                  2                  0.0
  TcpPassiveOpens                 2                  0.0
  TcpEstabResets                  2                  0.0
  TcpInSegs                       274                0.0
  TcpOutSegs                      276                0.0
  TcpOutRsts                      3                  0.0
  TcpExtPruneCalled               2                  0.0
  TcpExtRcvPruned                 1                  0.0
  TcpExtTCPPureAcks               104                0.0
  TcpExtTCPRcvCollapsed           2                  0.0
  TcpExtTCPBacklogCoalesce        42                 0.0
  TcpExtTCPRcvCoalesce            43                 0.0
  TcpExtTCPChallengeACK           1                  0.0
  TcpExtTCPFromZeroWindowAdv      42                 0.0
  TcpExtTCPToZeroWindowAdv        41                 0.0
  TcpExtTCPWantZeroWindowAdv      13                 0.0
  TcpExtTCPOrigDataSent           164                0.0
  TcpExtTCPDelivered              165                0.0
  TcpExtTCPRcvQDrop               1                  0.0
In the failing scenarios (TCP -> MPTCP), the involved sockets are actually plain TCP ones, as fallbacks for passive sockets at 2WHS time cause the MPTCP listeners to actually create 'plain' TCP sockets.
Similar to commit 218cc166321f ("selftests: mptcp: avoid spurious errors on disconnect"), the root cause is in the user-space bits: the test program tries to disconnect as soon as all the pending data has been spooled, generating an RST. If this RST reaches the peer before the connection has reached the closed status, the TCP socket will report an error to user-space, as per protocol specification, causing the above failure. Note that this issue seems to have become more visible since the "tcp: receiver changes" series from commit 06baf9bfa6ca ("Merge branch 'tcp-receiver-changes'").
Address the issue by explicitly waiting for the TCP sockets (-t) to reach a closed status before performing the disconnect. More precisely, the test program now waits for plain TCP sockets or TCP subflows in addition to the MPTCP sockets that were already monitored.
While at it, use 'ss' with '-n' to avoid resolving service names, which is not needed here.
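The waiting step described above can be sketched roughly as follows. This is a simplified illustration, not the selftest code: build_ss_cmd() and wait_sockets_closed() are hypothetical names, the 125 ms polling interval and the timeout handling are assumptions, and only the address is matched here. The only parts taken from the patch are the 'ss -Mnt | grep -q ' prefix and the use of inet_ntop() to append the address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Build "ss -Mnt | grep -q <addr>" into cmd; returns 0 on success. */
static int build_ss_cmd(int family, const void *raw_addr, char *cmd,
			size_t len)
{
	static const char prefix[] = "ss -Mnt | grep -q ";
	size_t plen = sizeof(prefix) - 1;

	assert(cmd != NULL && raw_addr != NULL);
	if (len <= plen)
		return -1;
	memcpy(cmd, prefix, plen + 1);
	/* -M: MPTCP sockets, -t: TCP sockets, -n: no service name lookup */
	if (!inet_ntop(family, raw_addr, cmd + plen, (socklen_t)(len - plen)))
		return -1;
	return 0;
}

/* Poll until no matching socket is listed, or until timeout_ms elapses. */
static int wait_sockets_closed(const char *cmd, int timeout_ms)
{
	const int msec_sleep = 125; /* assumed polling interval */
	int i;

	for (i = 0; i <= timeout_ms; i += msec_sleep) {
		/* 'grep -q' exits non-zero once nothing matches any more */
		if (system(cmd) != 0)
			return 0;
		usleep(msec_sleep * 1000);
	}
	return -1;
}
```

Because both '-M' and '-t' are passed, a lingering plain TCP socket or TCP subflow keeps 'grep -q' succeeding, so the caller keeps waiting before issuing the disconnect.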
Fixes: 218cc166321f ("selftests: mptcp: avoid spurious errors on disconnect")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_connect.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index 1408698df0999730d9a9a4ad6a6362a4ab122d54..b148cadb96d0b735901c92a1e4417f29c263dd90 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -1248,7 +1248,7 @@ void xdisconnect(int fd)
 	else
 		xerror("bad family");
 
-	strcpy(cmd, "ss -M | grep -q ");
+	strcpy(cmd, "ss -Mnt | grep -q ");
 	cmdlen = strlen(cmd);
 	if (!inet_ntop(addr.ss_family, raw_addr, &cmd[cmdlen],
 		       sizeof(cmd) - cmdlen))
@@ -1258,7 +1258,7 @@ void xdisconnect(int fd)
 
 	/*
 	 * wait until the pending data is completely flushed and all
-	 * the MPTCP sockets reached the closed status.
+	 * the sockets reached the closed status.
 	 * disconnect will bypass/ignore/drop any pending data.
 	 */
 	for (i = 0; ; i += msec_sleep) {
Printing the trailing bytes with 'od' is better than printing random bytes in the terminal.
Note that Jakub suggested 'hexdump', but Mat found out this tool is not often installed by default. 'od' can do a similar job, and it is in the POSIX specs and available in coreutils, so it should be on more systems.
While at it, display a few more bytes, just to fill in the two lines. There is no need to display the third line, which only shows the offset of the next byte: 0000040.
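To make the "two lines plus 0000040" remark concrete, here is a small sketch of how one line of 'od -x' output is laid out. It is an approximation for illustration only, assuming a little-endian host and the default 16 bytes per line; od_x_line() is a hypothetical helper, not part of the selftests.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format up to 16 bytes like one line of 'od -x' on a little-endian host:
 * a 7-digit octal offset, then 16-bit words printed in hexadecimal.
 * 'n' must be even and at most 16; 'out' must hold at least 48 bytes.
 */
static void od_x_line(const unsigned char *buf, size_t n, size_t offset,
		      char *out, size_t outlen)
{
	size_t i;
	int pos;

	assert(n % 2 == 0 && n <= 16 && outlen >= 48);
	pos = snprintf(out, outlen, "%07zo", offset);
	for (i = 0; i < n; i += 2)
		/* the second byte of each pair is the high byte of the word */
		pos += snprintf(out + pos, outlen - pos, " %02x%02x",
				(unsigned int)buf[i + 1],
				(unsigned int)buf[i]);
}
```

With 32 input bytes, 'od -x' prints two such lines at offsets 0000000 and 0000020, followed by a last line containing only the end offset 0000040 (32 in octal). That last line is the one dropped by 'head -n2'.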
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_lib.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
index 09cd24b2ae466205dacbdf8289eb86c08534c475..d62e653d48b0f2ef7a01e289fa0be8907825667d 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
@@ -384,7 +384,7 @@ mptcp_lib_make_file() {
 mptcp_lib_print_file_err() {
 	ls -l "${1}" 1>&2
 	echo "Trailing bytes are: "
-	tail -c 27 "${1}"
+	tail -c 32 "${1}" | od -x | head -n2
 }
 
 # $1: input file ; $2: output file ; $3: what kind of file
Print the prefix of the packet capture files, to be able to find which ones have been produced after several runs.
This prefix was not printed anywhere before.
While at it, always use the same prefix by taking info from ns1, instead of "$connector_ns", which is sometimes ns1, sometimes ns2 in the subtests.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_connect.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index c2ab9f7f0d2133559bb18ce884b613d21d1ec5f0..47ecb5b3836eb5c64787bda0f6f1cc7c3af338aa 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -211,6 +211,11 @@ if $checksum; then
 	done
 fi
 
+if $capture; then
+	rndh="${ns1:4}"
+	mptcp_lib_pr_info "Packet capture files will have this prefix: ${rndh}-"
+fi
+
 set_ethtool_flags() {
 	local ns="$1"
 	local dev="$2"
@@ -361,7 +366,6 @@ do_transfer()
 
 	if $capture; then
 		local capuser
-		local rndh="${connector_ns:4}"
 		if [ -z $SUDO_USER ] ; then
 			capuser=""
 		else