Hi Jakub,
On 16/07/2025 16:26, Jakub Kicinski wrote:
On Tue, 15 Jul 2025 18:53:08 -0700 Jakub Kicinski wrote:
On Tue, 15 Jul 2025 20:43:27 +0200 Matthieu Baerts (NGI0) wrote:
mptcp_connect.sh can be executed manually with "-m <MODE>" and "-C" to make sure everything works as expected when using "mmap" and "sendfile" modes instead of "poll", and with the MPTCP checksum support.
These modes should be validated, but they are not when the selftests are executed via the kselftest helpers. It means that most CIs validating these selftests, like NIPA for the net development trees and LKFT for the stable ones, are not covering these modes.
To fix that, new test programs have been added, simply calling mptcp_connect.sh with the right parameters.
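For illustration only, here is a minimal sketch of what such a test program could look like. It is not the actual patch: the helper name `run_mptcp_connect` and the directory argument are hypothetical, and only the `-m <MODE>` and `-C` options come from the description above.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the patches): run the mptcp_connect.sh
# found in the given directory and forward every remaining argument to it.
run_mptcp_connect() {
	dir="$1"
	shift
	"$dir/mptcp_connect.sh" "$@"
}

# A wrapper for one mode would then boil down to a single call, e.g.:
#   run_mptcp_connect "$(dirname "$0")" -m mmap "$@"
# and a wrapper for the checksum variant to:
#   run_mptcp_connect "$(dirname "$0")" -C "$@"
```

Keeping each wrapper to a single call means the kselftest helpers see one test program per mode, while all the logic stays in mptcp_connect.sh.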
The first patch can be backported up to v5.6, and the second one up to v5.14.
Looks like the failures that Paolo flagged yesterday:
https://lore.kernel.org/all/a7a89aa2-7354-42c7-8219-99a3cafd3b33@redhat.com/
are back as soon as this hit NIPA :(
https://netdev.bots.linux.dev/contest.html?branch=net-next-2025-07-16--00-00...
No idea why TBH, the tests run sequentially and connect.sh runs before any of the new ones.
And just to be sure, no CPU or IO overload at that moment? I didn't see such errors reported by our CI, but I can try to reproduce them locally in different conditions.
I'm gonna leave it in patchwork in case the next run is clean, please use pw-bot to discard them if they keep failing.
Oops, sorry I forgot to reply: when I checked in the morning, the last two builds were clean. I wanted to check the next one, then I forgot :)
It failed again on the latest run, in a somewhat more concerning way :(
# (duration 30279ms) [FAIL] file received by server does not match (in, out):
# -rw------- 1 root root 5171914 Jul 16 05:24 /tmp/tmp.W2c96hxSIz
# Trailing bytes are:
# w,ѐ)-rw------- 1 root root 5166208 Jul 16 05:24 /tmp/tmp.s33PNcrN6M
# Trailing bytes are:
# (<v /&^<rnFsaC7INFO: with peek mode: saveAfterPeek
https://netdev-3.bots.linux.dev/vmksft-mptcp/results/211121/4-mptcp-connect-...
I see, the error can be a bit scary :)
If I'm not mistaken, there was a poll timeout error before. When such an error is detected, the test is stopped. After each test, even in case of errors, the received file is compared with the sent one. Here the transfer was cut short, so the two files differ (note the different sizes), and this concerning error is expected.
Anyway, even if the errors are not caused by this series, I think it is better to delay these patches while we are investigating that:
pw-bot: cr
BTW feeding the random data into hexdump-like formatter seems advisable? :P
It is just to check that the CIs can correctly parse random data :-D
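More seriously, a minimal sketch of what that could look like, assuming a helper that prints the tail of each file. The function name `print_trailing_hex` and the default of 32 bytes are hypothetical and not taken from the current selftest; `tail -c` and `od -A n -t x1` are the standard POSIX utilities.

```shell
#!/bin/sh
# Hypothetical helper: show the last bytes of a file as hex instead of
# writing raw random data to the test log.
print_trailing_hex() {
	file="$1"
	count="${2:-32}"	# number of trailing bytes to show (assumed default)

	echo "Trailing bytes are:"
	# -A n: no offset column, -t x1: one hex value per byte
	tail -c "$count" "$file" | od -A n -t x1
}
```

This way the output always ends with a newline and only contains printable characters, so the next line of the log cannot end up glued to the random bytes.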
Cheers, Matt