Started as addressing the flakiness issues in rst_ipv*, that affect netdev dashboard.
Signed-off-by: Dmitry Safonov 0x7f454c46@gmail.com --- Dmitry Safonov (4): selftests/tcp_ao: Make RST tests less flaky selftests/tcp_ao: Zero-init tcp_ao_info_opt selftests/tcp_ao: Fix fscanf() call for format-security selftests/tcp_ao: Printing fixes to confirm with format-security
tools/testing/selftests/net/tcp_ao/lib/proc.c | 2 +- tools/testing/selftests/net/tcp_ao/lib/setup.c | 12 +++++------ tools/testing/selftests/net/tcp_ao/rst.c | 23 ++++++++++++---------- .../selftests/net/tcp_ao/setsockopt-closed.c | 2 +- 4 files changed, 21 insertions(+), 18 deletions(-) --- base-commit: 8f2c057754b25075aa3da132cd4fd4478cdab854 change-id: 20240413-tcp-ao-selftests-fixes-adacd65cb8ba
Best regards,
From: Dmitry Safonov 0x7f454c46@gmail.com
Currently, "active reset" cases are flaky, because select() is called for 3 sockets, while only 2 are expected to receive RST. The idea of the third socket was to get into request_sock_queue, but the test mistakenly attempted to connect() after the listener socket was shut down.
Repair this test, it's important to check the different kernel code-paths for signing RST TCP-AO segments.
Fixes: c6df7b2361d7 ("selftests/net: Add TCP-AO RST test") Reported-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Dmitry Safonov 0x7f454c46@gmail.com --- tools/testing/selftests/net/tcp_ao/rst.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/net/tcp_ao/rst.c b/tools/testing/selftests/net/tcp_ao/rst.c index 7df8b8700e39..a2fe88d35ac0 100644 --- a/tools/testing/selftests/net/tcp_ao/rst.c +++ b/tools/testing/selftests/net/tcp_ao/rst.c @@ -256,8 +256,6 @@ static int test_wait_fds(int sk[], size_t nr, bool is_writable[],
static void test_client_active_rst(unsigned int port) { - /* one in queue, another accept()ed */ - unsigned int wait_for = backlog + 2; int i, sk[3], err; bool is_writable[ARRAY_SIZE(sk)] = {false}; unsigned int last = ARRAY_SIZE(sk) - 1; @@ -275,16 +273,20 @@ static void test_client_active_rst(unsigned int port) for (i = 0; i < last; i++) { err = _test_connect_socket(sk[i], this_ip_dest, port, (i == 0) ? TEST_TIMEOUT_SEC : -1); - if (err < 0) test_error("failed to connect()"); }
- synchronize_threads(); /* 2: connection accept()ed, another queued */ - err = test_wait_fds(sk, last, is_writable, wait_for, TEST_TIMEOUT_SEC); + synchronize_threads(); /* 2: two connections: one accept()ed, another queued */ + err = test_wait_fds(sk, last, is_writable, last, TEST_TIMEOUT_SEC); if (err < 0) test_error("test_wait_fds(): %d", err);
+ /* async connect() with third sk to get into request_sock_queue */ + err = _test_connect_socket(sk[last], this_ip_dest, port, -1); + if (err < 0) + test_error("failed to connect()"); + synchronize_threads(); /* 3: close listen socket */ if (test_client_verify(sk[0], packet_sz, quota / packet_sz, TEST_TIMEOUT_SEC)) test_fail("Failed to send data on connected socket"); @@ -292,13 +294,14 @@ static void test_client_active_rst(unsigned int port) test_ok("Verified established tcp connection");
synchronize_threads(); /* 4: finishing up */ - err = _test_connect_socket(sk[last], this_ip_dest, port, -1); - if (err < 0) - test_error("failed to connect()");
synchronize_threads(); /* 5: closed active sk */ - err = test_wait_fds(sk, ARRAY_SIZE(sk), NULL, - wait_for, TEST_TIMEOUT_SEC); + /* + * Wait for 2 connections: one accepted, another in the accept queue, + * the one in request_sock_queue won't get fully established, so + * doesn't receive an active RST, see inet_csk_listen_stop(). + */ + err = test_wait_fds(sk, last, NULL, last, TEST_TIMEOUT_SEC); if (err < 0) test_error("select(): %d", err);
From: Dmitry Safonov 0x7f454c46@gmail.com
The structure is on the stack and has to be zero-initialized as the kernel checks for:
if (in.reserved != 0 || in.reserved2 != 0) return -EINVAL;
Fixes: b26660531cf6 ("selftests/net: Add test for TCP-AO add setsockopt() command") Signed-off-by: Dmitry Safonov 0x7f454c46@gmail.com --- tools/testing/selftests/net/tcp_ao/setsockopt-closed.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/tcp_ao/setsockopt-closed.c b/tools/testing/selftests/net/tcp_ao/setsockopt-closed.c index 452de131fa3a..517930f9721b 100644 --- a/tools/testing/selftests/net/tcp_ao/setsockopt-closed.c +++ b/tools/testing/selftests/net/tcp_ao/setsockopt-closed.c @@ -21,7 +21,7 @@ static void make_listen(int sk) static void test_vefify_ao_info(int sk, struct tcp_ao_info_opt *info, const char *tst) { - struct tcp_ao_info_opt tmp; + struct tcp_ao_info_opt tmp = {}; socklen_t len = sizeof(tmp);
if (getsockopt(sk, IPPROTO_TCP, TCP_AO_INFO, &tmp, &len))
From: Dmitry Safonov 0x7f454c46@gmail.com
On my new laptop with packages from nixos-unstable, gcc 12.3.0 produces:
lib/proc.c: In function ‘netstat_read_type’: lib/proc.c:89:9: error: format not a string literal and no format arguments [-Werror=format-security] 89 | if (fscanf(fnetstat, type->header_name) == EOF) | ^~ cc1: some warnings being treated as errors
Here the selftests lib parses header name, while expectes non-space word ending with a column.
Fixes: cfbab37b3da0 ("selftests/net: Add TCP-AO library") Signed-off-by: Dmitry Safonov 0x7f454c46@gmail.com --- tools/testing/selftests/net/tcp_ao/lib/proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/tcp_ao/lib/proc.c b/tools/testing/selftests/net/tcp_ao/lib/proc.c index 2fb6dd8adba6..8b984fa04286 100644 --- a/tools/testing/selftests/net/tcp_ao/lib/proc.c +++ b/tools/testing/selftests/net/tcp_ao/lib/proc.c @@ -86,7 +86,7 @@ static void netstat_read_type(FILE *fnetstat, struct netstat **dest, char *line)
pos = strchr(line, ' ') + 1;
- if (fscanf(fnetstat, type->header_name) == EOF) + if (fscanf(fnetstat, "%[^ :]", type->header_name) == EOF) test_error("fscanf(%s)", type->header_name); if (fread(&tmp, 1, 1, fnetstat) != 1 || tmp != ':') test_error("Unexpected netstat format (%c)", tmp);
On Sat, 13 Apr 2024 at 02:43, Dmitry Safonov via B4 Relay devnull+0x7f454c46.gmail.com@kernel.org wrote:
From: Dmitry Safonov 0x7f454c46@gmail.com
On my new laptop with packages from nixos-unstable, gcc 12.3.0 produces:
lib/proc.c: In function ‘netstat_read_type’: lib/proc.c:89:9: error: format not a string literal and no format arguments [-Werror=format-security] 89 | if (fscanf(fnetstat, type->header_name) == EOF) | ^~ cc1: some warnings being treated as errors
Here the selftests lib parses header name, while expectes non-space word ending with a column.
Fixes: cfbab37b3da0 ("selftests/net: Add TCP-AO library") Signed-off-by: Dmitry Safonov 0x7f454c46@gmail.com
Actually, now I see that it was also reported, adding
Reported-by: Muhammad Usama Anjum usama.anjum@collabora.com Link: https://lore.kernel.org/all/0c6d4f0d-2064-4444-986b-1d1ed782135f@collabora.c...
From: Dmitry Safonov 0x7f454c46@gmail.com
On my new laptop with packages from nixos-unstable, gcc 12.3.0 produces
lib/setup.c: In function ‘__test_msg’: lib/setup.c:20:9: error: format not a string literal and no format arguments [-Werror=format-security] 20 | ksft_print_msg(buf); | ^~~~~~~~~~~~~~ lib/setup.c: In function ‘__test_ok’: lib/setup.c:26:9: error: format not a string literal and no format arguments [-Werror=format-security] 26 | ksft_test_result_pass(buf); | ^~~~~~~~~~~~~~~~~~~~~ lib/setup.c: In function ‘__test_fail’: lib/setup.c:32:9: error: format not a string literal and no format arguments [-Werror=format-security] 32 | ksft_test_result_fail(buf); | ^~~~~~~~~~~~~~~~~~~~~ lib/setup.c: In function ‘__test_xfail’: lib/setup.c:38:9: error: format not a string literal and no format arguments [-Werror=format-security] 38 | ksft_test_result_xfail(buf); | ^~~~~~~~~~~~~~~~~~~~~~ lib/setup.c: In function ‘__test_error’: lib/setup.c:44:9: error: format not a string literal and no format arguments [-Werror=format-security] 44 | ksft_test_result_error(buf); | ^~~~~~~~~~~~~~~~~~~~~~ lib/setup.c: In function ‘__test_skip’: lib/setup.c:50:9: error: format not a string literal and no format arguments [-Werror=format-security] 50 | ksft_test_result_skip(buf); | ^~~~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors
As the buffer was already pre-printed into, print it as a string rather than a format-string.
Fixes: cfbab37b3da0 ("selftests/net: Add TCP-AO library") Signed-off-by: Dmitry Safonov 0x7f454c46@gmail.com --- tools/testing/selftests/net/tcp_ao/lib/setup.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/net/tcp_ao/lib/setup.c b/tools/testing/selftests/net/tcp_ao/lib/setup.c index 92276f916f2f..e408b9243b2c 100644 --- a/tools/testing/selftests/net/tcp_ao/lib/setup.c +++ b/tools/testing/selftests/net/tcp_ao/lib/setup.c @@ -17,37 +17,37 @@ static pthread_mutex_t ksft_print_lock = PTHREAD_MUTEX_INITIALIZER; void __test_msg(const char *buf) { pthread_mutex_lock(&ksft_print_lock); - ksft_print_msg(buf); + ksft_print_msg("%s", buf); pthread_mutex_unlock(&ksft_print_lock); } void __test_ok(const char *buf) { pthread_mutex_lock(&ksft_print_lock); - ksft_test_result_pass(buf); + ksft_test_result_pass("%s", buf); pthread_mutex_unlock(&ksft_print_lock); } void __test_fail(const char *buf) { pthread_mutex_lock(&ksft_print_lock); - ksft_test_result_fail(buf); + ksft_test_result_fail("%s", buf); pthread_mutex_unlock(&ksft_print_lock); } void __test_xfail(const char *buf) { pthread_mutex_lock(&ksft_print_lock); - ksft_test_result_xfail(buf); + ksft_test_result_xfail("%s", buf); pthread_mutex_unlock(&ksft_print_lock); } void __test_error(const char *buf) { pthread_mutex_lock(&ksft_print_lock); - ksft_test_result_error(buf); + ksft_test_result_error("%s", buf); pthread_mutex_unlock(&ksft_print_lock); } void __test_skip(const char *buf) { pthread_mutex_lock(&ksft_print_lock); - ksft_test_result_skip(buf); + ksft_test_result_skip("%s", buf); pthread_mutex_unlock(&ksft_print_lock); }
On Sat, 13 Apr 2024 at 02:43, Dmitry Safonov via B4 Relay devnull+0x7f454c46.gmail.com@kernel.org wrote:
From: Dmitry Safonov 0x7f454c46@gmail.com
On my new laptop with packages from nixos-unstable, gcc 12.3.0 produces
lib/setup.c: In function ‘__test_msg’: lib/setup.c:20:9: error: format not a string literal and no format arguments [-Werror=format-security] 20 | ksft_print_msg(buf); | ^~~~~~~~~~~~~~ lib/setup.c: In function ‘__test_ok’: lib/setup.c:26:9: error: format not a string literal and no format arguments [-Werror=format-security] 26 | ksft_test_result_pass(buf); | ^~~~~~~~~~~~~~~~~~~~~ lib/setup.c: In function ‘__test_fail’: lib/setup.c:32:9: error: format not a string literal and no format arguments [-Werror=format-security] 32 | ksft_test_result_fail(buf); | ^~~~~~~~~~~~~~~~~~~~~ lib/setup.c: In function ‘__test_xfail’: lib/setup.c:38:9: error: format not a string literal and no format arguments [-Werror=format-security] 38 | ksft_test_result_xfail(buf); | ^~~~~~~~~~~~~~~~~~~~~~ lib/setup.c: In function ‘__test_error’: lib/setup.c:44:9: error: format not a string literal and no format arguments [-Werror=format-security] 44 | ksft_test_result_error(buf); | ^~~~~~~~~~~~~~~~~~~~~~ lib/setup.c: In function ‘__test_skip’: lib/setup.c:50:9: error: format not a string literal and no format arguments [-Werror=format-security] 50 | ksft_test_result_skip(buf); | ^~~~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors
As the buffer was already pre-printed into, print it as a string rather than a format-string.
Fixes: cfbab37b3da0 ("selftests/net: Add TCP-AO library") Signed-off-by: Dmitry Safonov 0x7f454c46@gmail.com
And this one as well,
Reported-by: Muhammad Usama Anjum usama.anjum@collabora.com Link: https://lore.kernel.org/all/0c6d4f0d-2064-4444-986b-1d1ed782135f@collabora.c...
Hello:
This series was applied to netdev/net.git (main) by Paolo Abeni pabeni@redhat.com:
On Sat, 13 Apr 2024 02:42:51 +0100 you wrote:
Started as addressing the flakiness issues in rst_ipv*, that affect netdev dashboard.
Signed-off-by: Dmitry Safonov 0x7f454c46@gmail.com
Dmitry Safonov (4): selftests/tcp_ao: Make RST tests less flaky selftests/tcp_ao: Zero-init tcp_ao_info_opt selftests/tcp_ao: Fix fscanf() call for format-security selftests/tcp_ao: Printing fixes to confirm with format-security
[...]
Here is the summary with links: - [net,1/4] selftests/tcp_ao: Make RST tests less flaky https://git.kernel.org/netdev/net/c/4225dfa4535f - [net,2/4] selftests/tcp_ao: Zero-init tcp_ao_info_opt https://git.kernel.org/netdev/net/c/b089b3bead53 - [net,3/4] selftests/tcp_ao: Fix fscanf() call for format-security https://git.kernel.org/netdev/net/c/beb78cd1329d - [net,4/4] selftests/tcp_ao: Printing fixes to confirm with format-security https://git.kernel.org/netdev/net/c/b476c93654d7
You are awesome, thank you!
On Tue, 16 Apr 2024 at 15:28, Jakub Kicinski kuba@kernel.org wrote:
On Sat, 13 Apr 2024 02:42:51 +0100 Dmitry Safonov via B4 Relay wrote:
Started as addressing the flakiness issues in rst_ipv*, that affect netdev dashboard.
Thank you! :)
Jakub, you are very welcome :) I'll keep an eye on the dashboard, but I very much encourage you to ping me in case of any other issues with tcp_ao selftests.
I currently have v2 for tcp-ao tracepoints, but delaying it as working on a reproducer/selftest for an issue I think I have a patch for.
BTW, do you know if those were addressed or anyone is looking into them? (from other tcp-ao hits, that seem not anyhow related to tcp-ao itself):
1. [ 240.001391][ T833] Possible interrupt unsafe locking scenario: [ 240.001391][ T833] [ 240.001635][ T833] CPU0 CPU1 [ 240.001797][ T833] ---- ---- [ 240.001958][ T833] lock(&p->alloc_lock); [ 240.002083][ T833] local_irq_disable(); [ 240.002284][ T833] lock(&ndev->lock); [ 240.002490][ T833] lock(&p->alloc_lock); [ 240.002709][ T833] <Interrupt> [ 240.002819][ T833] lock(&ndev->lock); [ 240.002937][ T833] [ 240.002937][ T833] *** DEADLOCK ***
https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/537021/14-self-con...
2. [ 251.411647][ T71] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected [ 251.411986][ T71] 6.9.0-rc1-virtme #1 Not tainted [ 251.412214][ T71] ----------------------------------------------------- [ 251.412533][ T71] kworker/u16:1/71 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire: [ 251.412837][ T71] ffff888005182c28 (&p->alloc_lock){+.+.}-{2:2}, at: __get_task_comm+0x27/0x70 [ 251.413214][ T71] [ 251.413214][ T71] and this task is already holding: [ 251.413527][ T71] ffff88802f83efd8 (&ul->lock){+.-.}-{2:2}, at: rt6_uncached_list_flush_dev+0x138/0x840 [ 251.413887][ T71] which would create a new lock dependency: [ 251.414153][ T71] (&ul->lock){+.-.}-{2:2} -> (&p->alloc_lock){+.+.}-{2:2} [ 251.414464][ T71] [ 251.414464][ T71] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 251.414808][ T71] (&ul->lock){+.-.}-{2:2}
https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/537201/17-icmps-di...
3. [ 264.280734][ C3] Possible unsafe locking scenario: [ 264.280734][ C3] [ 264.280968][ C3] CPU0 CPU1 [ 264.281117][ C3] ---- ---- [ 264.281263][ C3] lock((&tw->tw_timer)); [ 264.281427][ C3] lock(&hashinfo->ehash_locks[i]); [ 264.281647][ C3] lock((&tw->tw_timer)); [ 264.281834][ C3] lock(&hashinfo->ehash_locks[i]);
https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/547461/19-self-con...
I can spend some time on them after I verify that my fix for -stable is actually fixing an issue I think it fixes. Seems like your automation + my selftests are giving some fruits, hehe.
Thanks, Dmitry
On Wed, 17 Apr 2024 19:47:18 +0100 Dmitry Safonov wrote:
- [ 240.001391][ T833] Possible interrupt unsafe locking scenario:
[ 240.001391][ T833] [ 240.001635][ T833] CPU0 CPU1 [ 240.001797][ T833] ---- ---- [ 240.001958][ T833] lock(&p->alloc_lock); [ 240.002083][ T833] local_irq_disable(); [ 240.002284][ T833] lock(&ndev->lock); [ 240.002490][ T833] lock(&p->alloc_lock); [ 240.002709][ T833] <Interrupt> [ 240.002819][ T833] lock(&ndev->lock); [ 240.002937][ T833] [ 240.002937][ T833] *** DEADLOCK ***
https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/537021/14-self-con...
- [ 251.411647][ T71] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock
order detected [ 251.411986][ T71] 6.9.0-rc1-virtme #1 Not tainted [ 251.412214][ T71] ----------------------------------------------------- [ 251.412533][ T71] kworker/u16:1/71 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire: [ 251.412837][ T71] ffff888005182c28 (&p->alloc_lock){+.+.}-{2:2}, at: __get_task_comm+0x27/0x70 [ 251.413214][ T71] [ 251.413214][ T71] and this task is already holding: [ 251.413527][ T71] ffff88802f83efd8 (&ul->lock){+.-.}-{2:2}, at: rt6_uncached_list_flush_dev+0x138/0x840 [ 251.413887][ T71] which would create a new lock dependency: [ 251.414153][ T71] (&ul->lock){+.-.}-{2:2} -> (&p->alloc_lock){+.+.}-{2:2} [ 251.414464][ T71] [ 251.414464][ T71] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 251.414808][ T71] (&ul->lock){+.-.}-{2:2}
https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/537201/17-icmps-di...
- [ 264.280734][ C3] Possible unsafe locking scenario:
[ 264.280734][ C3] [ 264.280968][ C3] CPU0 CPU1 [ 264.281117][ C3] ---- ---- [ 264.281263][ C3] lock((&tw->tw_timer)); [ 264.281427][ C3] lock(&hashinfo->ehash_locks[i]); [ 264.281647][ C3] lock((&tw->tw_timer)); [ 264.281834][ C3] lock(&hashinfo->ehash_locks[i]);
https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/547461/19-self-con...
I can spend some time on them after I verify that my fix for -stable is actually fixing an issue I think it fixes. Seems like your automation + my selftests are giving some fruits, hehe.
Oh, very interesting, I don't recall these coming up before.
We try to extract crashes but apparently we're missing lockdep splats. I'll try to improve the extraction logic...
On Wed, 17 Apr 2024 13:46:36 -0700 Jakub Kicinski wrote:
I can spend some time on them after I verify that my fix for -stable is actually fixing an issue I think it fixes. Seems like your automation + my selftests are giving some fruits, hehe.
Oh, very interesting, I don't recall these coming up before.
Correction, these are old, and if I plug the branch names here: https://netdev.bots.linux.dev/contest.html there is a whole bunch of tests failing that day.
Keep in mind these run pre-commit so not all failures are flakes.
On Wed, 17 Apr 2024 at 22:28, Jakub Kicinski kuba@kernel.org wrote:
On Wed, 17 Apr 2024 13:46:36 -0700 Jakub Kicinski wrote:
I can spend some time on them after I verify that my fix for -stable is actually fixing an issue I think it fixes. Seems like your automation + my selftests are giving some fruits, hehe.
Oh, very interesting, I don't recall these coming up before.
Correction, these are old, and if I plug the branch names here: https://netdev.bots.linux.dev/contest.html there is a whole bunch of tests failing that day.
Hmm, yeah, I was looking at the history of selftests to see if there is anything else interesting:
2024-04-11--15-00 - lockdep for hashinfo->ehash_locks vs tw->tw_timer
It seems that you actually reported that already here: https://lore.kernel.org/all/20240411100536.224fa1e7@kernel.org/
2024-04-04--12-00 - lockdep for p->alloc_lock vs ul->lock (rt6_uncached_list_flush_dev) 2024-04-04--09-00 - lockdep for p->alloc_lock vs ndev->lock (addrconf_permanent_addr) 2024-04-04--03-00 - lockdep for p->alloc_lock vs ul->lock
Was reported as well: https://lore.kernel.org/all/8576a80ac958812ac75b01299c2de3a6485f84a1.camel@r...
2024-03-06--00-00 - kernel BUG at net/core/skbuff.c:2813
Can't really track this down to any report/fix. Probably as it's month old and hasn't happened since on these tests - something was borken on that particular day.
Keep in mind these run pre-commit so not all failures are flakes.
Thanks, Dmitry
linux-kselftest-mirror@lists.linaro.org