Patch 1 avoids scheduling the MPTCP worker on a closed socket on some edge cases. It fixes issues that can be visible from v5.11.
Patch 2 makes sure the MPTCP worker doesn't try to manipulate disconnected sockets. This is also a fix for an issue that can be visible from v5.11.
Patch 3 fixes a NULL pointer dereference when MPTCP FastOpen is used and an early fallback is done. A fix for v6.2.
Patch 4 improves the stability of the userspace PM selftest for a subtest added in v6.2.
Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net --- Matthieu Baerts (1): selftests: mptcp: userspace pm: uniform verify events
Paolo Abeni (3): mptcp: use mptcp_schedule_work instead of open-coding it mptcp: stricter state check in mptcp_worker mptcp: fix NULL pointer dereference on fastopen early fallback
net/mptcp/fastopen.c | 11 +++++++++-- net/mptcp/options.c | 5 ++--- net/mptcp/protocol.c | 2 +- net/mptcp/subflow.c | 18 ++++++------------ tools/testing/selftests/net/mptcp/userspace_pm.sh | 2 ++ 5 files changed, 20 insertions(+), 18 deletions(-) --- base-commit: a4506722dc39ca840593f14e3faa4c9ba9408211 change-id: 20230411-upstream-net-20230411-mptcp-fixes-db47f50c2688
Best regards,
From: Paolo Abeni pabeni@redhat.com
Beyond reducing code duplication this also avoids scheduling the mptcp_worker on a closed socket on some edge scenarios.
The addressed issue is actually older than the blamed commit below, but this fix needs it as a pre-requisite.
Fixes: ba8f48f7a4d7 ("mptcp: introduce mptcp_schedule_work") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net --- net/mptcp/options.c | 5 ++--- net/mptcp/subflow.c | 18 ++++++------------ 2 files changed, 8 insertions(+), 15 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c index b30cea2fbf3f..355f798d575a 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -1192,9 +1192,8 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb) */ if (TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq) { if (mp_opt.data_fin && mp_opt.data_len == 1 && - mptcp_update_rcv_data_fin(msk, mp_opt.data_seq, mp_opt.dsn64) && - schedule_work(&msk->work)) - sock_hold(subflow->conn); + mptcp_update_rcv_data_fin(msk, mp_opt.data_seq, mp_opt.dsn64)) + mptcp_schedule_work((struct sock *)msk);
return true; } diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index a0041360ee9d..d34588850545 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -408,9 +408,8 @@ void mptcp_subflow_reset(struct sock *ssk)
tcp_send_active_reset(ssk, GFP_ATOMIC); tcp_done(ssk); - if (!test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &mptcp_sk(sk)->flags) && - schedule_work(&mptcp_sk(sk)->work)) - return; /* worker will put sk for us */ + if (!test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &mptcp_sk(sk)->flags)) + mptcp_schedule_work(sk);
sock_put(sk); } @@ -1118,8 +1117,8 @@ static enum mapping_status get_mapping_status(struct sock *ssk, skb_ext_del(skb, SKB_EXT_MPTCP); return MAPPING_OK; } else { - if (updated && schedule_work(&msk->work)) - sock_hold((struct sock *)msk); + if (updated) + mptcp_schedule_work((struct sock *)msk);
return MAPPING_DATA_FIN; } @@ -1222,17 +1221,12 @@ static void mptcp_subflow_discard_data(struct sock *ssk, struct sk_buff *skb, /* sched mptcp worker to remove the subflow if no more data is pending */ static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ssk) { - struct sock *sk = (struct sock *)msk; - if (likely(ssk->sk_state != TCP_CLOSE)) return;
if (skb_queue_empty(&ssk->sk_receive_queue) && - !test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags)) { - sock_hold(sk); - if (!schedule_work(&msk->work)) - sock_put(sk); - } + !test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags)) + mptcp_schedule_work((struct sock *)msk); }
static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)
From: Paolo Abeni pabeni@redhat.com
As reported by Christoph, the mptcp protocol can run the worker when the relevant msk socket is in an unexpected state:
connect() // incoming reset + fastclose // the mptcp worker is scheduled mptcp_disconnect() // msk is now CLOSED listen() mptcp_worker()
Leading to the following splat:
divide error: 0000 [#1] PREEMPT SMP CPU: 1 PID: 21 Comm: kworker/1:0 Not tainted 6.3.0-rc1-gde5e8fd0123c #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:__tcp_select_window+0x22c/0x4b0 net/ipv4/tcp_output.c:3018 RSP: 0018:ffffc900000b3c98 EFLAGS: 00010293 RAX: 000000000000ffd7 RBX: 000000000000ffd7 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff8214ce97 RDI: 0000000000000004 RBP: 000000000000ffd7 R08: 0000000000000004 R09: 0000000000010000 R10: 000000000000ffd7 R11: ffff888005afa148 R12: 000000000000ffd7 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88803ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000405270 CR3: 000000003011e006 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcp_select_window net/ipv4/tcp_output.c:262 [inline] __tcp_transmit_skb+0x356/0x1280 net/ipv4/tcp_output.c:1345 tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline] tcp_send_active_reset+0x13e/0x320 net/ipv4/tcp_output.c:3459 mptcp_check_fastclose net/mptcp/protocol.c:2530 [inline] mptcp_worker+0x6c7/0x800 net/mptcp/protocol.c:2705 process_one_work+0x3bd/0x950 kernel/workqueue.c:2390 worker_thread+0x5b/0x610 kernel/workqueue.c:2537 kthread+0x138/0x170 kernel/kthread.c:376 ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308 </TASK>
This change addresses the issue explicitly checking for bad states before running the mptcp worker.
Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch cpaasch@apple.com Link: https://github.com/multipath-tcp/mptcp_net-next/issues/374 Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Tested-by: Christoph Paasch cpaasch@apple.com Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net --- net/mptcp/protocol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 60b23b2716c4..06c5872e3b00 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2626,7 +2626,7 @@ static void mptcp_worker(struct work_struct *work)
lock_sock(sk); state = sk->sk_state; - if (unlikely(state == TCP_CLOSE)) + if (unlikely((1 << state) & (TCPF_CLOSE | TCPF_LISTEN))) goto unlock;
mptcp_check_data_fin_ack(sk);
From: Paolo Abeni pabeni@redhat.com
In case of early fallback to TCP, subflow_syn_recv_sock() deletes the subflow context before returning the newly allocated sock to the caller.
The fastopen path does not cope with the above unconditionally dereferencing the subflow context.
Fixes: 36b122baf6a8 ("mptcp: add subflow_v(4,6)_send_synack()") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net --- net/mptcp/fastopen.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c index d237d142171c..bceaab8dd8e4 100644 --- a/net/mptcp/fastopen.c +++ b/net/mptcp/fastopen.c @@ -9,11 +9,18 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subflow, struct request_sock *req) { - struct sock *ssk = subflow->tcp_sock; - struct sock *sk = subflow->conn; + struct sock *sk, *ssk; struct sk_buff *skb; struct tcp_sock *tp;
+ /* on early fallback the subflow context is deleted by + * subflow_syn_recv_sock() + */ + if (!subflow) + return; + + ssk = subflow->tcp_sock; + sk = subflow->conn; tp = tcp_sk(ssk);
subflow->is_mptfo = 1;
Simply adding a "sleep" before checking something is usually not a good idea because the time that has been picked can not be enough or too much. The best is to wait for events with a timeout.
In this selftest, 'sleep 0.5' is used more than 40 times. It is always used before calling a 'verify_*' function except for this verify_listener_events which has been added later.
At the end, using all these 'sleep 0.5' seems to work: the slow CIs don't complain so far. Also because it doesn't take too much time, we can just add two more 'sleep 0.5' to uniform what is done before calling a 'verify_*' function. For the same reasons, we can also delay a bigger refactoring to replace all these 'sleep 0.5' by functions waiting for events instead of waiting for a fix time and hope for the best.
Fixes: 6c73008aa301 ("selftests: mptcp: listener test for userspace PM") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net --- tools/testing/selftests/net/mptcp/userspace_pm.sh | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/userspace_pm.sh b/tools/testing/selftests/net/mptcp/userspace_pm.sh index 48e52f995a98..b1eb7bce599d 100755 --- a/tools/testing/selftests/net/mptcp/userspace_pm.sh +++ b/tools/testing/selftests/net/mptcp/userspace_pm.sh @@ -913,6 +913,7 @@ test_listener() $client4_port > /dev/null 2>&1 & local listener_pid=$!
+ sleep 0.5 verify_listener_events $client_evts $LISTENER_CREATED $AF_INET 10.0.2.2 $client4_port
# ADD_ADDR from client to server machine reusing the subflow port @@ -928,6 +929,7 @@ test_listener() # Delete the listener from the client ns, if one was created kill_wait $listener_pid
+ sleep 0.5 verify_listener_events $client_evts $LISTENER_CLOSED $AF_INET 10.0.2.2 $client4_port }
Hello:
This series was applied to netdev/net.git (main) by Jakub Kicinski kuba@kernel.org:
On Tue, 11 Apr 2023 22:42:08 +0200 you wrote:
Patch 1 avoids scheduling the MPTCP worker on a closed socket on some edge cases. It fixes issues that can be visible from v5.11.
Patch 2 makes sure the MPTCP worker doesn't try to manipulate disconnected sockets. This is also a fix for an issue that can be visible from v5.11.
[...]
Here is the summary with links: - [net,1/4] mptcp: use mptcp_schedule_work instead of open-coding it https://git.kernel.org/netdev/net/c/a5cb752b1257 - [net,2/4] mptcp: stricter state check in mptcp_worker https://git.kernel.org/netdev/net/c/d6a044373343 - [net,3/4] mptcp: fix NULL pointer dereference on fastopen early fallback https://git.kernel.org/netdev/net/c/c0ff6f6da66a - [net,4/4] selftests: mptcp: userspace pm: uniform verify events https://git.kernel.org/netdev/net/c/711ae788cbbb
You are awesome, thank you!
linux-kselftest-mirror@lists.linaro.org