When a peer decides to close one subflow in the middle of a connection having multiple subflows, the receiver of the first FIN should accept that, and close the subflow on its side as well. If not, the subflow will stay half closed, and would even continue to be used until the end of the MPTCP connection or a reset from the network.
The issue has not been seen before, probably because the in-kernel path-manager always sends a RM_ADDR before closing the subflow. Upon the reception of this RM_ADDR, the other peer will initiate the closure on its side as well. On the other hand, if the RM_ADDR is lost, or if the path-manager of the other peer only closes the subflow without sending a RM_ADDR, the subflow would switch to TCP_CLOSE_WAIT, but that's it, leaving the subflow half-closed.
So now, when the subflow switches to the TCP_CLOSE_WAIT state, and if the MPTCP connection has not been closed before with a DATA_FIN, the kernel owning the subflow schedules its worker to initiate the closure on its side as well.
This issue can be easily reproduced with packetdrill, as visible in [1], by creating an additional subflow, injecting a FIN+ACK before sending the DATA_FIN, and expecting a FIN+ACK in return.
Fixes: 40947e13997a ("mptcp: schedule worker when subflow is closed") Cc: stable@vger.kernel.org Link: https://github.com/multipath-tcp/packetdrill/pull/154 [1] Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/mptcp/protocol.c | 5 ++++- net/mptcp/subflow.c | 8 ++++++-- 2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 0d536b183a6c..151e82e2ff2e 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2533,8 +2533,11 @@ static void __mptcp_close_subflow(struct sock *sk)
mptcp_for_each_subflow_safe(msk, subflow, tmp) { struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + int ssk_state = inet_sk_state_load(ssk);
- if (inet_sk_state_load(ssk) != TCP_CLOSE) + if (ssk_state != TCP_CLOSE && + (ssk_state != TCP_CLOSE_WAIT || + inet_sk_state_load(sk) != TCP_ESTABLISHED)) continue;
/* 'subflow_data_ready' will re-sched once rx queue is empty */ diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index a21c712350c3..4834e7fc2fb6 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1255,12 +1255,16 @@ static void mptcp_subflow_discard_data(struct sock *ssk, struct sk_buff *skb, /* sched mptcp worker to remove the subflow if no more data is pending */ static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ssk) { - if (likely(ssk->sk_state != TCP_CLOSE)) + struct sock *sk = (struct sock *)msk; + + if (likely(ssk->sk_state != TCP_CLOSE && + (ssk->sk_state != TCP_CLOSE_WAIT || + inet_sk_state_load(sk) != TCP_ESTABLISHED))) return;
if (skb_queue_empty(&ssk->sk_receive_queue) && !test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags)) - mptcp_schedule_work((struct sock *)msk); + mptcp_schedule_work(sk); }
static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)