This series includes several changes to the MPTCP RX path. The main goals are to improve RX performance and to increase long-term maintainability.
Some changes reflect recent(ish) improvements introduced in the TCP stack: patches 1, 2 and 3 are the MPTCP counterpart of the SKB deferral free and auto-tuning improvements. Note that patch 3 could possibly fix additional issues, and overall it should prevent similar issues from arising in the future.
Patches 4-7 prepare the ground for the socket backlog usage, which will be introduced in a later series to process the packets received by the different subflows while the msk socket is owned.
Patch 8 is not related to the RX path: it contains additional tests for new features recently introduced in net-next.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Notes:
  - Sorry for sending this series so late, we had quite a few patches to
    upstream during this cycle. This is the last batch, and it has been
    heavily tested over the last 2 weeks.
  - If there are issues with some patches, but not with patches 1-3, it
    would be nice, if possible, to accept these first 3 patches anyway,
    to reduce the recently introduced gap with TCP.
  - Patches can be grouped like this if needed: 1-3, 4-5, 6-7, 8.
    Patches 6-7 prepare the ground for future work; they can be dropped
    if there are issues with them.
---
Matthieu Baerts (NGI0) (1):
      selftests: mptcp: join: validate new laminar endp
Paolo Abeni (7):
      mptcp: leverage skb deferral free
      tcp: make tcp_rcvbuf_grow() accessible to mptcp code
      mptcp: rcvbuf auto-tuning improvement
      mptcp: introduce the mptcp_init_skb helper
      mptcp: remove unneeded mptcp_move_skb()
      mptcp: factor out a basic skb coalesce helper
      mptcp: minor move_skbs_to_msk() cleanup
 include/net/tcp.h                               |   1 +
 net/ipv4/tcp_input.c                            |   2 +-
 net/mptcp/protocol.c                            | 187 ++++++++++++------------
 net/mptcp/protocol.h                            |   4 +-
 tools/testing/selftests/net/mptcp/mptcp_join.sh |  69 +++++++++
 tools/testing/selftests/net/mptcp/pm_nl_ctl.c   |   9 ++
 6 files changed, 177 insertions(+), 95 deletions(-)
---
base-commit: 1493c18fe8696bfc758a97130a485fc4e08387f5
change-id: 20250927-net-next-mptcp-rcv-path-imp-192d8c24c9c7
Best regards,
From: Paolo Abeni <pabeni@redhat.com>
Usage of the skb deferral API is straightforward; with multiple active subflows, this allows spreading part of the received application load across multiple CPUs.
Also fix a typo in the related comment.
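For context, skb_attempt_defer_free() frees the skb on the CPU that
allocated it, so the cache-hot free runs next to the subflow RX
processing. A simplified sketch of the idea (the real implementation
lives in net/core/skbuff.c and also bounds the per-CPU defer list):

	int cpu = skb->alloc_cpu;

	/* nothing to gain when already on the allocating CPU */
	if (cpu == raw_smp_processor_id() || !cpu_online(cpu)) {
		__kfree_skb(skb);
		return;
	}

	/* otherwise, queue the skb on that CPU's per-CPU defer list;
	 * the list is purged from softirq context on the allocating
	 * CPU, possibly after an IPI kick
	 */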
Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 735a209d40725f077de1056de5e1c64ffec77f55..62cdd2bcff9da12783b97fd40813ede85b5c83d9 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1943,12 +1943,13 @@ static int __mptcp_recvmsg_mskq(struct sock *sk,
 		}
 
 		if (!(flags & MSG_PEEK)) {
-			/* avoid the indirect call, we know the destructor is sock_wfree */
+			/* avoid the indirect call, we know the destructor is sock_rfree */
 			skb->destructor = NULL;
+			skb->sk = NULL;
 			atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
 			sk_mem_uncharge(sk, skb->truesize);
 			__skb_unlink(skb, &sk->sk_receive_queue);
-			__kfree_skb(skb);
+			skb_attempt_defer_free(skb);
 			msk->bytes_consumed += count;
 		}
 
From: Paolo Abeni <pabeni@redhat.com>
To leverage the auto-tuning improvements brought by commit 2da35e4b4df9 ("Merge branch 'tcp-receive-side-improvements'"), the MPTCP stack needs to access the mentioned helper.
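For reference, after the mentioned merge, tcp_rcvbuf_grow() grows
sk_rcvbuf based on the data consumed during the last RTT; its core
logic is roughly the following (simplified sketch, window_clamp
handling omitted):

	rcvwin = tp->rcvq_space.space << 1;
	if (!RB_EMPTY_ROOT(&tp->out_of_order_queue))
		rcvwin += TCP_SKB_CB(tp->ooo_last_skb)->end_seq - tp->rcv_nxt;

	rcvbuf = min_t(u32, tcp_space_from_win(sk, rcvwin),
		       READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]));
	if (rcvbuf > sk->sk_rcvbuf)
		WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);

The MPTCP variant added by the next patch mirrors this structure at the
msk level.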
Acked-by: Geliang Tang <geliang@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 include/net/tcp.h    | 1 +
 net/ipv4/tcp_input.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7c51a0a5ace820bd45d4cc551a15154f8488a880..5ca230ed526ae02711e8d2a409b91664b73390f2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -370,6 +370,7 @@ void tcp_delack_timer_handler(struct sock *sk);
 int tcp_ioctl(struct sock *sk, int cmd, int *karg);
 enum skb_drop_reason tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb);
 void tcp_rcv_established(struct sock *sk, struct sk_buff *skb);
+void tcp_rcvbuf_grow(struct sock *sk);
 void tcp_rcv_space_adjust(struct sock *sk);
 int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp);
 void tcp_twsk_destructor(struct sock *sk);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 79d5252ed6cc1a24ec898f4168d47c39c6e92fe1..e2b5a739fb16dcbfce62d63f28bbd1c971aad747 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -891,7 +891,7 @@ static inline void tcp_rcv_rtt_measure_ts(struct sock *sk,
 	}
 }
 
-static void tcp_rcvbuf_grow(struct sock *sk)
+void tcp_rcvbuf_grow(struct sock *sk)
 {
 	const struct net *net = sock_net(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
From: Paolo Abeni <pabeni@redhat.com>
Apply to the MPTCP auto-tuning the same improvements introduced for the TCP protocol by the merge commit 2da35e4b4df9 ("Merge branch 'tcp-receive-side-improvements'").
The main difference is that the TCP subflows and the main MPTCP socket need to account separately for OoO data: MPTCP does not care about TCP-level OoO and vice versa. As a consequence, do not reflect MPTCP-level rcvbuf increases due to OoO packets at the subflow level.
This refactor additionally allows dropping the msk receive buffer update at receive time, as the latter was only intended to cope with subflow receive buffer increases due to OoO packets.
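As a worked example of the new logic: if 256 KiB were copied to user
space during the last measurement window and there is no MPTCP-level
OoO data (numbers are illustrative only):

	rcvwin = rcvq_space.space << 1 = 512 KiB
	rcvbuf = min(mptcp_space_from_win(sk, rcvwin), tcp_rmem[2])

When OoO data is present, rcvwin is further extended by the gap
between ooo_last_skb's end_seq and ack_seq, so the buffer can also
grow while the missing data is expected to arrive on another subflow.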
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/559
Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.c | 97 +++++++++++++++++++++++++---------------------------
 net/mptcp/protocol.h |  4 +--
 2 files changed, 49 insertions(+), 52 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 62cdd2bcff9da12783b97fd40813ede85b5c83d9..f994e7f45f7b96c280708d7a29c1423a91e4cfee 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -179,6 +179,35 @@ static bool mptcp_ooo_try_coalesce(struct mptcp_sock *msk, struct sk_buff *to,
 	return mptcp_try_coalesce((struct sock *)msk, to, from);
 }
 
+/* "inspired" by tcp_rcvbuf_grow(), main difference:
+ * - mptcp does not maintain a msk-level window clamp
+ * - returns true when the receive buffer is actually updated
+ */
+static bool mptcp_rcvbuf_grow(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	const struct net *net = sock_net(sk);
+	int rcvwin, rcvbuf, cap;
+
+	if (!READ_ONCE(net->ipv4.sysctl_tcp_moderate_rcvbuf) ||
+	    (sk->sk_userlocks & SOCK_RCVBUF_LOCK))
+		return false;
+
+	rcvwin = msk->rcvq_space.space << 1;
+
+	if (!RB_EMPTY_ROOT(&msk->out_of_order_queue))
+		rcvwin += MPTCP_SKB_CB(msk->ooo_last_skb)->end_seq - msk->ack_seq;
+
+	cap = READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]);
+
+	rcvbuf = min_t(u32, mptcp_space_from_win(sk, rcvwin), cap);
+	if (rcvbuf > sk->sk_rcvbuf) {
+		WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
+		return true;
+	}
+	return false;
+}
+
 /* "inspired" by tcp_data_queue_ofo(), main differences:
  * - use mptcp seqs
  * - don't cope with sacks
@@ -292,6 +321,9 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 end:
 	skb_condense(skb);
 	skb_set_owner_r(skb, sk);
+	/* do not grow rcvbuf for not-yet-accepted or orphaned sockets. */
+	if (sk->sk_socket)
+		mptcp_rcvbuf_grow(sk);
 }
 
 static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
@@ -784,18 +816,10 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 	return moved;
 }
 
-static void __mptcp_rcvbuf_update(struct sock *sk, struct sock *ssk)
-{
-	if (unlikely(ssk->sk_rcvbuf > sk->sk_rcvbuf))
-		WRITE_ONCE(sk->sk_rcvbuf, ssk->sk_rcvbuf);
-}
-
 static void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
-	__mptcp_rcvbuf_update(sk, ssk);
-
 	/* Wake-up the reader only for in-sequence data */
 	if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
 		sk->sk_data_ready(sk);
@@ -2014,48 +2038,26 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	if (msk->rcvq_space.copied <= msk->rcvq_space.space)
 		goto new_measure;
 
-	if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
-	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
-		u64 rcvwin, grow;
-		int rcvbuf;
+	msk->rcvq_space.space = msk->rcvq_space.copied;
+	if (mptcp_rcvbuf_grow(sk)) {
 
-		rcvwin = ((u64)msk->rcvq_space.copied << 1) + 16 * advmss;
+		/* Make subflows follow along. If we do not do this, we
+		 * get drops at subflow level if skbs can't be moved to
+		 * the mptcp rx queue fast enough (announced rcv_win can
+		 * exceed ssk->sk_rcvbuf).
+		 */
+		mptcp_for_each_subflow(msk, subflow) {
+			struct sock *ssk;
+			bool slow;
 
-		grow = rcvwin * (msk->rcvq_space.copied - msk->rcvq_space.space);
-
-		do_div(grow, msk->rcvq_space.space);
-		rcvwin += (grow << 1);
-
-		rcvbuf = min_t(u64, mptcp_space_from_win(sk, rcvwin),
-			       READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
-
-		if (rcvbuf > sk->sk_rcvbuf) {
-			u32 window_clamp;
-
-			window_clamp = mptcp_win_from_space(sk, rcvbuf);
-			WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
-
-			/* Make subflows follow along. If we do not do this, we
-			 * get drops at subflow level if skbs can't be moved to
-			 * the mptcp rx queue fast enough (announced rcv_win can
-			 * exceed ssk->sk_rcvbuf).
-			 */
-			mptcp_for_each_subflow(msk, subflow) {
-				struct sock *ssk;
-				bool slow;
-
-				ssk = mptcp_subflow_tcp_sock(subflow);
-				slow = lock_sock_fast(ssk);
-				WRITE_ONCE(ssk->sk_rcvbuf, rcvbuf);
-				WRITE_ONCE(tcp_sk(ssk)->window_clamp, window_clamp);
-				if (tcp_can_send_ack(ssk))
-					tcp_cleanup_rbuf(ssk, 1);
-				unlock_sock_fast(ssk, slow);
-			}
+			ssk = mptcp_subflow_tcp_sock(subflow);
+			slow = lock_sock_fast(ssk);
+			tcp_sk(ssk)->rcvq_space.space = msk->rcvq_space.copied;
+			tcp_rcvbuf_grow(ssk);
+			unlock_sock_fast(ssk, slow);
 		}
 	}
 
-	msk->rcvq_space.space = msk->rcvq_space.copied;
 new_measure:
 	msk->rcvq_space.copied = 0;
 	msk->rcvq_space.time = mstamp;
@@ -2084,11 +2086,6 @@ static bool __mptcp_move_skbs(struct sock *sk)
 	if (list_empty(&msk->conn_list))
 		return false;
 
-	/* verify we can move any data from the subflow, eventually updating */
-	if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK))
-		mptcp_for_each_subflow(msk, subflow)
-			__mptcp_rcvbuf_update(sk, subflow->tcp_sock);
-
 	subflow = list_first_entry(&msk->conn_list, struct mptcp_subflow_context,
				   node);
 	for (;;) {
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 371084a3fc225391fe98ad42a2e2f63465119989..52f9cfa4ce95c789a7b9c53c47095abe7964d18f 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -341,8 +341,8 @@ struct mptcp_sock {
 	struct mptcp_pm_data	pm;
 	struct mptcp_sched_ops	*sched;
 	struct {
-		u32	space;	/* bytes copied in last measurement window */
-		u32	copied;	/* bytes copied in this measurement window */
+		int	space;	/* bytes copied in last measurement window */
+		int	copied;	/* bytes copied in this measurement window */
 		u64	time;	/* start time of measurement window */
 		u64	rtt_us;	/* last maximum rtt of subflows */
 	} rcvq_space;
From: Paolo Abeni <pabeni@redhat.com>
Factor out all the skb initialization steps into a new helper and use it. Note that this change moves the MPTCP CB initialization earlier: we can do such a step as soon as the skb leaves the subflow socket receive queues.
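The resulting call order in __mptcp_move_skbs_from_subflow() is thus
(see the last hunk below):

	mptcp_init_skb(ssk, skb, offset, len);	/* fill MPTCP CB, unlink from ssk */
	skb_orphan(skb);			/* drop the subflow ownership */
	ret = __mptcp_move_skb(sk, skb) || ret;	/* charge msk rmem and queue */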
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.c | 46 +++++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 21 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f994e7f45f7b96c280708d7a29c1423a91e4cfee..832782e23740d22acda608966c29516666c9b111 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -326,27 +326,11 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 		mptcp_rcvbuf_grow(sk);
 }
 
-static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
-			     struct sk_buff *skb, unsigned int offset,
-			     size_t copy_len)
+static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset,
+			   int copy_len)
 {
-	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
-	struct sock *sk = (struct sock *)msk;
-	struct sk_buff *tail;
-	bool has_rxtstamp;
-
-	__skb_unlink(skb, &ssk->sk_receive_queue);
-
-	skb_ext_reset(skb);
-	skb_orphan(skb);
-
-	/* try to fetch required memory from subflow */
-	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
-		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
-		goto drop;
-	}
-
-	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
+	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
 
 	/* the skb map_seq accounts for the skb offset:
 	 * mptcp_subflow_get_mapped_dsn() is based on the current tp->copied_seq
@@ -358,6 +342,24 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
 	MPTCP_SKB_CB(skb)->has_rxtstamp = has_rxtstamp;
 	MPTCP_SKB_CB(skb)->cant_coalesce = 0;
 
+	__skb_unlink(skb, &ssk->sk_receive_queue);
+
+	skb_ext_reset(skb);
+	skb_dst_drop(skb);
+}
+
+static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
+{
+	u64 copy_len = MPTCP_SKB_CB(skb)->end_seq - MPTCP_SKB_CB(skb)->map_seq;
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sk_buff *tail;
+
+	/* try to fetch required memory from subflow */
+	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
+		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
+		goto drop;
+	}
+
 	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
 		/* in sequence */
 		msk->bytes_received += copy_len;
@@ -678,7 +680,9 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 		if (offset < skb->len) {
 			size_t len = skb->len - offset;
 
-			ret = __mptcp_move_skb(msk, ssk, skb, offset, len) || ret;
+			mptcp_init_skb(ssk, skb, offset, len);
+			skb_orphan(skb);
+			ret = __mptcp_move_skb(sk, skb) || ret;
 			seq += len;
 
 			if (unlikely(map_remaining < len)) {
From: Paolo Abeni <pabeni@redhat.com>
Since commit b7535cfed223 ("mptcp: drop legacy code around RX EOF"), sk_shutdown can't change during the main recvmsg loop, so we can drop the related race breaker.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 832782e23740d22acda608966c29516666c9b111..26fbd9f6a3f7802c428e79c7f4e1da45aa9533e5 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2207,14 +2207,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 			break;
 		}
 
-		if (sk->sk_shutdown & RCV_SHUTDOWN) {
-			/* race breaker: the shutdown could be after the
-			 * previous receive queue check
-			 */
-			if (__mptcp_move_skbs(sk))
-				continue;
+		if (sk->sk_shutdown & RCV_SHUTDOWN)
 			break;
-		}
 
 		if (sk->sk_state == TCP_CLOSE) {
 			copied = -ENOTCONN;
From: Paolo Abeni <pabeni@redhat.com>
The upcoming patches will introduce backlog processing for the MPTCP socket, and we want to leverage coalescing in that data path.

Factor out the relevant bits, not touching memory accounting, to deal with such a use case.
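For illustration, a backlog-path caller could then coalesce without
touching the msk forward memory, along these lines (hypothetical
sketch only: the backlog series is not part of this set, and
mptcp_backlog_coalesce() is a placeholder name):

	static bool mptcp_backlog_coalesce(struct sock *sk, struct sk_buff *to,
					   struct sk_buff *from)
	{
		bool fragstolen;
		int delta;

		if (!__mptcp_try_coalesce(sk, to, from, &fragstolen, &delta))
			return false;

		/* no sk_forward_alloc update here: memory accounting for
		 * backlog packets would be handled separately by the caller
		 */
		kfree_skb_partial(from, fragstolen);
		return true;
	}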
Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 26fbd9f6a3f7802c428e79c7f4e1da45aa9533e5..da21f1807729acdb7d9427a399af66286ed125e2 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -142,22 +142,33 @@ static void mptcp_drop(struct sock *sk, struct sk_buff *skb)
 	__kfree_skb(skb);
 }
 
-static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
-			       struct sk_buff *from)
+static bool __mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
+				 struct sk_buff *from, bool *fragstolen,
+				 int *delta)
 {
-	bool fragstolen;
-	int delta;
+	int limit = READ_ONCE(sk->sk_rcvbuf);
 
 	if (unlikely(MPTCP_SKB_CB(to)->cant_coalesce) ||
 	    MPTCP_SKB_CB(from)->offset ||
-	    ((to->len + from->len) > (sk->sk_rcvbuf >> 3)) ||
-	    !skb_try_coalesce(to, from, &fragstolen, &delta))
+	    ((to->len + from->len) > (limit >> 3)) ||
+	    !skb_try_coalesce(to, from, fragstolen, delta))
 		return false;
 
 	pr_debug("colesced seq %llx into %llx new len %d new end seq %llx\n",
 		 MPTCP_SKB_CB(from)->map_seq, MPTCP_SKB_CB(to)->map_seq,
 		 to->len, MPTCP_SKB_CB(from)->end_seq);
 	MPTCP_SKB_CB(to)->end_seq = MPTCP_SKB_CB(from)->end_seq;
+	return true;
+}
+
+static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
+			       struct sk_buff *from)
+{
+	bool fragstolen;
+	int delta;
+
+	if (!__mptcp_try_coalesce(sk, to, from, &fragstolen, &delta))
+		return false;
 
 	/* note the fwd memory can reach a negative value after accounting
 	 * for the delta, but the later skb free will restore a non
From: Paolo Abeni <pabeni@redhat.com>
Such a function is called only by __mptcp_data_ready(), which in turn is always invoked when the msk is not owned by the user: we can drop the related, redundant check.

Additionally, MPTCP needs to propagate the socket error only for the current subflow.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index da21f1807729acdb7d9427a399af66286ed125e2..0292162a14eedffde166cc2a2d4eaa7c3aa6760d 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -814,12 +814,8 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 
 	moved = __mptcp_move_skbs_from_subflow(msk, ssk);
 	__mptcp_ofo_queue(msk);
-	if (unlikely(ssk->sk_err)) {
-		if (!sock_owned_by_user(sk))
-			__mptcp_error_report(sk);
-		else
-			__set_bit(MPTCP_ERROR_REPORT, &msk->cb_flags);
-	}
+	if (unlikely(ssk->sk_err))
+		__mptcp_subflow_error_report(sk, ssk);
 
 	/* If the moves have caught up with the DATA_FIN sequence number
 	 * it's time to ack the DATA_FIN and change socket state, but
Here are a few sub-tests for mptcp_join.sh, validating the new 'laminar' endpoint type.
In a setup where subflows created using the routing rules would be rejected by the listener, and where the latter announces one IP address, the following cases are verified:
- Without any 'laminar' endpoints: no new subflows are created.
- With one 'laminar' endpoint: a second subflow is created.
- With multiple 'laminar' endpoints: 2 IPv4 subflows are created.
- With one 'laminar' endpoint, but the server announcing a second IP address: only one subflow is created.

- With one 'laminar' + 'subflow' endpoint: the same endpoint is only used once.
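For manual checks, the new flag can also be exercised directly with the
updated pm_nl_ctl tool, following its existing 'add'/'dump' syntax
(illustrative invocation, inside the client namespace):

	./pm_nl_ctl add 10.0.3.2 flags laminar
	./pm_nl_ctl dump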
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 69 +++++++++++++++++++++++++
 tools/testing/selftests/net/mptcp/pm_nl_ctl.c   |  9 ++++
 2 files changed, 78 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index a94b3960ad5e009dbead66b6ff2aa01f70aa3e1f..c90d8e8b95cbb6ba80f79208d1cc844673f2c249 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -2320,6 +2320,74 @@ signal_address_tests()
 	fi
 }
 
+laminar_endp_tests()
+{
+	# no laminar endpoints: routing rules are used
+	if reset_with_tcp_filter "without a laminar endpoint" ns1 10.0.2.2 REJECT &&
+	   mptcp_lib_kallsyms_has "mptcp_pm_get_endp_laminar_max$"; then
+		pm_nl_set_limits $ns1 0 2
+		pm_nl_set_limits $ns2 2 2
+		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
+		run_tests $ns1 $ns2 10.0.1.1
+		join_syn_tx=1 \
+			chk_join_nr 0 0 0
+		chk_add_nr 1 1
+	fi
+
+	# laminar endpoints: this endpoint is used
+	if reset_with_tcp_filter "with a laminar endpoint" ns1 10.0.2.2 REJECT &&
+	   mptcp_lib_kallsyms_has "mptcp_pm_get_endp_laminar_max$"; then
+		pm_nl_set_limits $ns1 0 2
+		pm_nl_set_limits $ns2 2 2
+		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
+		pm_nl_add_endpoint $ns2 10.0.3.2 flags laminar
+		run_tests $ns1 $ns2 10.0.1.1
+		chk_join_nr 1 1 1
+		chk_add_nr 1 1
+	fi
+
+	# laminar endpoints: these endpoints are used
+	if reset_with_tcp_filter "with multiple laminar endpoints" ns1 10.0.2.2 REJECT &&
+	   mptcp_lib_kallsyms_has "mptcp_pm_get_endp_laminar_max$"; then
+		pm_nl_set_limits $ns1 0 2
+		pm_nl_set_limits $ns2 2 2
+		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
+		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
+		pm_nl_add_endpoint $ns2 dead:beef:3::2 flags laminar
+		pm_nl_add_endpoint $ns2 10.0.3.2 flags laminar
+		pm_nl_add_endpoint $ns2 10.0.4.2 flags laminar
+		run_tests $ns1 $ns2 10.0.1.1
+		chk_join_nr 2 2 2
+		chk_add_nr 2 2
+	fi
+
+	# laminar endpoints: only one endpoint is used
+	if reset_with_tcp_filter "single laminar endpoint" ns1 10.0.2.2 REJECT &&
+	   mptcp_lib_kallsyms_has "mptcp_pm_get_endp_laminar_max$"; then
+		pm_nl_set_limits $ns1 0 2
+		pm_nl_set_limits $ns2 2 2
+		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
+		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
+		pm_nl_add_endpoint $ns2 10.0.3.2 flags laminar
+		run_tests $ns1 $ns2 10.0.1.1
+		chk_join_nr 1 1 1
+		chk_add_nr 2 2
+	fi
+
+	# laminar endpoints: subflow and laminar flags
+	if reset_with_tcp_filter "subflow + laminar endpoints" ns1 10.0.2.2 REJECT &&
+	   mptcp_lib_kallsyms_has "mptcp_pm_get_endp_laminar_max$"; then
+		pm_nl_set_limits $ns1 0 4
+		pm_nl_set_limits $ns2 2 4
+		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
+		pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,laminar
+		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,laminar
+		run_tests $ns1 $ns2 10.0.1.1
+		chk_join_nr 1 1 1
+		chk_add_nr 1 1
+	fi
+}
+
 link_failure_tests()
 {
 	# accept and use add_addr with additional subflows and link loss
@@ -4109,6 +4177,7 @@ all_tests_sorted=(
 	f@subflows_tests
 	e@subflows_error_tests
 	s@signal_address_tests
+	L@laminar_endp_tests
 	l@link_failure_tests
 	t@add_addr_timeout_tests
 	r@remove_tests
diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
index d4981b76693bbddca74169437a540ad6294cf1d5..65b374232ff5ac06876dcd621fd2109c4d82cd12 100644
--- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
+++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
@@ -830,6 +830,8 @@ int add_addr(int fd, int pm_family, int argc, char *argv[])
 				flags |= MPTCP_PM_ADDR_FLAG_SUBFLOW;
 			else if (!strcmp(tok, "signal"))
 				flags |= MPTCP_PM_ADDR_FLAG_SIGNAL;
+			else if (!strcmp(tok, "laminar"))
+				flags |= MPTCP_PM_ADDR_FLAG_LAMINAR;
 			else if (!strcmp(tok, "backup"))
 				flags |= MPTCP_PM_ADDR_FLAG_BACKUP;
 			else if (!strcmp(tok, "fullmesh"))
@@ -1018,6 +1020,13 @@ static void print_addr(struct rtattr *attrs, int len)
 			printf(",");
 		}
 
+		if (flags & MPTCP_PM_ADDR_FLAG_LAMINAR) {
+			printf("laminar");
+			flags &= ~MPTCP_PM_ADDR_FLAG_LAMINAR;
+			if (flags)
+				printf(",");
+		}
+
 		if (flags & MPTCP_PM_ADDR_FLAG_BACKUP) {
 			printf("backup");
 			flags &= ~MPTCP_PM_ADDR_FLAG_BACKUP;