The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4b1ff850e0c1aacc23e923ed22989b827b9808f9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to 'stable@vger.kernel.org' --in-reply-to '2025101658-underwire-colonize-b998@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4b1ff850e0c1aacc23e923ed22989b827b9808f9 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe@kernel.org>
Date: Thu, 25 Sep 2025 12:32:36 +0200
Subject: [PATCH] mptcp: pm: in-kernel: usable client side with C-flag
When servers set the C-flag in their MP_CAPABLE to tell clients not to create subflows to the initial address and port, clients will likely not use their other endpoints. That's because the in-kernel path-manager uses the 'subflow' endpoints to create subflows only to the initial address and port.
If the limits have not been modified to accept ADD_ADDR, the client doesn't try to establish new subflows. If the limits accept ADD_ADDR, the routing rules will be used to select the source IP.
The C-flag is typically set when the server is operating behind a legacy Layer 4 load balancer, or using an anycast IP address. Clients that have several 'subflow' endpoints set up then don't create multiple subflows as expected, which causes deployment issues.
A special case is then added here: when a server sets the C-flag in the MPC and directly sends an ADD_ADDR, this single ADD_ADDR is accepted. The 'subflow' endpoints will then be used with this new remote IP and port. This exception is only allowed when the ADD_ADDR is sent immediately after the 3WHS, and it makes the client switch to the 'fully established' mode. After that, 'select_local_address()' will not be able to find any subflows, because 'id_avail_bitmap' will be filled in mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully established' mode.
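The acceptance rule described above can be sketched in plain userspace C. This is only a simplified model, not kernel code: 'struct pm_model' and add_addr_accepted() are hypothetical names introduced for illustration, while the four conditions in c_flag_case() follow mptcp_pm_add_addr_c_flag_case() from the diff. It only covers ADD_ADDRs with an ID greater than 0.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the PM state read by the patch. */
struct pm_model {
	bool remote_deny_join_id0;        /* peer set the C-flag in its MP_CAPABLE */
	bool accept_addr;                 /* limits currently allow accepting ADD_ADDR */
	unsigned int local_addr_used;     /* local endpoints already consumed */
	unsigned int add_addr_accept_max; /* configured ADD_ADDR accept limit */
	unsigned int subflows;            /* additional subflows already created */
	unsigned int subflows_max;        /* configured subflows limit */
};

/* Same four conditions as mptcp_pm_add_addr_c_flag_case() in the diff. */
static bool c_flag_case(const struct pm_model *pm)
{
	return pm->remote_deny_join_id0 &&
	       pm->local_addr_used == 0 &&
	       pm->add_addr_accept_max == 0 &&
	       pm->subflows < pm->subflows_max;
}

/*
 * Model of the check in mptcp_pm_add_addr_received() for an ADD_ADDR with
 * id > 0: true means the address is handed to the PM worker, false means it
 * is only echoed back.
 */
static bool add_addr_accepted(const struct pm_model *pm)
{
	return pm->accept_addr || c_flag_case(pm);
}
```

With the default limits (no ADD_ADDR accepted) and a peer setting the C-flag, the first ADD_ADDR passes. Once a local endpoint has been used, 'local_addr_used' is no longer 0 and later ADD_ADDRs are rejected again.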
Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/536
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-1-ad126cc...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 204e1f61212e..584cab90aa6e 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -637,9 +637,12 @@ void mptcp_pm_add_addr_received(const struct sock *ssk,
 		} else {
 			__MPTCP_INC_STATS(sock_net((struct sock *)msk), MPTCP_MIB_ADDADDRDROP);
 		}
-	/* id0 should not have a different address */
+	/* - id0 should not have a different address
+	 * - special case for C-flag: linked to fill_local_addresses_vec()
+	 */
 	} else if ((addr->id == 0 && !mptcp_pm_is_init_remote_addr(msk, addr)) ||
-		   (addr->id > 0 && !READ_ONCE(pm->accept_addr))) {
+		   (addr->id > 0 && !READ_ONCE(pm->accept_addr) &&
+		    !mptcp_pm_add_addr_c_flag_case(msk))) {
 		mptcp_pm_announce_addr(msk, addr, true);
 		mptcp_pm_add_addr_send_ack(msk);
 	} else if (mptcp_pm_schedule_work(msk, MPTCP_PM_ADD_ADDR_RECEIVED)) {
diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
index 667803d72b64..8c46493a0835 100644
--- a/net/mptcp/pm_kernel.c
+++ b/net/mptcp/pm_kernel.c
@@ -389,10 +389,12 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 	struct mptcp_addr_info mpc_addr;
 	struct pm_nl_pernet *pernet;
 	unsigned int subflows_max;
+	bool c_flag_case;
 	int i = 0;

 	pernet = pm_nl_get_pernet_from_msk(msk);
 	subflows_max = mptcp_pm_get_subflows_max(msk);
+	c_flag_case = remote->id && mptcp_pm_add_addr_c_flag_case(msk);

 	mptcp_local_address((struct sock_common *)msk, &mpc_addr);

@@ -405,12 +407,27 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 			continue;

 		if (msk->pm.subflows < subflows_max) {
+			bool is_id0;
+
 			locals[i].addr = entry->addr;
 			locals[i].flags = entry->flags;
 			locals[i].ifindex = entry->ifindex;

+			is_id0 = mptcp_addresses_equal(&locals[i].addr,
+						       &mpc_addr,
+						       locals[i].addr.port);
+
+			if (c_flag_case &&
+			    (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) {
+				__clear_bit(locals[i].addr.id,
+					    msk->pm.id_avail_bitmap);
+
+				if (!is_id0)
+					msk->pm.local_addr_used++;
+			}
+
 			/* Special case for ID0: set the correct ID */
-			if (mptcp_addresses_equal(&locals[i].addr, &mpc_addr, locals[i].addr.port))
+			if (is_id0)
 				locals[i].addr.id = 0;

 			msk->pm.subflows++;
@@ -419,6 +436,37 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 	}
 	rcu_read_unlock();

+	/* Special case: peer sets the C flag, accept one ADD_ADDR if default
+	 * limits are used -- accepting no ADD_ADDR -- and use subflow endpoints
+	 */
+	if (!i && c_flag_case) {
+		unsigned int local_addr_max = mptcp_pm_get_local_addr_max(msk);
+
+		while (msk->pm.local_addr_used < local_addr_max &&
+		       msk->pm.subflows < subflows_max) {
+			struct mptcp_pm_local *local = &locals[i];
+
+			if (!select_local_address(pernet, msk, local))
+				break;
+
+			__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
+
+			if (!mptcp_pm_addr_families_match(sk, &local->addr,
+							  remote))
+				continue;
+
+			if (mptcp_addresses_equal(&local->addr, &mpc_addr,
+						  local->addr.port))
+				continue;
+
+			msk->pm.local_addr_used++;
+			msk->pm.subflows++;
+			i++;
+		}
+
+		return i;
+	}
+
 	/* If the array is empty, fill in the single
 	 * 'IPADDRANY' local address
 	 */
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a1787a1344ac..cbe54331e5c7 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1199,6 +1199,14 @@ static inline void mptcp_pm_close_subflow(struct mptcp_sock *msk)
 	spin_unlock_bh(&msk->pm.lock);
 }

+static inline bool mptcp_pm_add_addr_c_flag_case(struct mptcp_sock *msk)
+{
+	return READ_ONCE(msk->pm.remote_deny_join_id0) &&
+	       msk->pm.local_addr_used == 0 &&
+	       mptcp_pm_get_add_addr_accept_max(msk) == 0 &&
+	       msk->pm.subflows < mptcp_pm_get_subflows_max(msk);
+}
+
 void mptcp_sockopt_sync_locked(struct mptcp_sock *msk, struct sock *ssk);

 static inline struct mptcp_ext *mptcp_get_ext(const struct sk_buff *skb)
Greg recently reported the following patch could not be applied without conflicts in this tree:
- 4b1ff850e0c1 ("mptcp: pm: in-kernel: usable client side with C-flag")
Note that the following patch got applied, but at the wrong place and requiring additional modifications:
- 008385efd05e ("selftests: mptcp: join: validate C-flag + def limit")
Conflicts have been resolved, and documented in each patch.
Matthieu Baerts (NGI0) (2):
  mptcp: pm: in-kernel: usable client side with C-flag
  selftests: mptcp: join: validate C-flag + def limit

 net/mptcp/pm.c                                |  7 ++-
 net/mptcp/pm_netlink.c                        | 49 ++++++++++++++++++-
 net/mptcp/protocol.h                          |  8 +++
 .../testing/selftests/net/mptcp/mptcp_join.sh | 10 ++++
 4 files changed, 71 insertions(+), 3 deletions(-)
commit 4b1ff850e0c1aacc23e923ed22989b827b9808f9 upstream.
When servers set the C-flag in their MP_CAPABLE to tell clients not to create subflows to the initial address and port, clients will likely not use their other endpoints. That's because the in-kernel path-manager uses the 'subflow' endpoints to create subflows only to the initial address and port.
If the limits have not been modified to accept ADD_ADDR, the client doesn't try to establish new subflows. If the limits accept ADD_ADDR, the routing rules will be used to select the source IP.
The C-flag is typically set when the server is operating behind a legacy Layer 4 load balancer, or using an anycast IP address. Clients that have several 'subflow' endpoints set up then don't create multiple subflows as expected, which causes deployment issues.
A special case is then added here: when a server sets the C-flag in the MPC and directly sends an ADD_ADDR, this single ADD_ADDR is accepted. The 'subflow' endpoints will then be used with this new remote IP and port. This exception is only allowed when the ADD_ADDR is sent immediately after the 3WHS, and it makes the client switch to the 'fully established' mode. After that, 'select_local_address()' will not be able to find any subflows, because 'id_avail_bitmap' will be filled in mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully established' mode.
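How the 'subflow' endpoints are then picked for the new remote address can be sketched as follows. This is a rough userspace model of the fallback path, under stated assumptions: 'struct endpoint_model', 'struct pm_counters' and pick_subflow_endpoints() are hypothetical names, the 'in_use' field stands in for the "already used by a subflow" lookup, and the address family check is left out. FLAG_SUBFLOW is a local stand-in for MPTCP_PM_ADDR_FLAG_SUBFLOW.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for MPTCP_PM_ADDR_FLAG_SUBFLOW (illustrative only). */
#define FLAG_SUBFLOW 0x2u

/* Hypothetical, simplified view of one configured endpoint. */
struct endpoint_model {
	unsigned int id;
	unsigned int flags;
	bool in_use; /* models "an existing subflow already uses this address" */
};

/* Hypothetical subset of the PM counters and limits. */
struct pm_counters {
	unsigned int local_addr_used;
	unsigned int local_addr_max;
	unsigned int subflows;
	unsigned int subflows_max;
};

/*
 * Walk the endpoints and pick the unused 'subflow' ones while both limits
 * allow it. The IDs of the picked endpoints are stored in out_ids, which
 * must have room for n entries. Returns the number of picked endpoints.
 */
static size_t pick_subflow_endpoints(const struct endpoint_model *eps, size_t n,
				     struct pm_counters *pm,
				     unsigned int *out_ids)
{
	size_t picked = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!(eps[i].flags & FLAG_SUBFLOW))
			continue;

		/* avoid any address already in use */
		if (eps[i].in_use)
			continue;

		/* stop once one of the limits is reached */
		if (pm->local_addr_used >= pm->local_addr_max ||
		    pm->subflows >= pm->subflows_max)
			break;

		out_ids[picked++] = eps[i].id;
		pm->local_addr_used++;
		pm->subflows++;
	}

	return picked;
}
```

Each picked endpoint bumps both 'local_addr_used' and 'subflows', so the C-flag condition no longer holds for any later ADD_ADDR.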
Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/536
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-1-ad126cc...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Conflict in pm.c, because commit 498d7d8b75f1 ("mptcp: pm: remove
  '_nl' from mptcp_pm_nl_is_init_remote_addr") renamed a helper in the
  context, and it is not in this version. The same new code can be
  applied at the same place.
  Another conflict in pm.c, because commit 4d25247d3ae4 ("mptcp: bypass
  in-kernel PM restrictions for non-kernel PMs") switched the modified
  'if' statement to an 'else if', and is not in this version. The same
  modification can still be applied.
  Conflict in pm_kernel.c, because the modified code has been moved from
  pm_netlink.c to pm_kernel.c in commit 8617e85e04bd ("mptcp: pm: split
  in-kernel PM specific code"), which is not in this version. The
  resolution is easy: simply apply the patch with 'pm_kernel.c' replaced
  by 'pm_netlink.c'.
  Conflict in pm_netlink.c, because commit b83fbca1b4c9 ("mptcp: pm:
  reduce entries iterations on connect") is not in this version. Instead
  of using the 'locals' variable (struct mptcp_pm_local *) from the new
  version and embedding a "struct mptcp_addr_info", we can simply
  continue to use the 'addrs' variable (struct mptcp_addr_info *).
  Because commit b9d69db87fb7 ("mptcp: let the in-kernel PM use mixed
  IPv4 and IPv6 addresses") is not in this version, it is also required
  to pass an extra parameter to fill_local_addresses_vec(): struct
  mptcp_addr_info *remote, which is available from the caller side.
  Same with commit 4638de5aefe5 ("mptcp: handle local addrs announced by
  userspace PMs") adding the 'mptcp_' prefix to addresses_equal().
  Conflict in protocol.h, because commit af3dc0ad3167 ("mptcp: Remove
  unused declaration mptcp_sockopt_sync()") is not in this version and
  it removed one line in the context. The resolution is easy because the
  new function can still be added at the same place. A similar conflict
  has been resolved due to commit 95d686517884 ("mptcp: fix subflow
  accounting on close"). ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/pm.c         |  7 ++++--
 net/mptcp/pm_netlink.c | 49 +++++++++++++++++++++++++++++++++++++++++-
 net/mptcp/protocol.h   |  8 +++++++
 3 files changed, 61 insertions(+), 3 deletions(-)
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 4fa31301fe84..737643e84ed1 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -189,9 +189,12 @@ void mptcp_pm_add_addr_received(struct mptcp_sock *msk,

 	spin_lock_bh(&pm->lock);

-	/* id0 should not have a different address */
+	/* - id0 should not have a different address
+	 * - special case for C-flag: linked to fill_local_addresses_vec()
+	 */
 	if ((addr->id == 0 && !mptcp_pm_nl_is_init_remote_addr(msk, addr)) ||
-	    (addr->id > 0 && !READ_ONCE(pm->accept_addr))) {
+	    (addr->id > 0 && !READ_ONCE(pm->accept_addr) &&
+	     !mptcp_pm_add_addr_c_flag_case(msk))) {
 		mptcp_pm_announce_addr(msk, addr, true);
 		mptcp_pm_add_addr_send_ack(msk);
 	} else if (mptcp_pm_schedule_work(msk, MPTCP_PM_ADD_ADDR_RECEIVED)) {
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index f9839980fcaf..df46ca14ce23 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -571,6 +571,7 @@ static void mptcp_pm_nl_subflow_established(struct mptcp_sock *msk)
  * and return the array size.
  */
 static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
+					     struct mptcp_addr_info *remote,
 					     struct mptcp_addr_info *addrs)
 {
 	struct sock *sk = (struct sock *)msk;
@@ -578,10 +579,12 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 	struct mptcp_addr_info mpc_addr;
 	struct pm_nl_pernet *pernet;
 	unsigned int subflows_max;
+	bool c_flag_case;
 	int i = 0;

 	pernet = net_generic(sock_net(sk), pm_nl_pernet_id);
 	subflows_max = mptcp_pm_get_subflows_max(msk);
+	c_flag_case = remote->id && mptcp_pm_add_addr_c_flag_case(msk);

 	mptcp_local_address((struct sock_common *)msk, &mpc_addr);

@@ -605,6 +608,10 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 			msk->pm.subflows++;
 			addrs[i] = entry->addr;

+			if (c_flag_case &&
+			    (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW))
+				msk->pm.local_addr_used++;
+
 			/* Special case for ID0: set the correct ID */
 			if (addresses_equal(&entry->addr, &mpc_addr, entry->addr.port))
 				addrs[i].id = 0;
@@ -614,6 +621,46 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 	}
 	rcu_read_unlock();

+	/* Special case: peer sets the C flag, accept one ADD_ADDR if default
+	 * limits are used -- accepting no ADD_ADDR -- and use subflow endpoints
+	 */
+	if (!i && c_flag_case) {
+		unsigned int local_addr_max = mptcp_pm_get_local_addr_max(msk);
+
+		rcu_read_lock();
+		__mptcp_flush_join_list(msk);
+		list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) {
+			if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW))
+				continue;
+
+			if (entry->addr.family != sk->sk_family) {
+#if IS_ENABLED(CONFIG_MPTCP_IPV6)
+				if ((entry->addr.family == AF_INET &&
+				     !ipv6_addr_v4mapped(&sk->sk_v6_daddr)) ||
+				    (sk->sk_family == AF_INET &&
+				     !ipv6_addr_v4mapped(&entry->addr.addr6)))
+#endif
+					continue;
+			}
+
+			/* avoid any address already in use by subflows and
+			 * pending join
+			 */
+			if (!lookup_subflow_by_saddr(&msk->conn_list, &entry->addr) &&
+			    msk->pm.local_addr_used < local_addr_max &&
+			    msk->pm.subflows < subflows_max) {
+				addrs[i] = entry->addr;
+
+				msk->pm.local_addr_used++;
+				msk->pm.subflows++;
+				i++;
+			}
+		}
+		rcu_read_unlock();
+
+		return i;
+	}
+
 	/* If the array is empty, fill in the single
 	 * 'IPADDRANY' local address
 	 */
@@ -661,7 +708,7 @@ static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk)
 	/* connect to the specified remote address, using whatever
 	 * local address the routing configuration will pick.
 	 */
-	nr = fill_local_addresses_vec(msk, addrs);
+	nr = fill_local_addresses_vec(msk, &remote, addrs);

 	spin_unlock_bh(&msk->pm.lock);
 	for (i = 0; i < nr; i++)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 8d05fb205a31..c93399d11650 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -846,6 +846,14 @@ unsigned int mptcp_pm_get_add_addr_accept_max(const struct mptcp_sock *msk);
 unsigned int mptcp_pm_get_subflows_max(const struct mptcp_sock *msk);
 unsigned int mptcp_pm_get_local_addr_max(const struct mptcp_sock *msk);

+static inline bool mptcp_pm_add_addr_c_flag_case(struct mptcp_sock *msk)
+{
+	return READ_ONCE(msk->pm.remote_deny_join_id0) &&
+	       msk->pm.local_addr_used == 0 &&
+	       mptcp_pm_get_add_addr_accept_max(msk) == 0 &&
+	       msk->pm.subflows < mptcp_pm_get_subflows_max(msk);
+}
+
 void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk);
 void mptcp_sockopt_sync_all(struct mptcp_sock *msk);
commit 008385efd05e04d8dff299382df2e8be0f91d8a0 upstream.
The previous commit adds an exception for the C-flag case. The 'mptcp_join.sh' selftest is extended to validate this case.
In this subtest, there is a typical CDN deployment with a client where MPTCP endpoints have been 'automatically' configured:
- the server sets net.mptcp.allow_join_initial_addr_port=0
- the client has multiple 'subflow' endpoints, and the default limits: not accepting ADD_ADDRs.
Without the parent patch, the client is not able to establish new subflows using its 'subflow' endpoints. The parent commit fixes that.
The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID.
Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-2-ad126cc...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Conflicts in mptcp_join.sh, because many different helpers have been
  modified in newer kernel versions, e.g. in commit 03668c65d153
  ("selftests: mptcp: join: rework detailed report"), or commit
  985de45923e2 ("selftests: mptcp: centralize stats dumping"), etc.
  Adaptations have been made to use the old way, similar to what is done
  just above. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 06634417e3c4..2cf9bb39b22b 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -1826,6 +1826,16 @@ deny_join_id0_tests()
 	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
 	run_tests $ns1 $ns2 10.0.1.1
 	chk_join_nr "subflow and address allow join id0 2" 1 1 1
+
+	# default limits, server deny join id 0 + signal
+	reset_with_allow_join_id0 0 1
+	ip netns exec $ns1 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns2 ./pm_nl_ctl limits 0 2
+	ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 flags signal
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
+	ip netns exec $ns2 ./pm_nl_ctl add 10.0.4.2 flags subflow
+	run_tests $ns1 $ns2 10.0.1.1
+	chk_join_nr "default limits, server deny join id 0" 2 2 2
 }

 fullmesh_tests()