Two small fixes related to the MPTCP packets scheduler:
- Patch 1: add missing rcu_read_(un)lock(). A fix for >= 6.6.
- Patch 2: remove unneeded lock when listing packets schedulers. A fix for >= 6.10.
And some modifications in the MPTCP selftests:
- Patch 3: a small addition to the MPTCP selftests to cover more code.
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Matthieu Baerts (NGI0) (3): mptcp: init: protect sched with rcu_read_lock mptcp: remove unneeded lock when listing scheds selftests: mptcp: list sysctl data
net/mptcp/protocol.c | 2 ++ net/mptcp/sched.c | 2 -- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 9 +++++++++ 3 files changed, 11 insertions(+), 2 deletions(-) --- base-commit: 3b05b9c36ddd01338e1352588f2ec1ea23f97d43 change-id: 20241021-net-mptcp-sched-lock-10dfc75d1e00
Best regards,
Enabling CONFIG_PROVE_RCU_LIST with its dependence CONFIG_RCU_EXPERT creates this splat when an MPTCP socket is created:
============================= WARNING: suspicious RCU usage 6.12.0-rc2+ #11 Not tainted ----------------------------- net/mptcp/sched.c:44 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 no locks held by mptcp_connect/176.
stack backtrace: CPU: 0 UID: 0 PID: 176 Comm: mptcp_connect Not tainted 6.12.0-rc2+ #11 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:123) lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) mptcp_sched_find (net/mptcp/sched.c:44 (discriminator 7)) mptcp_init_sock (net/mptcp/protocol.c:2867 (discriminator 1)) ? sock_init_data_uid (arch/x86/include/asm/atomic.h:28) inet_create.part.0.constprop.0 (net/ipv4/af_inet.c:386) ? __sock_create (include/linux/rcupdate.h:347 (discriminator 1)) __sock_create (net/socket.c:1576) __sys_socket (net/socket.c:1671) ? __pfx___sys_socket (net/socket.c:1712) ? do_user_addr_fault (arch/x86/mm/fault.c:1419 (discriminator 1)) __x64_sys_socket (net/socket.c:1728) do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
That's because when the socket is initialised, rcu_read_lock() is not used despite the explicit comment written above the declaration of mptcp_sched_find() in sched.c. Adding the missing lock/unlock avoids the warning.
Fixes: 1730b2b2c5a5 ("mptcp: add sched in mptcp_sock") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/523 Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/mptcp/protocol.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 6d0e201c3eb26aa6ca0ff27e5a65cb6f911012f2..d263091659e076587bc3406dfdcb4409adb3247e 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2864,8 +2864,10 @@ static int mptcp_init_sock(struct sock *sk) if (unlikely(!net->mib.mptcp_statistics) && !mptcp_mib_alloc(net)) return -ENOMEM;
+ rcu_read_lock(); ret = mptcp_init_sched(mptcp_sk(sk), mptcp_sched_find(mptcp_get_scheduler(net))); + rcu_read_unlock(); if (ret) return ret;
On Mon, Oct 21, 2024 at 12:25:26PM +0200, Matthieu Baerts (NGI0) wrote:
Enabling CONFIG_PROVE_RCU_LIST with its dependence CONFIG_RCU_EXPERT creates this splat when an MPTCP socket is created:
============================= WARNING: suspicious RCU usage 6.12.0-rc2+ #11 Not tainted
net/mptcp/sched.c:44 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 no locks held by mptcp_connect/176.
stack backtrace: CPU: 0 UID: 0 PID: 176 Comm: mptcp_connect Not tainted 6.12.0-rc2+ #11 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace:
<TASK> dump_stack_lvl (lib/dump_stack.c:123) lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) mptcp_sched_find (net/mptcp/sched.c:44 (discriminator 7)) mptcp_init_sock (net/mptcp/protocol.c:2867 (discriminator 1)) ? sock_init_data_uid (arch/x86/include/asm/atomic.h:28) inet_create.part.0.constprop.0 (net/ipv4/af_inet.c:386) ? __sock_create (include/linux/rcupdate.h:347 (discriminator 1)) __sock_create (net/socket.c:1576) __sys_socket (net/socket.c:1671) ? __pfx___sys_socket (net/socket.c:1712) ? do_user_addr_fault (arch/x86/mm/fault.c:1419 (discriminator 1)) __x64_sys_socket (net/socket.c:1728) do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
That's because when the socket is initialised, rcu_read_lock() is not used despite the explicit comment written above the declaration of mptcp_sched_find() in sched.c. Adding the missing lock/unlock avoids the warning.
Fixes: 1730b2b2c5a5 ("mptcp: add sched in mptcp_sock") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/523 Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Reviewed-by: Simon Horman horms@kernel.org
mptcp_get_available_schedulers() needs to iterate over the schedulers' list only to read the names: it doesn't modify anything there.
In this case, it is enough to hold the RCU read lock, no need to combine this with the associated spin lock.
Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/mptcp/sched.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c index 78ed508ebc1b8dd9f0e020cca1bdd86f24f0afeb..df7dbcfa3b71370cc4d7e4e4f16cc1e41a50dddf 100644 --- a/net/mptcp/sched.c +++ b/net/mptcp/sched.c @@ -60,7 +60,6 @@ void mptcp_get_available_schedulers(char *buf, size_t maxlen) size_t offs = 0;
rcu_read_lock(); - spin_lock(&mptcp_sched_list_lock); list_for_each_entry_rcu(sched, &mptcp_sched_list, list) { offs += snprintf(buf + offs, maxlen - offs, "%s%s", @@ -69,7 +68,6 @@ void mptcp_get_available_schedulers(char *buf, size_t maxlen) if (WARN_ON_ONCE(offs >= maxlen)) break; } - spin_unlock(&mptcp_sched_list_lock); rcu_read_unlock(); }
On Mon, Oct 21, 2024 at 12:25:27PM +0200, Matthieu Baerts (NGI0) wrote:
mptcp_get_available_schedulers() needs to iterate over the schedulers' list only to read the names: it doesn't modify anything there.
In this case, it is enough to hold the RCU read lock, no need to combine this with the associated spin lock.
Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
I do wonder if it would be more appropriate to route this via net-next (without a fixes tag) rather than via net. But either way this looks good to me.
Reviewed-by: Simon Horman horms@kernel.org
...
Hi Simon,
Thank you for the reviews!
On 23/10/2024 14:21, Simon Horman wrote:
On Mon, Oct 21, 2024 at 12:25:27PM +0200, Matthieu Baerts (NGI0) wrote:
mptcp_get_available_schedulers() needs to iterate over the schedulers' list only to read the names: it doesn't modify anything there.
In this case, it is enough to hold the RCU read lock, no need to combine this with the associated spin lock.
Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
I do wonder if it would be more appropriate to route this via net-next (without a fixes tag) rather than via net. But either way this looks good to me.
Good point. On one hand, I marked it as a fix, because when working on the patch 1/3, we noticed these spin_(un)lock() were not supposed to be there in the first place. On the other hand, even it's fixing a small performance issue, it is not fixing a regression.
I think it is easier to route this via -net, but I'm fine if it is applied in net-next.
Cheers, Matt
On Wed, Oct 23, 2024 at 04:13:36PM +0200, Matthieu Baerts wrote:
Hi Simon,
Thank you for the reviews!
On 23/10/2024 14:21, Simon Horman wrote:
On Mon, Oct 21, 2024 at 12:25:27PM +0200, Matthieu Baerts (NGI0) wrote:
mptcp_get_available_schedulers() needs to iterate over the schedulers' list only to read the names: it doesn't modify anything there.
In this case, it is enough to hold the RCU read lock, no need to combine this with the associated spin lock.
Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
I do wonder if it would be more appropriate to route this via net-next (without a fixes tag) rather than via net. But either way this looks good to me.
Good point. On one hand, I marked it as a fix, because when working on the patch 1/3, we noticed these spin_(un)lock() were not supposed to be there in the first place. On the other hand, even it's fixing a small performance issue, it is not fixing a regression.
I think it is easier to route this via -net, but I'm fine if it is applied in net-next.
Understood. FTR, I don't feel strongly about this either way.
On Wed, 23 Oct 2024 16:13:36 +0200 Matthieu Baerts wrote:
On 23/10/2024 14:21, Simon Horman wrote:
On Mon, Oct 21, 2024 at 12:25:27PM +0200, Matthieu Baerts (NGI0) wrote:
mptcp_get_available_schedulers() needs to iterate over the schedulers' list only to read the names: it doesn't modify anything there.
In this case, it is enough to hold the RCU read lock, no need to combine this with the associated spin lock.
Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
I do wonder if it would be more appropriate to route this via net-next (without a fixes tag) rather than via net. But either way this looks good to me.
Good point. On one hand, I marked it as a fix, because when working on the patch 1/3, we noticed these spin_(un)lock() were not supposed to be there in the first place. On the other hand, even it's fixing a small performance issue, it is not fixing a regression.
I think it is easier to route this via -net, but I'm fine if it is applied in net-next.
I agree with Simon's initial response. Let's not blur the lines. Please re-queue for net-next, I'll apply the rest.
BTW thanks a lot for proactively fixing the CONFIG_PROVE_RCU_LIST splats!
Listing all the values linked to the MPTCP sysctl knobs was not exercised in MPTCP test suite.
Let's do that to avoid any regressions, but also to have a kernel with a debug kconfig verifying more assumptions. For the moment, we are not interested by the output, only to avoid crashes and warnings.
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh index 57325d57e4c6e3653019db2de09620d692143683..b48b4e56826a9cfdb3501242b707ae2ebe29b220 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -259,6 +259,15 @@ check_mptcp_disabled() mptcp_lib_ns_init disabled_ns
print_larger_title "New MPTCP socket can be blocked via sysctl" + + # mainly to cover more code + if ! ip netns exec ${disabled_ns} sysctl net.mptcp >/dev/null; then + mptcp_lib_pr_fail "not able to list net.mptcp sysctl knobs" + mptcp_lib_result_fail "not able to list net.mptcp sysctl knobs" + ret=${KSFT_FAIL} + return 1 + fi + # net.mptcp.enabled should be enabled by default if [ "$(ip netns exec ${disabled_ns} sysctl net.mptcp.enabled | awk '{ print $3 }')" -ne 1 ]; then mptcp_lib_pr_fail "net.mptcp.enabled sysctl is not 1 by default"
On Mon, Oct 21, 2024 at 12:25:28PM +0200, Matthieu Baerts (NGI0) wrote:
Listing all the values linked to the MPTCP sysctl knobs was not exercised in MPTCP test suite.
Let's do that to avoid any regressions, but also to have a kernel with a debug kconfig verifying more assumptions. For the moment, we are not interested by the output, only to avoid crashes and warnings.
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
I am assuming that we are ok with expanding test coverage via net, which FWIIW, does seem reasonable to me.
Reviewed-by: Simon Horman horms@kernel.org
...
Hello:
This series was applied to netdev/net.git (main) by Jakub Kicinski kuba@kernel.org:
On Mon, 21 Oct 2024 12:25:25 +0200 you wrote:
Two small fixes related to the MPTCP packets scheduler:
Patch 1: add missing rcu_read_(un)lock(). A fix for >= 6.6.
Patch 2: remove unneeded lock when listing packets schedulers. A fix for >= 6.10.
[...]
Here is the summary with links: - [net,1/3] mptcp: init: protect sched with rcu_read_lock https://git.kernel.org/netdev/net/c/3deb12c788c3 - [net,2/3] mptcp: remove unneeded lock when listing scheds (no matching commit) - [net,3/3] selftests: mptcp: list sysctl data https://git.kernel.org/netdev/net/c/5513dc1d8fec
You are awesome, thank you!
linux-kselftest-mirror@lists.linaro.org