Hello Matthieu,
On Wed, Oct 30, 2024 at 03:45:02PM +0100, Matthieu Baerts wrote:
Hi Breno,
30 Oct 2024 15:02:45 Breno Leitao <leitao@debian.org>:
The mptcp_sched_find() function must be called with the RCU read lock held, as it accesses RCU-protected data structures. This requirement was not enforced in mptcp_init_sock(), leading to an "RCU-list traversed in non-reader section" warning when CONFIG_PROVE_RCU_LIST is enabled:
net/mptcp/sched.c:44 RCU-list traversed in non-reader section!!
Fix it by acquiring the RCU read lock before calling mptcp_sched_find(), so that the list traversal always runs inside an RCU read-side critical section.
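The locking rule described above can be modelled in a few lines of userspace C. This is only a sketch and not the patch under discussion or the fix applied in net: rcu_read_lock() and rcu_read_unlock() are replaced by counters, and struct sched, sched_find(), init_sock() and unprotected_walks are hypothetical stand-ins for the MPTCP code. The sketch also uses the lookup result while still inside the read-side section, which is a choice made here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins for the kernel's RCU read-side primitives.
 * They only track nesting depth so the sketch can check its own locking. */
static int rcu_depth;
static int unprotected_walks;	/* list walks done outside a read-side section */

static void rcu_read_lock(void)   { rcu_depth++; }
static void rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }

/* Hypothetical scheduler entry, far simpler than the kernel structure. */
struct sched {
	const char *name;
	struct sched *next;
};

static struct sched sched_default = { "default", NULL };
static struct sched *sched_list = &sched_default;

/* Models the lookup: walks the list and records a violation when the
 * caller holds no read lock, roughly what CONFIG_PROVE_RCU_LIST reports. */
static struct sched *sched_find(const char *name)
{
	struct sched *s;

	if (rcu_depth == 0)
		unprotected_walks++;

	for (s = sched_list; s; s = s->next)
		if (strcmp(s->name, name) == 0)
			return s;
	return NULL;
}

/* Models an init path that takes the read lock around the lookup and
 * around the use of the returned pointer. Returns 0 on success. */
static int init_sock(const char *name, const char **chosen)
{
	struct sched *s;
	int ret = -1;

	rcu_read_lock();
	s = sched_find(name);
	if (s) {
		*chosen = s->name;
		ret = 0;
	}
	rcu_read_unlock();

	return ret;
}
```

Calling init_sock("default", &chosen) leaves unprotected_walks at 0, while calling sched_find() directly, without the lock, increments it.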
Thank you for having looked at that, but there is already a fix:
https://lore.kernel.org/netdev/20241021-net-mptcp-sched-lock-v1-1-637759cf06...
This fix has even been applied in the net tree already:
https://git.kernel.org/netdev/net/c/3deb12c788c3
Did you not get conflicts when rebasing your branch on top of the latest version?
Oh, I was testing on Linus' tree when I hit the problem, and net had not been merged into Linus' tree yet.
Additionally, the patch breaks down the mptcp_init_sched() call into smaller parts, with the RCU read lock only covering the specific call to mptcp_sched_find(). This helps minimize the critical section, reducing the time during which RCU grace periods are blocked.
I agree with Eric (thank you for the review!): this creates other issues.
Let me comment there.
The mptcp_sched_list_lock is not held in this case, and it is not clear if it is necessary.
It is not needed: the list is not modified, only read with RCU.
Thanks