The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7b40066c97ec66a44e388f82fcf694987451768f Mon Sep 17 00:00:00 2001
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com Date: Thu, 5 Aug 2021 15:29:54 -0400 Subject: [PATCH] tracepoint: Use rcu get state and cond sync for static call updates
State transitions from 1->0->1 and N->2->1 callbacks require RCU synchronization. Rather than performing the RCU synchronization every time the state change occurs, which is quite slow when many tracepoints are registered in batch, instead keep a snapshot of the RCU state on the most recent transitions which belong to a chain, and conditionally wait for a grace period on the last transition of the chain if one g.p. has not elapsed since the last snapshot.
This applies to both RCU and SRCU.
This brings the performance regression caused by commit 231264d6927f ("Fix: tracepoint: static call function vs data state mismatch") back to what it was originally.
Before this commit:
# trace-cmd start -e all # time trace-cmd start -p nop
real 0m10.593s user 0m0.017s sys 0m0.259s
After this commit:
# trace-cmd start -e all # time trace-cmd start -p nop
real 0m0.878s user 0m0.000s sys 0m0.103s
Link: https://lkml.kernel.org/r/20210805192954.30688-1-mathieu.desnoyers@efficios.... Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba....
Cc: stable@vger.kernel.org Cc: Ingo Molnar mingo@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Andrew Morton akpm@linux-foundation.org Cc: "Paul E. McKenney" paulmck@kernel.org Cc: Stefan Metzmacher metze@samba.org Fixes: 231264d6927f ("Fix: tracepoint: static call function vs data state mismatch") Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Reviewed-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index 8d772bd6894d..efd14c79fab4 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -28,6 +28,44 @@ extern tracepoint_ptr_t __stop___tracepoints_ptrs[]; DEFINE_SRCU(tracepoint_srcu); EXPORT_SYMBOL_GPL(tracepoint_srcu);
+enum tp_transition_sync { + TP_TRANSITION_SYNC_1_0_1, + TP_TRANSITION_SYNC_N_2_1, + + _NR_TP_TRANSITION_SYNC, +}; + +struct tp_transition_snapshot { + unsigned long rcu; + unsigned long srcu; + bool ongoing; +}; + +/* Protected by tracepoints_mutex */ +static struct tp_transition_snapshot tp_transition_snapshot[_NR_TP_TRANSITION_SYNC]; + +static void tp_rcu_get_state(enum tp_transition_sync sync) +{ + struct tp_transition_snapshot *snapshot = &tp_transition_snapshot[sync]; + + /* Keep the latest get_state snapshot. */ + snapshot->rcu = get_state_synchronize_rcu(); + snapshot->srcu = start_poll_synchronize_srcu(&tracepoint_srcu); + snapshot->ongoing = true; +} + +static void tp_rcu_cond_sync(enum tp_transition_sync sync) +{ + struct tp_transition_snapshot *snapshot = &tp_transition_snapshot[sync]; + + if (!snapshot->ongoing) + return; + cond_synchronize_rcu(snapshot->rcu); + if (!poll_state_synchronize_srcu(&tracepoint_srcu, snapshot->srcu)) + synchronize_srcu(&tracepoint_srcu); + snapshot->ongoing = false; +} + /* Set to 1 to enable tracepoint debug output */ static const int tracepoint_debug;
@@ -311,6 +349,11 @@ static int tracepoint_add_func(struct tracepoint *tp, */ switch (nr_func_state(tp_funcs)) { case TP_FUNC_1: /* 0->1 */ + /* + * Make sure new static func never uses old data after a + * 1->0->1 transition sequence. + */ + tp_rcu_cond_sync(TP_TRANSITION_SYNC_1_0_1); /* Set static call to first function */ tracepoint_update_call(tp, tp_funcs); /* Both iterator and static call handle NULL tp->funcs */ @@ -325,10 +368,15 @@ static int tracepoint_add_func(struct tracepoint *tp, * Requires ordering between RCU assign/dereference and * static call update/call. */ - rcu_assign_pointer(tp->funcs, tp_funcs); - break; + fallthrough; case TP_FUNC_N: /* N->N+1 (N>1) */ rcu_assign_pointer(tp->funcs, tp_funcs); + /* + * Make sure static func never uses incorrect data after a + * N->...->2->1 (N>1) transition sequence. + */ + if (tp_funcs[0].data != old[0].data) + tp_rcu_get_state(TP_TRANSITION_SYNC_N_2_1); break; default: WARN_ON_ONCE(1); @@ -372,24 +420,23 @@ static int tracepoint_remove_func(struct tracepoint *tp, /* Both iterator and static call handle NULL tp->funcs */ rcu_assign_pointer(tp->funcs, NULL); /* - * Make sure new func never uses old data after a 1->0->1 - * transition sequence. - * Considering that transition 0->1 is the common case - * and don't have rcu-sync, issue rcu-sync after - * transition 1->0 to break that sequence by waiting for - * readers to be quiescent. + * Make sure new static func never uses old data after a + * 1->0->1 transition sequence. */ - tracepoint_synchronize_unregister(); + tp_rcu_get_state(TP_TRANSITION_SYNC_1_0_1); break; case TP_FUNC_1: /* 2->1 */ rcu_assign_pointer(tp->funcs, tp_funcs); /* - * On 2->1 transition, RCU sync is needed before setting - * static call to first callback, because the observer - * may have loaded any prior tp->funcs after the last one - * associated with an rcu-sync. + * Make sure static func never uses incorrect data after a + * N->...->2->1 (N>2) transition sequence. If the first + * element's data has changed, then force the synchronization + * to prevent current readers that have loaded the old data + * from calling the new function. */ - tracepoint_synchronize_unregister(); + if (tp_funcs[0].data != old[0].data) + tp_rcu_get_state(TP_TRANSITION_SYNC_N_2_1); + tp_rcu_cond_sync(TP_TRANSITION_SYNC_N_2_1); /* Set static call to first function */ tracepoint_update_call(tp, tp_funcs); break; @@ -397,6 +444,12 @@ static int tracepoint_remove_func(struct tracepoint *tp, fallthrough; case TP_FUNC_N: rcu_assign_pointer(tp->funcs, tp_funcs); + /* + * Make sure static func never uses incorrect data after a + * N->...->2->1 (N>2) transition sequence. + */ + if (tp_funcs[0].data != old[0].data) + tp_rcu_get_state(TP_TRANSITION_SYNC_N_2_1); break; default: WARN_ON_ONCE(1);
On Mon, 09 Aug 2021 11:20:32 +0200 gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
Crap.
Mathieu, seems that the "slow down 10x" patch was able to be backported to 5.10, where as this patch was not. Reason being is that start_poll_synchronize_rcu() was added in 5.13.
This means because the other patch was able to be backported without this patch, we just slowed down disabling all events! Which may be a surprise to some people not expecting a stable branch from having such a big performance regression :-/
-- Steve
On Thu, 19 Aug 2021 17:09:33 -0400 Steven Rostedt rostedt@goodmis.org wrote:
Mathieu, seems that the "slow down 10x" patch was able to be backported to 5.10, where as this patch was not. Reason being is that start_poll_synchronize_rcu() was added in 5.13.
I can get this to work if I backport the following RCU patches:
29d2bb94a8a126ce80ffbb433b648b32fdea524e srcu: Provide internal interface to start a Tree SRCU grace period
5358c9fa54b09b5d3d7811b033aa0838c1bbaaf2 srcu: Provide polling interfaces for Tree SRCU grace periods
1a893c711a600ab57526619b56e6f6b7be00956e srcu: Provide internal interface to start a Tiny SRCU grace period
8b5bd67cf6422b63ee100d76d8de8960ca2df7f0 srcu: Provide polling interfaces for Tiny SRCU grace periods
The first three can be cherry-picked without issue. The last one has a small conflict, of:
include/linux/srcutiny.h.rej: --- include/linux/srcutiny.h +++ include/linux/srcutiny.h @@ -16,6 +16,7 @@ struct srcu_struct { short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */ unsigned short srcu_idx; /* Current reader array element in bit 0x2. */ + unsigned short srcu_idx_max; /* Furthest future srcu_idx request. */ u8 srcu_gp_running; /* GP workqueue running? */ u8 srcu_gp_waiting; /* GP waiting for readers? */ struct swait_queue_head srcu_wq;
Which I just added that line, and everything worked.
Paul, do you have any issues with these four patches getting backported?
Greg, Are you OK with them too?
Once those are backported, this patch can be backported as well, and everything should work. This patch really needs to stay with:
231264d6927f6740af36855a622d0e240be9d94c tracepoint: Fix static call function vs data state mismatch
Otherwise I would say to revert it if this one can't be backported with it.
-- Steve
Greg, Paul,
Any comment on this?
-- Steve
On Thu, 19 Aug 2021 20:42:04 -0400 Steven Rostedt rostedt@goodmis.org wrote:
On Thu, 19 Aug 2021 17:09:33 -0400 Steven Rostedt rostedt@goodmis.org wrote:
Mathieu, seems that the "slow down 10x" patch was able to be backported to 5.10, where as this patch was not. Reason being is that start_poll_synchronize_rcu() was added in 5.13.
I can get this to work if I backport the following RCU patches:
29d2bb94a8a126ce80ffbb433b648b32fdea524e srcu: Provide internal interface to start a Tree SRCU grace period
5358c9fa54b09b5d3d7811b033aa0838c1bbaaf2 srcu: Provide polling interfaces for Tree SRCU grace periods
1a893c711a600ab57526619b56e6f6b7be00956e srcu: Provide internal interface to start a Tiny SRCU grace period
8b5bd67cf6422b63ee100d76d8de8960ca2df7f0 srcu: Provide polling interfaces for Tiny SRCU grace periods
The first three can be cherry-picked without issue. The last one has a small conflict, of:
include/linux/srcutiny.h.rej: --- include/linux/srcutiny.h +++ include/linux/srcutiny.h @@ -16,6 +16,7 @@ struct srcu_struct { short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */ unsigned short srcu_idx; /* Current reader array element in bit 0x2. */
unsigned short srcu_idx_max; /* Furthest future srcu_idx request. */ u8 srcu_gp_running; /* GP workqueue running? */ u8 srcu_gp_waiting; /* GP waiting for readers? */ struct swait_queue_head srcu_wq;
Which I just added that line, and everything worked.
Paul, do you have any issues with these four patches getting backported?
Greg, Are you OK with them too?
Once those are backported, this patch can be backported as well, and everything should work. This patch really needs to stay with:
231264d6927f6740af36855a622d0e240be9d94c tracepoint: Fix static call function vs data state mismatch
Otherwise I would say to revert it if this one can't be backported with it.
-- Steve
----- On Aug 19, 2021, at 8:42 PM, rostedt rostedt@goodmis.org wrote:
On Thu, 19 Aug 2021 17:09:33 -0400 Steven Rostedt rostedt@goodmis.org wrote:
Mathieu, seems that the "slow down 10x" patch was able to be backported to 5.10, where as this patch was not. Reason being is that start_poll_synchronize_rcu() was added in 5.13.
I can get this to work if I backport the following RCU patches:
29d2bb94a8a126ce80ffbb433b648b32fdea524e srcu: Provide internal interface to start a Tree SRCU grace period
5358c9fa54b09b5d3d7811b033aa0838c1bbaaf2 srcu: Provide polling interfaces for Tree SRCU grace periods
1a893c711a600ab57526619b56e6f6b7be00956e srcu: Provide internal interface to start a Tiny SRCU grace period
8b5bd67cf6422b63ee100d76d8de8960ca2df7f0 srcu: Provide polling interfaces for Tiny SRCU grace periods
The first three can be cherry-picked without issue. The last one has a small conflict, of:
include/linux/srcutiny.h.rej: --- include/linux/srcutiny.h +++ include/linux/srcutiny.h @@ -16,6 +16,7 @@ struct srcu_struct { short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */ unsigned short srcu_idx; /* Current reader array element in bit 0x2. */
unsigned short srcu_idx_max; /* Furthest future srcu_idx request. */ u8 srcu_gp_running; /* GP workqueue running? */ u8 srcu_gp_waiting; /* GP waiting for readers? */ struct swait_queue_head srcu_wq;
Which I just added that line, and everything worked.
Paul, do you have any issues with these four patches getting backported?
Greg, Are you OK with them too?
Once those are backported, this patch can be backported as well, and everything should work. This patch really needs to stay with:
231264d6927f6740af36855a622d0e240be9d94c tracepoint: Fix static call function vs data state mismatch
Otherwise I would say to revert it if this one can't be backported with it.
In my opinion backporting those patches to stable is important, because the tracepoint fix which causes the slowdown is really fixing a correctness issue: it ensures that the kernel does not crash in specific race scenarios.
Indeed the slowdown associated with that patch is quite big on typical use-cases of tracepoint registration/unregistration, so the second fixing the speed regression is also important, and that last fix requires the SRCU backports.
Thanks,
Mathieu
On Thu, Aug 19, 2021 at 08:42:04PM -0400, Steven Rostedt wrote:
On Thu, 19 Aug 2021 17:09:33 -0400 Steven Rostedt rostedt@goodmis.org wrote:
Mathieu, seems that the "slow down 10x" patch was able to be backported to 5.10, where as this patch was not. Reason being is that start_poll_synchronize_rcu() was added in 5.13.
I can get this to work if I backport the following RCU patches:
29d2bb94a8a126ce80ffbb433b648b32fdea524e srcu: Provide internal interface to start a Tree SRCU grace period
5358c9fa54b09b5d3d7811b033aa0838c1bbaaf2 srcu: Provide polling interfaces for Tree SRCU grace periods
1a893c711a600ab57526619b56e6f6b7be00956e srcu: Provide internal interface to start a Tiny SRCU grace period
8b5bd67cf6422b63ee100d76d8de8960ca2df7f0 srcu: Provide polling interfaces for Tiny SRCU grace periods
The first three can be cherry-picked without issue. The last one has a small conflict, of:
include/linux/srcutiny.h.rej: --- include/linux/srcutiny.h +++ include/linux/srcutiny.h @@ -16,6 +16,7 @@ struct srcu_struct { short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */ unsigned short srcu_idx; /* Current reader array element in bit 0x2. */
unsigned short srcu_idx_max; /* Furthest future srcu_idx request. */ u8 srcu_gp_running; /* GP workqueue running? */ u8 srcu_gp_waiting; /* GP waiting for readers? */ struct swait_queue_head srcu_wq;
Which I just added that line, and everything worked.
Paul, do you have any issues with these four patches getting backported?
I believe that you also need to backport 74612a07b83f ("srcu: Make Tiny SRCU use multi-bit grace-period counter"). Otherwise, Tiny SRCU polling grace periods will be at best working by accident.
This will also make your small conflict go away.
Thanx, Paul
Greg, Are you OK with them too?
Once those are backported, this patch can be backported as well, and everything should work. This patch really needs to stay with:
231264d6927f6740af36855a622d0e240be9d94c tracepoint: Fix static call function vs data state mismatch
Otherwise I would say to revert it if this one can't be backported with it.
-- Steve
On Fri, 20 Aug 2021 10:57:38 -0700 "Paul E. McKenney" paulmck@kernel.org wrote:
Paul, do you have any issues with these four patches getting backported?
I believe that you also need to backport 74612a07b83f ("srcu: Make Tiny SRCU use multi-bit grace-period counter"). Otherwise, Tiny SRCU polling grace periods will be at best working by accident.
This will also make your small conflict go away.
Thanks for looking at this Paul. Yes, that commit helps where I don't have to do any fixes.
Greg,
Can you please backport the following commits to 5.10, and then reapply this patch?
29d2bb94a8a1 ("srcu: Provide internal interface to start a Tree SRCU grace period") 5358c9fa54b0 ("srcu: Provide polling interfaces for Tree SRCU grace periods") 1a893c711a60 ("srcu: Provide internal interface to start a Tiny SRCU grace period") 74612a07b83f ("srcu: Make Tiny SRCU use multi-bit grace-period counter") 8b5bd67cf642 ("srcu: Provide polling interfaces for Tiny SRCU grace periods")
Thanks,
-- Steve
On Fri, Aug 20, 2021 at 06:13:28PM -0400, Steven Rostedt wrote:
On Fri, 20 Aug 2021 10:57:38 -0700 "Paul E. McKenney" paulmck@kernel.org wrote:
Paul, do you have any issues with these four patches getting backported?
I believe that you also need to backport 74612a07b83f ("srcu: Make Tiny SRCU use multi-bit grace-period counter"). Otherwise, Tiny SRCU polling grace periods will be at best working by accident.
This will also make your small conflict go away.
Thanks for looking at this Paul. Yes, that commit helps where I don't have to do any fixes.
Greg,
Can you please backport the following commits to 5.10, and then reapply this patch?
29d2bb94a8a1 ("srcu: Provide internal interface to start a Tree SRCU grace period") 5358c9fa54b0 ("srcu: Provide polling interfaces for Tree SRCU grace periods") 1a893c711a60 ("srcu: Provide internal interface to start a Tiny SRCU grace period") 74612a07b83f ("srcu: Make Tiny SRCU use multi-bit grace-period counter") 8b5bd67cf642 ("srcu: Provide polling interfaces for Tiny SRCU grace periods")
Now done, thanks.
greg k-h
linux-stable-mirror@lists.linaro.org