From: "Joel Fernandes (Google)" joel@joelfernandes.org
This is a posting of v9 of the preempt/irq tracepoint cleanup series, rebased onto v4.18-rc2. There are no changes in the series, just a rebase and repost.
All patches now have Reviewed-by tags from reviewers. This series has been well tested and is a simplification/refactoring of existing code, along with a speed-up for tracepoints using the rcu-idle API. With this, our users will find it easier to use tools that depend on the existing preempt tracepoints, since it simplifies the configuration for them.
Future enhancements/fixes I am developing for the preempt-off tracer will depend on these patches, so I suggest prioritizing these well-reviewed and tested patches for that reason as well.
Introduction to the series: The preempt/irq tracepoints exist, but not every part of the kernel that needs to be notified of a preempt disable/enable or an irq disable/enable uses them. As a result, some features cannot be used simultaneously (for example, only one of lockdep or the irqsoff trace-events can be used at a time).
This is particularly painful to deal with, since turning on lockdep breaks tracers that install probes on IRQ events, such as the BCC atomic critical section tracer [1]. This constraint also makes it impossible to use synthetic events to trace irqsoff sections with lockdep simultaneously turned on.
This series solves that, and also results in a nice cleanup of the relevant parts of the kernel. Several ifdefs are simpler, and the design is more unified. As a result, all rcuidle tracepoints also become faster, since their handling is simpler.
[1] https://github.com/iovisor/bcc/blob/master/tools/criticalstat_example.txt
v8->v9:
 - Small style changes to tracepoint code (Mathieu)
 - Minor style fix to use PTR_ERR_OR_ZERO (0-day bot)
 - Minor fix to test_atomic_sections to use unsigned long.
 - Added Namhyung's, Mathieu's Reviewed-by to some patches.
 - Added Acks from Masami

v7->v8:
 - Refactored irqsoff tracer probe defines (Namhyung)

v6->v7:
 - Added a module to simulate an atomic section, and a kselftest to load and
   trigger it, which verifies the preempt-tracer and this series.
 - Fixed a new early-boot warning that appeared after rebasing:
   early_boot_irqs_disabled was set too early, so I moved it after the
   lockdep initialization.
 - Added back the softirq fix since it appears it wasn't picked up.
 - Ran Ingo's locking API selftest suite, which passes with this series.
 - Mathieu suggested ifdef'ing the tracepoint_synchronize_unregister function
   in case tracepoints aren't enabled; did that.
Joel Fernandes (Google) (6):
  srcu: Add notrace variant of srcu_dereference
  trace/irqsoff: Split reset into separate functions
  tracepoint: Make rcuidle tracepoint callers use SRCU
  tracing: Centralize preemptirq tracepoints and unify their usage
  lib: Add module to simulate atomic sections for testing preemptoff tracers
  kselftests: Add tests for the preemptoff and irqsoff tracers

Paul McKenney (1):
  srcu: Add notrace variants of srcu_read_{lock,unlock}

 include/linux/ftrace.h                        |  11 +-
 include/linux/irqflags.h                      |  11 +-
 include/linux/lockdep.h                       |   8 +-
 include/linux/preempt.h                       |   2 +-
 include/linux/srcu.h                          |  22 ++
 include/linux/tracepoint.h                    |  49 +++-
 include/trace/events/preemptirq.h             |  23 +-
 init/main.c                                   |   5 +-
 kernel/locking/lockdep.c                      |  35 +--
 kernel/sched/core.c                           |   2 +-
 kernel/trace/Kconfig                          |  22 +-
 kernel/trace/Makefile                         |   2 +-
 kernel/trace/trace_irqsoff.c                  | 253 ++++++------------
 kernel/trace/trace_preemptirq.c               |  71 +++++
 kernel/tracepoint.c                           |  16 +-
 lib/Kconfig.debug                             |   8 +
 lib/Makefile                                  |   1 +
 lib/test_atomic_sections.c                    |  77 ++++++
 tools/testing/selftests/ftrace/config         |   3 +
 .../test.d/preemptirq/irqsoff_tracer.tc       |  73 +++++
 20 files changed, 453 insertions(+), 241 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c
 create mode 100644 lib/test_atomic_sections.c
 create mode 100644 tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc
From: Paul McKenney paulmck@linux.vnet.ibm.com
This is needed for a future tracepoint patch that uses srcu, and to make sure it doesn't call into lockdep.
The tracepoint code already calls the notrace variants for rcu_read_lock_sched, so this patch does the same for srcu, which will be used in a later patch. This keeps it consistent with rcu-sched.
[Joel: Added commit message]
Reviewed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Paul McKenney paulmck@linux.vnet.ibm.com
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
---
 include/linux/srcu.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 91494d7e8e41..3e72a291c401 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -195,6 +195,16 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
 	return retval;
 }
 
+/* Used by tracing, cannot be traced and cannot invoke lockdep. */
+static inline notrace int
+srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp)
+{
+	int retval;
+
+	retval = __srcu_read_lock(sp);
+	return retval;
+}
+
 /**
  * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
  * @sp: srcu_struct in which to unregister the old reader.
@@ -209,6 +219,13 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
 	__srcu_read_unlock(sp, idx);
 }
 
+/* Used by tracing, cannot be traced and cannot call lockdep. */
+static inline notrace void
+srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp)
+{
+	__srcu_read_unlock(sp, idx);
+}
+
 /**
  * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
  *
From: "Joel Fernandes (Google)" joel@joelfernandes.org
In the last patch in this series, we make lockdep register hooks onto the irq_{disable,enable} tracepoints. These tracepoints use the _rcuidle tracepoint variant. In this series we switch the _rcuidle tracepoint callers to use SRCU instead of sched-RCU. In order to dereference the pointer to the probe functions, we could call srcu_dereference; however, this API will call back into lockdep to check if the lock is held *before* the lockdep probe hooks have a chance to run and annotate the IRQ enabled/disabled state.
For this reason we need a notrace variant of srcu_dereference, since otherwise we get lockdep splats. This patch adds the needed srcu_dereference_notrace variant.
Reviewed-by: Paul E. McKenney paulmck@linux.vnet.ibm.com
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
---
 include/linux/srcu.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 3e72a291c401..67135d4a8a30 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -169,6 +169,11 @@ static inline int srcu_read_lock_held(const struct srcu_struct *sp)
  */
 #define srcu_dereference(p, sp) srcu_dereference_check((p), (sp), 0)
 
+/**
+ * srcu_dereference_notrace - no tracing and no lockdep calls from here
+ */
+#define srcu_dereference_notrace(p, sp) srcu_dereference_check((p), (sp), 1)
+
 /**
  * srcu_read_lock - register a new reader for an SRCU-protected structure.
  * @sp: srcu_struct in which to register the new reader.
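For readers wondering why a constant 1 in the last argument is enough to keep lockdep out of the way, here is a minimal userspace sketch. It assumes the checked dereference has the shape "(c) || lock_held()", so a true condition short-circuits the lock-state query. All demo_* names and counters are hypothetical and only illustrate the idea; they are not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for lockdep: counts how often it is consulted. */
static int demo_lockdep_queries;
static bool demo_reader_active;

static bool demo_lock_held(void)
{
	demo_lockdep_queries++;
	return demo_reader_active;
}

/* Bumped when a checked dereference runs outside a read-side section. */
static int demo_splats;

/*
 * Assumed shape of a checked dereference: the lock-state query only runs
 * when the caller-supplied condition c is false.
 */
#define demo_deref_check(p, c) \
	(((c) || demo_lock_held()) ? (p) : (demo_splats++, (p)))

/* Checked variant: always asks the lock-state helper. */
#define demo_deref(p)		demo_deref_check((p), 0)

/* "notrace" variant: condition is constant true, helper is never called. */
#define demo_deref_notrace(p)	demo_deref_check((p), 1)
```

With no reader active, demo_deref_notrace() leaves both counters untouched, while demo_deref() queries the helper once and records one splat.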
From: "Joel Fernandes (Google)" joel@joelfernandes.org
Split the reset functions into separate functions in preparation for future patches that need to do tracer-specific reset.
Reviewed-by: Namhyung Kim namhyung@kernel.org
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
---
 kernel/trace/trace_irqsoff.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 03ecb4465ee4..f8daa754cce2 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -634,7 +634,7 @@ static int __irqsoff_tracer_init(struct trace_array *tr)
 	return 0;
 }
 
-static void irqsoff_tracer_reset(struct trace_array *tr)
+static void __irqsoff_tracer_reset(struct trace_array *tr)
 {
 	int lat_flag = save_flags & TRACE_ITER_LATENCY_FMT;
 	int overwrite_flag = save_flags & TRACE_ITER_OVERWRITE;
@@ -665,6 +665,12 @@ static int irqsoff_tracer_init(struct trace_array *tr)
 
 	return __irqsoff_tracer_init(tr);
 }
+
+static void irqsoff_tracer_reset(struct trace_array *tr)
+{
+	__irqsoff_tracer_reset(tr);
+}
+
 static struct tracer irqsoff_tracer __read_mostly =
 {
 	.name		= "irqsoff",
@@ -697,11 +703,16 @@ static int preemptoff_tracer_init(struct trace_array *tr)
 	return __irqsoff_tracer_init(tr);
 }
 
+static void preemptoff_tracer_reset(struct trace_array *tr)
+{
+	__irqsoff_tracer_reset(tr);
+}
+
 static struct tracer preemptoff_tracer __read_mostly =
 {
 	.name		= "preemptoff",
 	.init		= preemptoff_tracer_init,
-	.reset		= irqsoff_tracer_reset,
+	.reset		= preemptoff_tracer_reset,
 	.start		= irqsoff_tracer_start,
 	.stop		= irqsoff_tracer_stop,
 	.print_max	= true,
@@ -731,11 +742,16 @@ static int preemptirqsoff_tracer_init(struct trace_array *tr)
 	return __irqsoff_tracer_init(tr);
 }
 
+static void preemptirqsoff_tracer_reset(struct trace_array *tr)
+{
+	__irqsoff_tracer_reset(tr);
+}
+
 static struct tracer preemptirqsoff_tracer __read_mostly =
 {
 	.name		= "preemptirqsoff",
 	.init		= preemptirqsoff_tracer_init,
-	.reset		= irqsoff_tracer_reset,
+	.reset		= preemptirqsoff_tracer_reset,
 	.start		= irqsoff_tracer_start,
 	.stop		= irqsoff_tracer_stop,
 	.print_max	= true,
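The pattern in this patch (one shared helper, one thin wrapper per tracer) can be sketched in plain userspace C as below. This is only an illustration under assumptions: struct demo_tracer, the demo_* functions and the "private" step in the preemptoff wrapper are hypothetical and do not reflect what later patches actually add.

```c
#include <stddef.h>

struct demo_tracer {
	const char *name;
	void (*reset)(struct demo_tracer *t);
	int shared_resets;	/* bumped by the common helper */
	int private_resets;	/* bumped only by tracer-specific code */
};

/* Common teardown shared by every tracer flavour. */
static void demo_common_reset(struct demo_tracer *t)
{
	t->shared_resets++;
}

/* Wrapper that only forwards, like the wrappers added in this patch. */
static void demo_irqsoff_reset(struct demo_tracer *t)
{
	demo_common_reset(t);
}

/* Wrapper that has grown a (hypothetical) tracer-specific step. */
static void demo_preemptoff_reset(struct demo_tracer *t)
{
	demo_common_reset(t);
	t->private_resets++;
}
```

Because each tracer owns its .reset callback, a later change can extend one wrapper without touching the other tracers or the shared helper.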
From: "Joel Fernandes (Google)" joel@joelfernandes.org
In recent tests with IRQ on/off tracepoints, a large performance overhead (~10%) was noticed when running hackbench. The root cause is the calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the tracepoint code. Following a long discussion on the list [1] about this, we concluded that srcu is a better alternative for use during rcu idle. Although it does involve extra barriers, it's lighter than the sched-rcu version, which has to do additional RCU calls to notify RCU idle about entry into RCU sections.
In this patch, we change the underlying implementation of the trace_*_rcuidle API to use SRCU. This has been shown to improve performance a lot for the high-frequency irq enable/disable tracepoints.
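As a rough userspace illustration of the dispatch described above: the rcuidle path enters an SRCU-style read section and keeps the returned index for the unlock, the other path uses the sched-RCU-style section, and both walk the same NULL-terminated probe array. The demo_* names and counters are hypothetical stand-ins, not the kernel primitives.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_probe {
	void (*func)(void *data, int arg);
	void *data;
};

/* Hypothetical counters standing in for the two read-side primitives. */
static int demo_srcu_readers, demo_srcu_sections;
static int demo_sched_readers, demo_sched_sections;

static int demo_srcu_read_lock(void)
{
	demo_srcu_sections++;
	return demo_srcu_readers++;	/* index handed back at unlock */
}

static void demo_srcu_read_unlock(int idx)
{
	(void)idx;
	demo_srcu_readers--;
}

static void demo_do_trace(const struct demo_probe *probes, int arg,
			  bool rcuidle)
{
	int idx = 0;

	if (rcuidle) {
		idx = demo_srcu_read_lock();
	} else {
		demo_sched_sections++;
		demo_sched_readers++;
	}

	/* The probe array ends with an entry whose func is NULL. */
	for (; probes && probes->func; probes++)
		probes->func(probes->data, arg);

	if (rcuidle)
		demo_srcu_read_unlock(idx);
	else
		demo_sched_readers--;
}

/* Example probe: adds arg into the int that data points at. */
static void demo_sum_probe(void *data, int arg)
{
	*(int *)data += arg;
}
```

Calling demo_do_trace() with rcuidle set to true only touches the SRCU-style counters, which mirrors the point that the rcuidle callers no longer need the extra RCU enter/exit calls.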
Test: Tested idle and preempt/irq tracepoints.
Here are some performance numbers:
With a run of the following 30 times on a single core x86 Qemu instance with 1GB memory:
hackbench -g 4 -f 2 -l 3000

Completion times in seconds. CONFIG_PROVE_LOCKING=y.

No patches (without this series)
Mean: 3.048
Median: 3.025
Std Dev: 0.064

With Lockdep using irq tracepoints with RCU implementation:
Mean: 3.451 (-11.66 %)
Median: 3.447 (-12.22%)
Std Dev: 0.049

With Lockdep using irq tracepoints with SRCU implementation (this series):
Mean: 3.020 (I would consider the improvement against the "without this series" case as just noise).
Median: 3.013
Std Dev: 0.033
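A note on reading the percentages: the posting does not say which run is used as the base. The small helper below is a sketch for checking the arithmetic with the rounded means above. Taking the RCU run (3.451) as the base gives roughly -11.7%, close to the quoted -11.66%, while taking the unpatched run (3.048) as the base gives roughly +13.2% slowdown. That the first convention was used is an inference from the numbers, not something stated in the posting; demo_pct_change is a hypothetical name.

```c
/* Relative change of value with respect to base, in percent. */
static double demo_pct_change(double base, double value)
{
	return (value - base) / base * 100.0;
}
```

For example, demo_pct_change(3.048, 3.020) compares the SRCU run against the unpatched run and comes out just under one percent faster, which fits the "just noise" remark given the reported standard deviations.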
[1] https://patchwork.kernel.org/patch/10344297/
Reviewed-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
---
 include/linux/tracepoint.h | 49 +++++++++++++++++++++++++++++++-------
 kernel/tracepoint.c        | 16 ++++++++++++-
 2 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 19a690b559ca..beeb01e147f8 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,6 +15,7 @@
  */
 
 #include <linux/smp.h>
+#include <linux/srcu.h>
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/cpumask.h>
@@ -33,6 +34,8 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO	10
 
+extern struct srcu_struct tracepoint_srcu;
+
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -75,10 +78,16 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
  * probe unregistration and the end of module exit to make sure there is no
  * caller executing a probe when it is freed.
  */
+#ifdef CONFIG_TRACEPOINTS
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
+#else
+static inline void tracepoint_synchronize_unregister(void)
+{ }
+#endif
 
 #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
 extern int syscall_regfunc(void);
@@ -129,18 +138,38 @@ extern void syscall_unregfunc(void);
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
+#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
 	do {								\
 		struct tracepoint_func *it_func_ptr;			\
 		void *it_func;						\
 		void *__data;						\
+		int __maybe_unused idx = 0;				\
 									\
 		if (!(cond))						\
 			return;						\
-		if (rcucheck)						\
-			rcu_irq_enter_irqson();				\
-		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+									\
+		/*							\
+		 * For rcuidle callers, use srcu since sched-rcu	\
+		 * doesn't work from the idle path.			\
+		 */							\
+		if (rcuidle) {						\
+			if (in_nmi()) {					\
+				WARN_ON_ONCE(1);			\
+				return; /* no srcu from nmi */		\
+			}						\
+									\
+			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
+			it_func_ptr =					\
+				srcu_dereference_notrace((tp)->funcs,	\
+						&tracepoint_srcu);	\
+			/* To keep it consistent with !rcuidle path */	\
+			preempt_disable_notrace();			\
+		} else {						\
+			rcu_read_lock_sched_notrace();			\
+			it_func_ptr =					\
+				rcu_dereference_sched((tp)->funcs);	\
+		}							\
+									\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -148,9 +177,13 @@ extern void syscall_unregfunc(void);
 				((void(*)(proto))(it_func))(args);	\
 			} while ((++it_func_ptr)->func);		\
 		}							\
-		rcu_read_unlock_sched_notrace();			\
-		if (rcucheck)						\
-			rcu_irq_exit_irqson();				\
+									\
+		if (rcuidle) {						\
+			preempt_enable_notrace();			\
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
+		} else {						\
+			rcu_read_unlock_sched_notrace();		\
+		}							\
 	} while (0)
 
 #ifndef MODULE
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 6dc6356c3327..955148d91b74 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -31,6 +31,9 @@
 extern struct tracepoint * const __start___tracepoints_ptrs[];
 extern struct tracepoint * const __stop___tracepoints_ptrs[];
 
+DEFINE_SRCU(tracepoint_srcu);
+EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
 /* Set to 1 to enable tracepoint debug output */
 static const int tracepoint_debug;
 
@@ -67,16 +70,27 @@ static inline void *allocate_probes(int count)
 	return p == NULL ? NULL : p->probes;
 }
 
-static void rcu_free_old_probes(struct rcu_head *head)
+static void srcu_free_old_probes(struct rcu_head *head)
 {
 	kfree(container_of(head, struct tp_probes, rcu));
 }
 
+static void rcu_free_old_probes(struct rcu_head *head)
+{
+	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}
+
 static inline void release_probes(struct tracepoint_func *old)
 {
 	if (old) {
 		struct tp_probes *tp_probes = container_of(old,
 			struct tp_probes, probes[0]);
+		/*
+		 * Tracepoint probes are protected by both sched RCU and SRCU,
+		 * by calling the SRCU callback in the sched RCU callback we
+		 * cover both cases. So let us chain the SRCU and sched RCU
+		 * callbacks to wait for both grace periods.
+		 */
 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
 	}
 }
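The callback chaining in release_probes() can be modelled in userspace as two pending lists, one per grace-period domain, where the first-stage callback re-queues the object on the second domain. This is a simplified sketch: struct demo_domain, demo_grace_period() and the other demo_* names are hypothetical, and a real grace period is of course not something the caller triggers by hand.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_head {
	struct demo_head *next;
	void (*func)(struct demo_head *head);
};

/* A hypothetical grace-period domain with a list of pending callbacks. */
struct demo_domain {
	struct demo_head *pending;
};

static struct demo_domain demo_sched_domain;
static struct demo_domain demo_srcu_domain;
static int demo_freed;

static void demo_call(struct demo_domain *d, struct demo_head *head,
		      void (*func)(struct demo_head *head))
{
	head->func = func;
	head->next = d->pending;
	d->pending = head;
}

/* Pretend a grace period elapsed: run everything queued before it. */
static void demo_grace_period(struct demo_domain *d)
{
	struct demo_head *list = d->pending;

	d->pending = NULL;
	while (list) {
		struct demo_head *next = list->next;

		list->func(list);
		list = next;
	}
}

/* Second stage: both grace periods are over, the memory can go. */
static void demo_srcu_free(struct demo_head *head)
{
	free(head);
	demo_freed++;
}

/* First stage: the sched grace period ended, now wait for SRCU too. */
static void demo_sched_done(struct demo_head *head)
{
	demo_call(&demo_srcu_domain, head, demo_srcu_free);
}

static void demo_release(struct demo_head *head)
{
	demo_call(&demo_sched_domain, head, demo_sched_done);
}
```

In this model the object is freed only after a sched-domain grace period followed by an SRCU-domain grace period, so readers in either kind of section that started before the release have finished by then.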
On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
it_func_ptr = rcu_dereference_sched((tp)->funcs); \
I would convert to rcu_dereference_raw() to appease sparse. The fancy stuff below is pointless if you then turn off all checking.
\
/* \
* For rcuidle callers, use srcu since sched-rcu \
* doesn't work from the idle path. \
*/ \
if (rcuidle) { \
if (in_nmi()) { \
WARN_ON_ONCE(1); \
return; /* no srcu from nmi */ \
} \
\
idx = srcu_read_lock_notrace(&tracepoint_srcu); \
it_func_ptr = \
srcu_dereference_notrace((tp)->funcs, \
&tracepoint_srcu); \
/* To keep it consistent with !rcuidle path */ \
preempt_disable_notrace(); \
} else { \
rcu_read_lock_sched_notrace(); \
it_func_ptr = \
rcu_dereference_sched((tp)->funcs); \
} \
-- To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 11 Jul 2018 14:49:54 +0200 Peter Zijlstra peterz@infradead.org wrote:
I would convert to rcu_dereference_raw() to appease sparse. The fancy stuff below is pointless if you then turn off all checking.
The problem with doing this is if we use a trace event without the proper _idle() or whatever, we won't get a warning that it is used incorrectly with lockdep. Or does lockdep still check if "rcu is watching" with rcu_dereference_raw()?
-- Steve
On Wed, Jul 11, 2018 at 09:00:03AM -0400, Steven Rostedt wrote:
The problem with doing this is if we use a trace event without the proper _idle() or whatever, we won't get a warning that it is used incorrectly with lockdep. Or does lockdep still check if "rcu is watching" with rcu_dereference_raw()?
No lockdep checking is done by rcu_dereference_raw().
Thanx, Paul
On Wed, 11 Jul 2018 07:27:44 -0700 "Paul E. McKenney" paulmck@linux.vnet.ibm.com wrote:
No lockdep checking is done by rcu_dereference_raw().
Correct, but I think we can do this regardless. So Joel please resend with Peter's suggestion.
The reason is this:

#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
	extern struct tracepoint __tracepoint_##name;			\
	static inline void trace_##name(proto)				\
	{								\
		if (static_key_false(&__tracepoint_##name.key))		\
			__DO_TRACE(&__tracepoint_##name,		\
				TP_PROTO(data_proto),			\
				TP_ARGS(data_args),			\
				TP_CONDITION(cond), 0);			\
		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
			rcu_read_lock_sched_notrace();			\
			rcu_dereference_sched(__tracepoint_##name.funcs);\
			rcu_read_unlock_sched_notrace();		\
		}							\
	}

Because lockdep would only trigger warnings when the tracepoint was enabled and used in a place it shouldn't be, we added the above IS_ENABLED(CONFIG_LOCKDEP) part to test regardless of whether the tracepoint was enabled or not. Because we do this, we don't need to have the test in the __DO_TRACE() code itself. That means we can clean up the code as per Peter's suggestion.
-- Steve
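The argument in the message above (the check must sit outside the enabled-only path, so misuse is caught even if the event is never switched on) can be sketched in userspace as follows. The flags, counters and demo_* functions are hypothetical stand-ins for the static key, the "RCU is watching" state and the lockdep-backed dereference; this is not the kernel macro.

```c
#include <stdbool.h>

static bool demo_tp_enabled;		/* stands in for the static key */
static bool demo_rcu_watching = true;	/* false models a misuse site */
static int demo_probe_calls;
static int demo_warnings;

/* Hypothetical stand-in for the lockdep-backed dereference check. */
static void demo_checked_deref(void)
{
	if (!demo_rcu_watching)
		demo_warnings++;
}

static void demo_trace_event(bool lockdep_enabled)
{
	if (demo_tp_enabled)
		demo_probe_calls++;	/* unchecked fast path */

	/* Runs whether or not the tracepoint is enabled. */
	if (lockdep_enabled)
		demo_checked_deref();
}
```

With the tracepoint disabled and demo_rcu_watching false, a call still bumps demo_warnings, which is why the fast path itself can get away with an unchecked dereference.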
On Wed, Jul 11, 2018 at 10:46:18AM -0400, Steven Rostedt wrote:
Because lockdep would only trigger warnings when the tracepoint was enabled and used in a place it shouldn't be, we added the above IS_ENABLED(CONFIG_LOCKDEP) part to test regardless of whether the tracepoint was enabled or not. Because we do this, we don't need to have the test in the __DO_TRACE() code itself. That means we can clean up the code as per Peter's suggestion.
Indeed, the rcu_dereference_sched() would catch it in that case, so agreed, Peter's suggestion isn't losing any debuggability.
Thanx, Paul
On Wed, Jul 11, 2018 at 08:15:59AM -0700, Paul E. McKenney wrote:
Indeed, the rcu_dereference_sched() would catch it in that case, so agreed, Peter's suggestion isn't losing any debuggability.
Hmm, but if we are doing the check later anyway, then why not do it in __DO_TRACE itself?
Also, I guess we are discussing changing the rcu_dereference_sched, which I think should go into a separate patch since my patch isn't touching how the rcuidle==0 paths use the RCU API. So I think this is an existing issue independent of this series.
thanks!
- Joel
On Wed, 11 Jul 2018 13:56:39 -0700 Joel Fernandes joel@joelfernandes.org wrote:
Hmm, but if we are doing the check later anyway, then why not do it in __DO_TRACE itself?
Because __DO_TRACE is only called if the trace event is enabled. If we never enable a trace event, we never know if it has a potential of doing it wrong. The second part is to trigger the warning immediately regardless if the trace event is enabled or not.
Also, I guess we are discussing changing the rcu_dereference_sched, which I think should go into a separate patch since my patch isn't touching how the rcuidle==0 paths use the RCU API. So I think this is an existing issue independent of this series.
But the code you added made it much more complex to keep the checks as is. If we remove the checks then this patch doesn't need to have all the if statements, and we can do it the way Peter suggested.
But sure, go ahead and make a separate patch first that removes the checks from __DO_TRACE() if you want to.
-- Steve
On Wed, Jul 11, 2018 at 09:22:37PM -0400, Steven Rostedt wrote:
Hmm, but if we are doing the check later anyway, then why not do it in __DO_TRACE itself?
Because __DO_TRACE is only called if the trace event is enabled. If we never enable a trace event, we never know if it has a potential of doing it wrong. The second part is to trigger the warning immediately regardless if the trace event is enabled or not.
I see, thanks for the clarification.
Also, I guess we are discussing changing the rcu_dereference_sched, which I think should go into a separate patch since my patch isn't touching how the rcuidle==0 paths use the RCU API. So I think this is an existing issue independent of this series.
But the code you added made it much more complex to keep the checks as is. If we remove the checks then this patch doesn't need to have all the if statements, and we can do it the way Peter suggested.
Yes, I agree Peter's suggestion is very clean.
But sure, go ahead and make a separate patch first that removes the checks from __DO_TRACE() if you want to.
No, it's OK, no problem. I can just do it in the same patch, now that I see the code is much simpler with what Peter is suggesting.
thanks!
- Joel
On Wed, Jul 11, 2018 at 10:46:18AM -0400, Steven Rostedt wrote:
Because lockdep would only trigger warnings when the tracepoint was enabled and used in a place it shouldn't be, we added the above IS_ENABLED(CONFIG_LOCKDEP) part to test regardless of whether the tracepoint was enabled or not. Because we do this, we don't need to have the test in the __DO_TRACE() code itself. That means we can clean up the code as per Peter's suggestion.
Sounds good, I'm Ok with making this change.
Just to clarify, are you proposing to change the rcu_dereference_sched to rcu_dereference_raw in both __DECLARE_TRACE and __DO_TRACE?
thanks!
- Joel
-- To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 11 Jul 2018 13:52:49 -0700 Joel Fernandes joel@joelfernandes.org wrote:
#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
	extern struct tracepoint __tracepoint_##name; \
	static inline void trace_##name(proto) \
	{ \
		if (static_key_false(&__tracepoint_##name.key)) \
			__DO_TRACE(&__tracepoint_##name, \
				TP_PROTO(data_proto), \
				TP_ARGS(data_args), \
				TP_CONDITION(cond), 0); \
		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
			rcu_read_lock_sched_notrace(); \
			rcu_dereference_sched(__tracepoint_##name.funcs);\
			rcu_read_unlock_sched_notrace(); \
		} \
	}
Because lockdep would only trigger warnings when the tracepoint was enabled and used in a place it shouldn't be, we added the above IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the tracepoint was enabled or not. Because we do this, we don't need to have the test in the __DO_TRACE() code itself. That means we can clean up the code as per Peter's suggestion.
Sounds good, I'm Ok with making this change.
Just to clarify, are you proposing to change the rcu_dereference_sched to rcu_dereference_raw in both __DECLARE_TRACE and __DO_TRACE?
No, just in __DO_TRACE(). The rcu_dereference_sched() above in __DECLARE_TRACE() in the if (IS_ENABLED(CONFIG_LOCKDEP) block is required to show the warnings if trace_##name() is used wrong, and is the reason we can use rcu_dereference_raw() in __DO_TRACE() in the first place ;-)
This brings up another point. We should probably add to __DECLARE_TRACE_RCU() this:
 #ifndef MODULE
 #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
 	static inline void trace_##name##_rcuidle(proto) \
 	{ \
 		if (static_key_false(&__tracepoint_##name.key)) \
 			__DO_TRACE(&__tracepoint_##name, \
 				TP_PROTO(data_proto), \
 				TP_ARGS(data_args), \
 				TP_CONDITION(cond), 1); \
+		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
+			int idx; \
+			idx = srcu_read_lock_notrace(&tracepoint_srcu); \
+			srcu_dereference_notrace(__tracepoint_##name.funcs, \
+						 &tracepoint_srcu); \
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx); \
+		} \
 	}
 #else
So that lockdep works with trace_##name##_rcuidle() when the trace event is not enabled.
But that should be a separate patch and not part of this series. I may write that up tomorrow.
-- Steve
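The division of labour Steven describes can be modelled in userspace. The sketch below is not kernel code: `rcu_watching`, `lockdep_warnings`, `checked_deref()` and `raw_deref()` are hypothetical stand-ins for "RCU is watching", a lockdep splat, rcu_dereference_sched() and rcu_dereference_raw(), and the CONFIG_LOCKDEP and cond tests are left out. It only illustrates that the checked dereference in the wrapper runs even when the tracepoint is disabled, which is why the hot path can get away with the unchecked one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel symbols. */
static bool rcu_watching = true;	/* models "RCU is watching this CPU" */
static int lockdep_warnings;		/* models lockdep splats */
static int probe_calls;			/* how often the probe ran */

typedef void (*probe_fn)(void);

static void probe(void)
{
	probe_calls++;
}

struct tracepoint_model {
	bool enabled;			/* models the static key */
	probe_fn funcs;			/* models tp->funcs */
};

/* Models rcu_dereference_sched(): complains about a bad calling context. */
static probe_fn checked_deref(const struct tracepoint_model *tp)
{
	if (!rcu_watching)
		lockdep_warnings++;
	return tp->funcs;
}

/* Models rcu_dereference_raw(): no context check at all. */
static probe_fn raw_deref(const struct tracepoint_model *tp)
{
	return tp->funcs;
}

/* Models __DO_TRACE(): hot path, unchecked dereference. */
static void do_trace(const struct tracepoint_model *tp)
{
	probe_fn fn = raw_deref(tp);

	if (fn)
		fn();
}

/* Models trace_##name(): the check runs whether or not tp is enabled. */
static void trace_model(const struct tracepoint_model *tp)
{
	assert(tp != NULL);
	if (tp->enabled)
		do_trace(tp);
	(void)checked_deref(tp);
}
```

With a disabled tracepoint and `rcu_watching` cleared, `trace_model()` still bumps `lockdep_warnings` without running the probe, which is the behaviour the IS_ENABLED(CONFIG_LOCKDEP) block is there to provide.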
On Wed, Jul 11, 2018 at 11:21:20PM -0400, Steven Rostedt wrote:
On Wed, 11 Jul 2018 13:52:49 -0700 Joel Fernandes joel@joelfernandes.org wrote:
#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
	extern struct tracepoint __tracepoint_##name; \
	static inline void trace_##name(proto) \
	{ \
		if (static_key_false(&__tracepoint_##name.key)) \
			__DO_TRACE(&__tracepoint_##name, \
				TP_PROTO(data_proto), \
				TP_ARGS(data_args), \
				TP_CONDITION(cond), 0); \
		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
			rcu_read_lock_sched_notrace(); \
			rcu_dereference_sched(__tracepoint_##name.funcs);\
			rcu_read_unlock_sched_notrace(); \
		} \
	}
Because lockdep would only trigger warnings when the tracepoint was enabled and used in a place it shouldn't be, we added the above IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the tracepoint was enabled or not. Because we do this, we don't need to have the test in the __DO_TRACE() code itself. That means we can clean up the code as per Peter's suggestion.
Sounds good, I'm Ok with making this change.
Just to clarify, are you proposing to change the rcu_dereference_sched to rcu_dereference_raw in both __DECLARE_TRACE and __DO_TRACE?
No, just in __DO_TRACE(). The rcu_dereference_sched() above in __DECLARE_TRACE() in the if (IS_ENABLED(CONFIG_LOCKDEP) block is required to show the warnings if trace_##name() is used wrong, and is the reason we can use rcu_dereference_raw() in __DO_TRACE() in the first place ;-)
This brings up another point. We should probably add to __DECLARE_TRACE_RCU() this:
 #ifndef MODULE
 #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
 	static inline void trace_##name##_rcuidle(proto) \
 	{ \
 		if (static_key_false(&__tracepoint_##name.key)) \
 			__DO_TRACE(&__tracepoint_##name, \
 				TP_PROTO(data_proto), \
 				TP_ARGS(data_args), \
 				TP_CONDITION(cond), 1); \
+		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
+			int idx; \
+			idx = srcu_read_lock_notrace(&tracepoint_srcu); \
+			srcu_dereference_notrace(__tracepoint_##name.funcs, \
+						 &tracepoint_srcu); \
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx); \
+		} \
 	}
 #else
So that lockdep works with trace_##name##_rcuidle() when the trace event is not enabled.
But that should be a separate patch and not part of this series. I may write that up tomorrow.
Yes, that sounds good to me and would be good to add the safe guard there. But you meant srcu_dereference above, not srcu_dereference_notrace right?
Meanwhile I'll drop that lockdep_recursion tomorrow and run some tests and see how it behaves with Peter's changes.
thanks!
- Joel
On Wed, 11 Jul 2018 21:28:25 -0700 Joel Fernandes joel@joelfernandes.org wrote:
On Wed, Jul 11, 2018 at 11:21:20PM -0400, Steven Rostedt wrote:
On Wed, 11 Jul 2018 13:52:49 -0700 Joel Fernandes joel@joelfernandes.org wrote:
#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
	extern struct tracepoint __tracepoint_##name; \
	static inline void trace_##name(proto) \
	{ \
		if (static_key_false(&__tracepoint_##name.key)) \
			__DO_TRACE(&__tracepoint_##name, \
				TP_PROTO(data_proto), \
				TP_ARGS(data_args), \
				TP_CONDITION(cond), 0); \
		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
			rcu_read_lock_sched_notrace(); \
			rcu_dereference_sched(__tracepoint_##name.funcs);\
			rcu_read_unlock_sched_notrace(); \
		} \
	}
Because lockdep would only trigger warnings when the tracepoint was enabled and used in a place it shouldn't be, we added the above IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the tracepoint was enabled or not. Because we do this, we don't need to have the test in the __DO_TRACE() code itself. That means we can clean up the code as per Peter's suggestion.
Sounds good, I'm Ok with making this change.
Just to clarify, are you proposing to change the rcu_dereference_sched to rcu_dereference_raw in both __DECLARE_TRACE and __DO_TRACE?
No, just in __DO_TRACE(). The rcu_dereference_sched() above in __DECLARE_TRACE() in the if (IS_ENABLED(CONFIG_LOCKDEP) block is required to show the warnings if trace_##name() is used wrong, and is the reason we can use rcu_dereference_raw() in __DO_TRACE() in the first place ;-)
This brings up another point. We should probably add to __DECLARE_TRACE_RCU() this:
 #ifndef MODULE
 #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
 	static inline void trace_##name##_rcuidle(proto) \
 	{ \
 		if (static_key_false(&__tracepoint_##name.key)) \
 			__DO_TRACE(&__tracepoint_##name, \
 				TP_PROTO(data_proto), \
 				TP_ARGS(data_args), \
 				TP_CONDITION(cond), 1); \
+		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
+			int idx; \
+			idx = srcu_read_lock_notrace(&tracepoint_srcu); \
+			srcu_dereference_notrace(__tracepoint_##name.funcs, \
+						 &tracepoint_srcu); \
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx); \
+		} \
 	}
 #else
So that lockdep works with trace_##name##_rcuidle() when the trace event is not enabled.
But that should be a separate patch and not part of this series. I may write that up tomorrow.
Yes, that sounds good to me and would be good to add the safe guard there. But you meant srcu_dereference above, not srcu_dereference_notrace right?
We don't need to trace them. I believe that the "srcu_*_notrace" still performs the lockdep checks. That's what we want. If they don't then we should not use notrace. But I believe they still do lockdep.
-- Steve
On Thu, Jul 12, 2018 at 09:35:12AM -0400, Steven Rostedt wrote:
On Wed, 11 Jul 2018 21:28:25 -0700 Joel Fernandes joel@joelfernandes.org wrote:
On Wed, Jul 11, 2018 at 11:21:20PM -0400, Steven Rostedt wrote:
On Wed, 11 Jul 2018 13:52:49 -0700 Joel Fernandes joel@joelfernandes.org wrote:
#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
	extern struct tracepoint __tracepoint_##name; \
	static inline void trace_##name(proto) \
	{ \
		if (static_key_false(&__tracepoint_##name.key)) \
			__DO_TRACE(&__tracepoint_##name, \
				TP_PROTO(data_proto), \
				TP_ARGS(data_args), \
				TP_CONDITION(cond), 0); \
		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
			rcu_read_lock_sched_notrace(); \
			rcu_dereference_sched(__tracepoint_##name.funcs);\
			rcu_read_unlock_sched_notrace(); \
		} \
	}
Because lockdep would only trigger warnings when the tracepoint was enabled and used in a place it shouldn't be, we added the above IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the tracepoint was enabled or not. Because we do this, we don't need to have the test in the __DO_TRACE() code itself. That means we can clean up the code as per Peter's suggestion.
Sounds good, I'm Ok with making this change.
Just to clarify, are you proposing to change the rcu_dereference_sched to rcu_dereference_raw in both __DECLARE_TRACE and __DO_TRACE?
No, just in __DO_TRACE(). The rcu_dereference_sched() above in __DECLARE_TRACE() in the if (IS_ENABLED(CONFIG_LOCKDEP) block is required to show the warnings if trace_##name() is used wrong, and is the reason we can use rcu_dereference_raw() in __DO_TRACE() in the first place ;-)
This brings up another point. We should probably add to __DECLARE_TRACE_RCU() this:
 #ifndef MODULE
 #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
 	static inline void trace_##name##_rcuidle(proto) \
 	{ \
 		if (static_key_false(&__tracepoint_##name.key)) \
 			__DO_TRACE(&__tracepoint_##name, \
 				TP_PROTO(data_proto), \
 				TP_ARGS(data_args), \
 				TP_CONDITION(cond), 1); \
+		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
+			int idx; \
+			idx = srcu_read_lock_notrace(&tracepoint_srcu); \
+			srcu_dereference_notrace(__tracepoint_##name.funcs, \
+						 &tracepoint_srcu); \
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx); \
+		} \
 	}
 #else
So that lockdep works with trace_##name##_rcuidle() when the trace event is not enabled.
But that should be a separate patch and not part of this series. I may write that up tomorrow.
Yes, that sounds good to me and would be good to add the safe guard there. But you meant srcu_dereference above, not srcu_dereference_notrace right?
We don't need to trace them. I believe that the "srcu_*_notrace" still performs the lockdep checks. That's what we want. If they don't then we should not use notrace. But I believe they still do lockdep.
AFAICT, _notrace doesn't call into lockdep or tracing (there's also a comment that says so):
/**
 * srcu_dereference_notrace - no tracing and no lockdep calls from here
 */
So then, we should use the regular variant for this additional check you're suggesting.
thanks,
- Joel
On Thu, 12 Jul 2018 12:17:01 -0700 Joel Fernandes joel@joelfernandes.org wrote:
AFAICT, _notrace doesn't call into lockdep or tracing (there's also a comment that says so):
/**
 * srcu_dereference_notrace - no tracing and no lockdep calls from here
 */
Note, I had a different tree checked out, so I didn't have the source available without digging through my email.
So then, we should use the regular variant for this additional check you're suggesting.
OK, I thought we had a rcu_dereference_notrace() that did checks and thought that this followed suit, but it appears there is no such call. That's where my confusion was.
Sure, I'll nuke the notrace() portion, thanks.
Also, I've applied 1-3, since 4 and 5 looks to be getting a remake, I'm going to remove them from my queue. Please fold the SPDX patch into 5.
Thanks!
-- Steve
On Thu, Jul 12, 2018 at 1:15 PM, Steven Rostedt rostedt@goodmis.org wrote:
On Thu, 12 Jul 2018 12:17:01 -0700
So then, we should use the regular variant for this additional check you're suggesting.
OK, I thought we had a rcu_dereference_notrace() that did checks and thought that this followed suit, but it appears there is no such call. That's where my confusion was.
Sure, I'll nuke the notrace() portion, thanks.
Also, I've applied 1-3, since 4 and 5 looks to be getting a remake, I'm going to remove them from my queue. Please fold the SPDX patch into 5.
Will do, and I'll send out patches 4 and 5 shortly with the SPDX patch folded in.
Also the kselftest patches were acked and can be taken independently, I had reposted them as a separate 2 patch series with some minor changes based on your suggestions. Could you check them?
thanks!
- Joel
On Thu, 12 Jul 2018 13:29:32 -0700 Joel Fernandes joel@joelfernandes.org wrote:
Also the kselftest patches were acked and can be taken independently, I had reposted them as a separate 2 patch series with some minor changes based on your suggestions. Could you check them?
Yep, I saw them. I was going to wait till these patches were sent, but since they are agnostic, I'll look at them now. Thanks for letting me know.
-- Steve
On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
-		rcu_read_lock_sched_notrace(); \
-		it_func_ptr = rcu_dereference_sched((tp)->funcs); \
+		\
+		/* \
+		 * For rcuidle callers, use srcu since sched-rcu \
+		 * doesn't work from the idle path. \
+		 */ \
+		if (rcuidle) { \
+			if (in_nmi()) { \
+				WARN_ON_ONCE(1); \
+				return; /* no srcu from nmi */ \
+			} \
+			\
+			idx = srcu_read_lock_notrace(&tracepoint_srcu); \
+			it_func_ptr = \
+				srcu_dereference_notrace((tp)->funcs, \
+					&tracepoint_srcu); \
+			/* To keep it consistent with !rcuidle path */ \
+			preempt_disable_notrace(); \
+		} else { \
+			rcu_read_lock_sched_notrace(); \
+			it_func_ptr = \
+				rcu_dereference_sched((tp)->funcs); \
+		} \
 		if (it_func_ptr) { \
 			do { \
 				it_func = (it_func_ptr)->func; \
@@ -148,9 +177,13 @@ extern void syscall_unregfunc(void);
 				((void(*)(proto))(it_func))(args); \
 			} while ((++it_func_ptr)->func); \
 		} \
-		rcu_read_unlock_sched_notrace(); \
-		if (rcucheck) \
-			rcu_irq_exit_irqson(); \
+		\
+		if (rcuidle) { \
+			preempt_enable_notrace(); \
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
+		} else { \
+			rcu_read_unlock_sched_notrace(); \
+		} \
 	} while (0)
In fact, I would write the thing like:
	preempt_disable_notrace();
	if (rcuidle)
		idx = srcu_read_lock_notrace(&tracepoint_srcu);

	it_func_ptr = rcu_dereference_raw((tp)->funcs);

	/* ... */

	if (rcuidle)
		srcu_read_unlock_notrace(&tracepoint_srcu, idx);
	preempt_enable_notrace();

Much simpler and very much the same.
On Wed, Jul 11, 2018 at 02:53:22PM +0200, Peter Zijlstra wrote:
On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
-		rcu_read_lock_sched_notrace(); \
-		it_func_ptr = rcu_dereference_sched((tp)->funcs); \
+		\
+		/* \
+		 * For rcuidle callers, use srcu since sched-rcu \
+		 * doesn't work from the idle path. \
+		 */ \
+		if (rcuidle) { \
+			if (in_nmi()) { \
+				WARN_ON_ONCE(1); \
+				return; /* no srcu from nmi */ \
+			} \
+			\
+			idx = srcu_read_lock_notrace(&tracepoint_srcu); \
+			it_func_ptr = \
+				srcu_dereference_notrace((tp)->funcs, \
+					&tracepoint_srcu); \
+			/* To keep it consistent with !rcuidle path */ \
+			preempt_disable_notrace(); \
+		} else { \
+			rcu_read_lock_sched_notrace(); \
+			it_func_ptr = \
+				rcu_dereference_sched((tp)->funcs); \
+		} \
 		if (it_func_ptr) { \
 			do { \
 				it_func = (it_func_ptr)->func; \
@@ -148,9 +177,13 @@ extern void syscall_unregfunc(void);
 				((void(*)(proto))(it_func))(args); \
 			} while ((++it_func_ptr)->func); \
 		} \
-		rcu_read_unlock_sched_notrace(); \
-		if (rcucheck) \
-			rcu_irq_exit_irqson(); \
+		\
+		if (rcuidle) { \
+			preempt_enable_notrace(); \
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
+		} else { \
+			rcu_read_unlock_sched_notrace(); \
+		} \
 	} while (0)
In fact, I would write the thing like:
	preempt_disable_notrace();
	if (rcuidle)
		idx = srcu_read_lock_notrace(&tracepoint_srcu);

	it_func_ptr = rcu_dereference_raw((tp)->funcs);

	/* ... */

	if (rcuidle)
		srcu_read_unlock_notrace(&tracepoint_srcu, idx);
	preempt_enable_notrace();
Much simpler and very much the same.
Cool, thanks! I will do it this way and resend.
- Joel
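The nesting Peter proposes can be checked with a small userspace model. This is only a sketch: the `*_model()` helpers are hypothetical stand-ins for preempt_disable_notrace(), srcu_read_lock_notrace() and friends, and they merely record the order in which they are called. It shows that the rcuidle and !rcuidle paths share one preempt-disabled region, with the SRCU read-side section nested inside it when rcuidle is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Records the order of events; hypothetical helper, not a kernel facility. */
static char log_buf[128];

static void note(const char *ev)
{
	assert(strlen(log_buf) + strlen(ev) + 2 <= sizeof(log_buf));
	strcat(log_buf, ev);
	strcat(log_buf, ";");
}

/* Stand-ins for the notrace primitives; they only log that they ran. */
static void preempt_disable_model(void) { note("preempt_off"); }
static void preempt_enable_model(void)  { note("preempt_on"); }
static int  srcu_lock_model(void)       { note("srcu_lock"); return 0; }
static void srcu_unlock_model(int idx)  { (void)idx; note("srcu_unlock"); }

/* Mirrors the shape of the suggested __DO_TRACE() body. */
static const char *do_trace_model(bool rcuidle)
{
	int idx = 0;

	log_buf[0] = '\0';

	preempt_disable_model();
	if (rcuidle)
		idx = srcu_lock_model();

	note("probes");		/* stands in for the probe loop */

	if (rcuidle)
		srcu_unlock_model(idx);
	preempt_enable_model();

	return log_buf;
}
```

Calling `do_trace_model(true)` yields the sequence preempt_off, srcu_lock, probes, srcu_unlock, preempt_on; with `false` the two SRCU steps drop out and everything else stays the same.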
On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
Given you below do call_rcu_sched() and then call_srcu(), isn't the above the wrong way around?
Also, does the above want to be barrier instead of synchronize, so as to guarantee completion of the callbacks.
+static void srcu_free_old_probes(struct rcu_head *head)
 {
 	kfree(container_of(head, struct tp_probes, rcu));
 }

+static void rcu_free_old_probes(struct rcu_head *head)
+{
+	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}

 static inline void release_probes(struct tracepoint_func *old)
 {
 	if (old) {
 		struct tp_probes *tp_probes = container_of(old,
 			struct tp_probes, probes[0]);
+		/*
+		 * Tracepoint probes are protected by both sched RCU and SRCU,
+		 * by calling the SRCU callback in the sched RCU callback we
+		 * cover both cases. So let us chain the SRCU and sched RCU
+		 * callbacks to wait for both grace periods.
+		 */
 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
 	}
 }
On Wed, 11 Jul 2018 14:56:47 +0200 Peter Zijlstra peterz@infradead.org wrote:
On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
Given you below do call_rcu_sched() and then call_srcu(), isn't the above the wrong way around?
Good catch!

 release_probes()
    call_rcu_sched()
       ---> rcu_free_old_probes() queued

 tracepoint_synchronize_unregister()
    synchronize_srcu(&tracepoint_srcu);
       < finishes right away >
    synchronize_sched()
       --> rcu_free_old_probes()
             --> srcu_free_old_probes() queued

Here tracepoint_synchronize_unregister() returned before the srcu portion ran.
Also, does the above want to be barrier instead of synchronize, so as to guarantee completion of the callbacks.
Not sure what you mean here.
-- Steve
+static void srcu_free_old_probes(struct rcu_head *head)
 {
 	kfree(container_of(head, struct tp_probes, rcu));
 }

+static void rcu_free_old_probes(struct rcu_head *head)
+{
+	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}

 static inline void release_probes(struct tracepoint_func *old)
 {
 	if (old) {
 		struct tp_probes *tp_probes = container_of(old,
 			struct tp_probes, probes[0]);
+		/*
+		 * Tracepoint probes are protected by both sched RCU and SRCU,
+		 * by calling the SRCU callback in the sched RCU callback we
+		 * cover both cases. So let us chain the SRCU and sched RCU
+		 * callbacks to wait for both grace periods.
+		 */
 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
 	}
 }
On Wed, Jul 11, 2018 at 09:06:49AM -0400, Steven Rostedt wrote:
On Wed, 11 Jul 2018 14:56:47 +0200 Peter Zijlstra peterz@infradead.org wrote:
On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
Given you below do call_rcu_sched() and then call_srcu(), isn't the above the wrong way around?
Good catch!

 release_probes()
    call_rcu_sched()
       ---> rcu_free_old_probes() queued

 tracepoint_synchronize_unregister()
    synchronize_srcu(&tracepoint_srcu);
       < finishes right away >
    synchronize_sched()
       --> rcu_free_old_probes()
             --> srcu_free_old_probes() queued

Here tracepoint_synchronize_unregister() returned before the srcu portion ran.
I just read the comment that goes with that function; the order doesn't matter. All we want to ensure is that the unregistration is visible to either sched or srcu tracepoint users.
On Wed, 11 Jul 2018 17:17:51 +0200 Peter Zijlstra peterz@infradead.org wrote:
I just read the comment that goes with that function; the order doesn't matter. All we want to ensure is that the unregistration is visible to either sched or srcu tracepoint users.
Yeah, but I think it is still good to change the order. It doesn't hurt, and in my opinion makes the code a bit more robust.
-- Steve
----- On Jul 11, 2018, at 11:26 AM, rostedt rostedt@goodmis.org wrote:
On Wed, 11 Jul 2018 17:17:51 +0200 Peter Zijlstra peterz@infradead.org wrote:
I just read the comment that goes with that function; the order doesn't matter. All we want to ensure is that the unregistration is visible to either sched or srcu tracepoint users.
Yeah, but I think it is still good to change the order. It doesn't hurt, and in my opinion makes the code a bit more robust.
I don't mind. It makes the code more regular. It does not change anything wrt robustness here though.
Thanks,
Mathieu
----- On Jul 11, 2018, at 11:17 AM, Peter Zijlstra peterz@infradead.org wrote:
On Wed, Jul 11, 2018 at 09:06:49AM -0400, Steven Rostedt wrote:
On Wed, 11 Jul 2018 14:56:47 +0200 Peter Zijlstra peterz@infradead.org wrote:
On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
Given you below do call_rcu_sched() and then call_srcu(), isn't the above the wrong way around?
Good catch!

 release_probes()
    call_rcu_sched()
       ---> rcu_free_old_probes() queued

 tracepoint_synchronize_unregister()
    synchronize_srcu(&tracepoint_srcu);
       < finishes right away >
    synchronize_sched()
       --> rcu_free_old_probes()
             --> srcu_free_old_probes() queued

Here tracepoint_synchronize_unregister() returned before the srcu portion ran.
I just read the comment that goes with that function; the order doesn't matter. All we want to ensure is that the unregistration is visible to either sched or srcu tracepoint users.
Exactly, the order does not matter here.
Thanks,
Mathieu
On Wed, Jul 11, 2018 at 09:06:49AM -0400, Steven Rostedt wrote:
On Wed, 11 Jul 2018 14:56:47 +0200 Peter Zijlstra peterz@infradead.org wrote:
On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
Given you below do call_rcu_sched() and then call_srcu(), isn't the above the wrong way around?
Good catch!

 release_probes()
    call_rcu_sched()
       ---> rcu_free_old_probes() queued

 tracepoint_synchronize_unregister()
    synchronize_srcu(&tracepoint_srcu);
       < finishes right away >
    synchronize_sched()
       --> rcu_free_old_probes()
             --> srcu_free_old_probes() queued

Here tracepoint_synchronize_unregister() returned before the srcu portion ran.
But isn't the point of synchronize_rcu to make sure that we're no longer in an RCU read-side section, not that *all* queued callbacks already ran? So in that case, I think it doesn't matter which order the 2 synchronize functions are called in. Please let me know if I missed something!
I believe what we're trying to guarantee here is that no tracepoints using either flavor of RCU are active after tracepoint_synchronize_unregister returns.
thanks!
- Joel
On Wed, 11 Jul 2018 17:31:00 -0700 Joel Fernandes joel@joelfernandes.org wrote:
On Wed, Jul 11, 2018 at 09:06:49AM -0400, Steven Rostedt wrote:
On Wed, 11 Jul 2018 14:56:47 +0200 Peter Zijlstra peterz@infradead.org wrote:
On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
Given you below do call_rcu_sched() and then call_srcu(), isn't the above the wrong way around?
Good catch!

 release_probes()
    call_rcu_sched()
       ---> rcu_free_old_probes() queued

 tracepoint_synchronize_unregister()
    synchronize_srcu(&tracepoint_srcu);
       < finishes right away >
    synchronize_sched()
       --> rcu_free_old_probes()
             --> srcu_free_old_probes() queued

Here tracepoint_synchronize_unregister() returned before the srcu portion ran.
But isn't the point of synchronize_rcu to make sure that we're no longer in an RCU read-side section, not that *all* queued callbacks already ran? So in that case, I think it doesn't matter which order the 2 synchronize functions are called in. Please let me know if I missed something!
I believe what we're trying to guarantee here is that no tracepoints using either flavor of RCU are active after tracepoint_synchronize_unregister returns.
Yes you are correct. If tracepoint_synchronize_unregister() is only to make sure that there is no more trace events using the probes, then this should work. I was focused on looking at it with release_probes() too. So the patch is fine as is.
-- Steve
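The chained free discussed in this sub-thread can be illustrated with a toy userspace model. It is a sketch under loud assumptions: each RCU flavor is reduced to a single pending-callback slot, `sched_gp_ends()` and `srcu_gp_ends()` are hypothetical functions standing for "a grace period of that flavor has elapsed", and nothing here models readers or the synchronize_*() calls. It only shows why the probes array is freed after a sched grace period followed by an SRCU one, in that order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*cb_fn)(void);

/* One pending callback per flavor is enough for this toy model. */
static cb_fn sched_pending;
static cb_fn srcu_pending;
static bool probes_freed;

/* Models call_rcu_sched(): queue fn until the next sched grace period. */
static void call_sched_model(cb_fn fn)
{
	assert(sched_pending == NULL);
	sched_pending = fn;
}

/* Models call_srcu(): queue fn until the next SRCU grace period. */
static void call_srcu_model(cb_fn fn)
{
	assert(srcu_pending == NULL);
	srcu_pending = fn;
}

/* A sched grace period elapsed: run whatever was queued before it. */
static void sched_gp_ends(void)
{
	cb_fn fn = sched_pending;

	sched_pending = NULL;
	if (fn)
		fn();
}

/* An SRCU grace period elapsed: run whatever was queued before it. */
static void srcu_gp_ends(void)
{
	cb_fn fn = srcu_pending;

	srcu_pending = NULL;
	if (fn)
		fn();
}

/* Models srcu_free_old_probes(): the actual kfree(). */
static void srcu_free_model(void)
{
	probes_freed = true;
}

/* Models rcu_free_old_probes(): chain into the SRCU callback. */
static void sched_free_model(void)
{
	call_srcu_model(srcu_free_model);
}

/* Models release_probes(): start the chain with the sched callback. */
static void release_probes_model(void)
{
	probes_freed = false;
	call_sched_model(sched_free_model);
}
```

After `release_probes_model()`, an SRCU grace period on its own frees nothing because the SRCU callback has not been queued yet; only a sched grace period followed by an SRCU grace period sets `probes_freed`. That matches Steven's trace, and it is separate from what tracepoint_synchronize_unregister() has to guarantee, which per the conclusion above is only that no readers of either flavor remain.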
From: "Joel Fernandes (Google)" joel@joelfernandes.org
This patch detaches the preemptirq tracepoints from the tracers and keeps it separate.
Advantages:
* Lockdep and irqsoff event can now run in parallel since they no longer have their own calls.

* This unifies the usecase of adding hooks to an irqsoff and irqson event, and a preemptoff and preempton event.
  3 users of the events exist:
  - Lockdep
  - irqsoff and preemptoff tracers
  - irqs and preempt trace events
The unification cleans up several ifdefs and makes the code in preempt tracer and irqsoff tracers simpler. It gets rid of all the horrific ifdeferry around PROVE_LOCKING and makes configuration of the different users of the tracepoints easier and more understandable. It also gets rid of the time_* function calls from the lockdep hooks used to call into the preemptirq tracer, which are no longer needed. The negative delta in lines of code in this patch is quite large too.
In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS as a single point for registering probes onto the tracepoints. With this, the web of config options for preempt/irq toggle tracepoints and its users becomes:
 PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
       |                 |     \         |           |
       \    (selects)    /      \        \ (selects) /
      TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                      \                  /
                       \ (depends on)   /
                     PREEMPTIRQ_TRACEPOINTS
One note, I have to check for lockdep recursion in the code that calls the trace events API and bail out if we're in lockdep recursion protection to prevent something like the following case: a spin_lock is taken. Then lockdep_acquired is called. That does a raw_local_irq_save and then sets lockdep_recursion, and then calls __lockdep_acquired. In this function, a call to get_lock_stats happens which calls preempt_disable, which calls trace IRQS off somewhere which enters my tracepoint code and sets the tracing_irq_cpu flag to prevent recursion. This flag is then never cleared causing lockdep paths to never be entered and thus causing splats and other bad things.
Other than the performance tests mentioned in the previous patch, I also ran the locking API test suite. I verified that all test cases pass.
I also injected issues by not registering lockdep probes onto the tracepoints and I see failures to confirm that the probes are indeed working.
This series + lockdep probes not registered (just to inject errors):
[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/12:FAILED|FAILED| ok |
[ 0.000000] sirq-safe-A => hirqs-on/21:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + irqs-on/12:FAILED|FAILED| ok |
[ 0.000000] soft-safe-A + irqs-on/12:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + irqs-on/21:FAILED|FAILED| ok |
[ 0.000000] soft-safe-A + irqs-on/21:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
With this series + lockdep probes registered, all locking tests pass:
[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/12: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/21: ok | ok | ok |
[ 0.000000] hard-safe-A + irqs-on/12: ok | ok | ok |
[ 0.000000] soft-safe-A + irqs-on/12: ok | ok | ok |
[ 0.000000] hard-safe-A + irqs-on/21: ok | ok | ok |
[ 0.000000] soft-safe-A + irqs-on/21: ok | ok | ok |
[ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
Reviewed-by: Namhyung Kim namhyung@kernel.org
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
---
 include/linux/ftrace.h            |  11 +-
 include/linux/irqflags.h          |  11 +-
 include/linux/lockdep.h           |   8 +-
 include/linux/preempt.h           |   2 +-
 include/trace/events/preemptirq.h |  23 +--
 init/main.c                       |   5 +-
 kernel/locking/lockdep.c          |  35 ++---
 kernel/sched/core.c               |   2 +-
 kernel/trace/Kconfig              |  22 ++-
 kernel/trace/Makefile             |   2 +-
 kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
 kernel/trace/trace_preemptirq.c   |  71 +++++++++
 12 files changed, 194 insertions(+), 229 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 8154f4920fcb..f32e3c81407e 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void)
 	return CALLER_ADDR2;
 }
 
-#ifdef CONFIG_IRQSOFF_TRACER
-  extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
-  extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
-#else
-  static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
-  static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
-#endif
-
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
   extern void trace_preempt_on(unsigned long a0, unsigned long a1);
   extern void trace_preempt_off(unsigned long a0, unsigned long a1);
 #else
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 9700f00bbc04..50edb9cbbd26 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -15,9 +15,16 @@
 #include <linux/typecheck.h>
 #include <asm/irqflags.h>
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+/* Currently trace_softirqs_on/off is used only by lockdep */
+#ifdef CONFIG_PROVE_LOCKING
   extern void trace_softirqs_on(unsigned long ip);
   extern void trace_softirqs_off(unsigned long ip);
+#else
+# define trace_softirqs_on(ip)	do { } while (0)
+# define trace_softirqs_off(ip)	do { } while (0)
+#endif
+
+#ifdef CONFIG_TRACE_IRQFLAGS
   extern void trace_hardirqs_on(void);
   extern void trace_hardirqs_off(void);
 # define trace_hardirq_context(p)	((p)->hardirq_context)
@@ -43,8 +50,6 @@ do { \
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
-# define trace_softirqs_on(ip)		do { } while (0)
-# define trace_softirqs_off(ip)		do { } while (0)
 # define trace_hardirq_context(p)	0
 # define trace_softirq_context(p)	0
 # define trace_hardirqs_enabled(p)	0
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..a8113357ceeb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -266,7 +266,8 @@ struct held_lock {
 /*
  * Initialization, self-test and debugging-output methods:
  */
-extern void lockdep_info(void);
+extern void lockdep_init(void);
+extern void lockdep_init_early(void);
 extern void lockdep_reset(void);
 extern void lockdep_reset_lock(struct lockdep_map *lock);
 extern void lockdep_free_key_range(void *start, unsigned long size);
@@ -406,7 +407,8 @@ static inline void lockdep_on(void)
 # define lock_downgrade(l, i)			do { } while (0)
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
-# define lockdep_info()				do { } while (0)
+# define lockdep_init()				do { } while (0)
+# define lockdep_init_early()			do { } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
@@ -532,7 +534,7 @@ do { \
 
 #endif /* CONFIG_LOCKDEP */
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_PROVE_LOCKING
 extern void print_irqtrace_events(struct task_struct *curr);
 #else
 static inline void print_irqtrace_events(struct task_struct *curr)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 5bd3f151da78..c01813c3fbe9 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -150,7 +150,7 @@
  */
 #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
-#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER) +#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE) extern void preempt_count_add(int val); extern void preempt_count_sub(int val); #define preempt_count_dec_and_test() \ diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h index 9c4eb33c5a1d..9a0d4ceeb166 100644 --- a/include/trace/events/preemptirq.h +++ b/include/trace/events/preemptirq.h @@ -1,4 +1,4 @@ -#ifdef CONFIG_PREEMPTIRQ_EVENTS +#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS
#undef TRACE_SYSTEM #define TRACE_SYSTEM preemptirq @@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template, (void *)((unsigned long)(_stext) + __entry->parent_offs)) );
-#ifndef CONFIG_PROVE_LOCKING +#ifdef CONFIG_TRACE_IRQFLAGS DEFINE_EVENT(preemptirq_template, irq_disable, TP_PROTO(unsigned long ip, unsigned long parent_ip), TP_ARGS(ip, parent_ip)); @@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable, DEFINE_EVENT(preemptirq_template, irq_enable, TP_PROTO(unsigned long ip, unsigned long parent_ip), TP_ARGS(ip, parent_ip)); +#else +#define trace_irq_enable(...) +#define trace_irq_disable(...) +#define trace_irq_enable_rcuidle(...) +#define trace_irq_disable_rcuidle(...) #endif
-#ifdef CONFIG_DEBUG_PREEMPT +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE DEFINE_EVENT(preemptirq_template, preempt_disable, TP_PROTO(unsigned long ip, unsigned long parent_ip), TP_ARGS(ip, parent_ip)); @@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable, DEFINE_EVENT(preemptirq_template, preempt_enable, TP_PROTO(unsigned long ip, unsigned long parent_ip), TP_ARGS(ip, parent_ip)); +#else +#define trace_preempt_enable(...) +#define trace_preempt_disable(...) +#define trace_preempt_enable_rcuidle(...) +#define trace_preempt_disable_rcuidle(...) #endif
#endif /* _TRACE_PREEMPTIRQ_H */
#include <trace/define_trace.h>
-#endif /* !CONFIG_PREEMPTIRQ_EVENTS */ - -#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING) +#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */ #define trace_irq_enable(...) #define trace_irq_disable(...) #define trace_irq_enable_rcuidle(...) #define trace_irq_disable_rcuidle(...) -#endif - -#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT) #define trace_preempt_enable(...) #define trace_preempt_disable(...) #define trace_preempt_enable_rcuidle(...) diff --git a/init/main.c b/init/main.c index 3b4ada11ed52..44fe43be84c1 100644 --- a/init/main.c +++ b/init/main.c @@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void) profile_init(); call_function_init(); WARN(!irqs_disabled(), "Interrupts were enabled early\n"); + + lockdep_init_early(); + early_boot_irqs_disabled = false; local_irq_enable();
@@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void) panic("Too many boot %s vars at `%s'", panic_later, panic_param);
- lockdep_info(); + lockdep_init();
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5fa4d3138bf1..b961a1698e98 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -55,6 +55,7 @@
 
 #include "lockdep_internals.h"
 
+#include <trace/events/preemptirq.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/lock.h>
 
@@ -2845,10 +2846,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip)
 	debug_atomic_inc(hardirqs_on_events);
 }
 
-__visible void trace_hardirqs_on_caller(unsigned long ip)
+static void lockdep_hardirqs_on(void *none, unsigned long ignore,
+				unsigned long ip)
 {
-	time_hardirqs_on(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2887,23 +2887,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip)
 	__trace_hardirqs_on_caller(ip);
 	current->lockdep_recursion = 0;
 }
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-void trace_hardirqs_on(void)
-{
-	trace_hardirqs_on_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
 
 /*
  * Hardirqs were disabled:
  */
-__visible void trace_hardirqs_off_caller(unsigned long ip)
+static void lockdep_hardirqs_off(void *none, unsigned long ignore,
+				 unsigned long ip)
 {
 	struct task_struct *curr = current;
 
-	time_hardirqs_off(CALLER_ADDR0, ip);
-
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
@@ -2925,13 +2917,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip)
 	} else
 		debug_atomic_inc(redundant_hardirqs_off);
 }
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-void trace_hardirqs_off(void)
-{
-	trace_hardirqs_off_caller(CALLER_ADDR0);
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
 
 /*
  * Softirqs will be enabled:
@@ -4338,7 +4323,15 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	raw_local_irq_restore(flags);
 }
 
-void __init lockdep_info(void)
+void __init lockdep_init_early(void)
+{
+#ifdef CONFIG_PROVE_LOCKING
+	register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
+	register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
+#endif
+}
+
+void __init lockdep_init(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 78d8facba456..4c956f6849ec 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3192,7 +3192,7 @@ static inline void sched_tick_stop(int cpu) { } #endif
#if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \ - defined(CONFIG_PREEMPT_TRACER)) + defined(CONFIG_TRACE_PREEMPT_TOGGLE)) /* * If the value passed in is equal to the current preempt count * then we just disabled preemption. Start timing the latency. diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index dcc0166d1997..8d51351e3149 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP Allow the use of ring_buffer_swap_cpu. Adds a very slight overhead to tracing when enabled.
+config PREEMPTIRQ_TRACEPOINTS + bool + depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS + select TRACING + default y + help + Create preempt/irq toggle tracepoints if needed, so that other parts + of the kernel can use them to generate or add hooks to them. + # All tracer options should select GENERIC_TRACER. For those options that are # enabled by all tracers (context switch and event tracer) they select TRACING. # This allows those options to appear when no other tracer is selected. But the @@ -155,18 +164,20 @@ config FUNCTION_GRAPH_TRACER the return value. This is done by setting the current return address on the current task structure into a stack of calls.
+config TRACE_PREEMPT_TOGGLE + bool + help + Enables hooks which will be called when preemption is first disabled, + and last enabled.
config PREEMPTIRQ_EVENTS bool "Enable trace events for preempt and irq disable/enable" select TRACE_IRQFLAGS - depends on DEBUG_PREEMPT || !PROVE_LOCKING - depends on TRACING + select TRACE_PREEMPT_TOGGLE if PREEMPT + select GENERIC_TRACER default n help Enable tracing of disable and enable events for preemption and irqs. - For tracing preempt disable/enable events, DEBUG_PREEMPT must be - enabled. For tracing irq disable/enable events, PROVE_LOCKING must - be disabled.
config IRQSOFF_TRACER bool "Interrupts-off Latency Tracer" @@ -203,6 +214,7 @@ config PREEMPT_TRACER select RING_BUFFER_ALLOW_SWAP select TRACER_SNAPSHOT select TRACER_SNAPSHOT_PER_CPU_SWAP + select TRACE_PREEMPT_TOGGLE help This option measures the time spent in preemption-off critical sections, with microsecond accuracy. diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index e2538c7638d4..84a0cb222f20 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -35,7 +35,7 @@ obj-$(CONFIG_TRACING) += trace_printk.o obj-$(CONFIG_TRACING_MAP) += tracing_map.o obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o -obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o +obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index f8daa754cce2..770cd30cda40 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -16,7 +16,6 @@
#include "trace.h"
-#define CREATE_TRACE_POINTS #include <trace/events/preemptirq.h>
#if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER) @@ -450,66 +449,6 @@ void stop_critical_timings(void) } EXPORT_SYMBOL_GPL(stop_critical_timings);
-#ifdef CONFIG_IRQSOFF_TRACER -#ifdef CONFIG_PROVE_LOCKING -void time_hardirqs_on(unsigned long a0, unsigned long a1) -{ - if (!preempt_trace() && irq_trace()) - stop_critical_timing(a0, a1); -} - -void time_hardirqs_off(unsigned long a0, unsigned long a1) -{ - if (!preempt_trace() && irq_trace()) - start_critical_timing(a0, a1); -} - -#else /* !CONFIG_PROVE_LOCKING */ - -/* - * We are only interested in hardirq on/off events: - */ -static inline void tracer_hardirqs_on(void) -{ - if (!preempt_trace() && irq_trace()) - stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1); -} - -static inline void tracer_hardirqs_off(void) -{ - if (!preempt_trace() && irq_trace()) - start_critical_timing(CALLER_ADDR0, CALLER_ADDR1); -} - -static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) -{ - if (!preempt_trace() && irq_trace()) - stop_critical_timing(CALLER_ADDR0, caller_addr); -} - -static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) -{ - if (!preempt_trace() && irq_trace()) - start_critical_timing(CALLER_ADDR0, caller_addr); -} - -#endif /* CONFIG_PROVE_LOCKING */ -#endif /* CONFIG_IRQSOFF_TRACER */ - -#ifdef CONFIG_PREEMPT_TRACER -static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) -{ - if (preempt_trace() && !irq_trace()) - stop_critical_timing(a0, a1); -} - -static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) -{ - if (preempt_trace() && !irq_trace()) - start_critical_timing(a0, a1); -} -#endif /* CONFIG_PREEMPT_TRACER */ - #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled;
@@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr) }
#ifdef CONFIG_IRQSOFF_TRACER +/* + * We are only interested in hardirq on/off events: + */ +static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1) +{ + if (!preempt_trace() && irq_trace()) + stop_critical_timing(a0, a1); +} + +static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1) +{ + if (!preempt_trace() && irq_trace()) + start_critical_timing(a0, a1); +} + static int irqsoff_tracer_init(struct trace_array *tr) { trace_type = TRACER_IRQS_OFF;
+ register_trace_irq_disable(tracer_hardirqs_off, NULL); + register_trace_irq_enable(tracer_hardirqs_on, NULL); return __irqsoff_tracer_init(tr); }
static void irqsoff_tracer_reset(struct trace_array *tr) { + unregister_trace_irq_disable(tracer_hardirqs_off, NULL); + unregister_trace_irq_enable(tracer_hardirqs_on, NULL); __irqsoff_tracer_reset(tr); }
@@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly = .allow_instances = true, .use_max_tr = true, }; -# define register_irqsoff(trace) register_tracer(&trace) -#else -# define register_irqsoff(trace) do { } while (0) -#endif +#endif /* CONFIG_IRQSOFF_TRACER */
#ifdef CONFIG_PREEMPT_TRACER +static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1) +{ + if (preempt_trace() && !irq_trace()) + stop_critical_timing(a0, a1); +} + +static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1) +{ + if (preempt_trace() && !irq_trace()) + start_critical_timing(a0, a1); +} + static int preemptoff_tracer_init(struct trace_array *tr) { trace_type = TRACER_PREEMPT_OFF;
+ register_trace_preempt_disable(tracer_preempt_off, NULL); + register_trace_preempt_enable(tracer_preempt_on, NULL); return __irqsoff_tracer_init(tr); }
static void preemptoff_tracer_reset(struct trace_array *tr) { + unregister_trace_preempt_disable(tracer_preempt_off, NULL); + unregister_trace_preempt_enable(tracer_preempt_on, NULL); __irqsoff_tracer_reset(tr); }
@@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly = .allow_instances = true, .use_max_tr = true, }; -# define register_preemptoff(trace) register_tracer(&trace) -#else -# define register_preemptoff(trace) do { } while (0) -#endif +#endif /* CONFIG_PREEMPT_TRACER */
-#if defined(CONFIG_IRQSOFF_TRACER) && \ - defined(CONFIG_PREEMPT_TRACER) +#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
static int preemptirqsoff_tracer_init(struct trace_array *tr) { trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
+ register_trace_irq_disable(tracer_hardirqs_off, NULL); + register_trace_irq_enable(tracer_hardirqs_on, NULL); + register_trace_preempt_disable(tracer_preempt_off, NULL); + register_trace_preempt_enable(tracer_preempt_on, NULL); + return __irqsoff_tracer_init(tr); }
static void preemptirqsoff_tracer_reset(struct trace_array *tr) { + unregister_trace_irq_disable(tracer_hardirqs_off, NULL); + unregister_trace_irq_enable(tracer_hardirqs_on, NULL); + unregister_trace_preempt_disable(tracer_preempt_off, NULL); + unregister_trace_preempt_enable(tracer_preempt_on, NULL); + __irqsoff_tracer_reset(tr); }
@@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly = .allow_instances = true, .use_max_tr = true, }; - -# define register_preemptirqsoff(trace) register_tracer(&trace) -#else -# define register_preemptirqsoff(trace) do { } while (0) #endif
__init static int init_irqsoff_tracer(void) { - register_irqsoff(irqsoff_tracer); - register_preemptoff(preemptoff_tracer); - register_preemptirqsoff(preemptirqsoff_tracer); - - return 0; -} -core_initcall(init_irqsoff_tracer); -#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */ - -#ifndef CONFIG_IRQSOFF_TRACER -static inline void tracer_hardirqs_on(void) { } -static inline void tracer_hardirqs_off(void) { } -static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { } -static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { } +#ifdef CONFIG_IRQSOFF_TRACER + register_tracer(&irqsoff_tracer); #endif - -#ifndef CONFIG_PREEMPT_TRACER -static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { } -static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { } +#ifdef CONFIG_PREEMPT_TRACER + register_tracer(&preemptoff_tracer); #endif - -#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING) -/* Per-cpu variable to prevent redundant calls when IRQs already off */ -static DEFINE_PER_CPU(int, tracing_irq_cpu); - -void trace_hardirqs_on(void) -{ - if (!this_cpu_read(tracing_irq_cpu)) - return; - - trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); - tracer_hardirqs_on(); - - this_cpu_write(tracing_irq_cpu, 0); -} -EXPORT_SYMBOL(trace_hardirqs_on); - -void trace_hardirqs_off(void) -{ - if (this_cpu_read(tracing_irq_cpu)) - return; - - this_cpu_write(tracing_irq_cpu, 1); - - trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); - tracer_hardirqs_off(); -} -EXPORT_SYMBOL(trace_hardirqs_off); - -__visible void trace_hardirqs_on_caller(unsigned long caller_addr) -{ - if (!this_cpu_read(tracing_irq_cpu)) - return; - - trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr); - tracer_hardirqs_on_caller(caller_addr); - - this_cpu_write(tracing_irq_cpu, 0); -} -EXPORT_SYMBOL(trace_hardirqs_on_caller); - -__visible void trace_hardirqs_off_caller(unsigned long caller_addr) -{ - if 
(this_cpu_read(tracing_irq_cpu)) - return; - - this_cpu_write(tracing_irq_cpu, 1); - - trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr); - tracer_hardirqs_off_caller(caller_addr); -} -EXPORT_SYMBOL(trace_hardirqs_off_caller); - -/* - * Stubs: - */ - -void trace_softirqs_on(unsigned long ip) -{ -} - -void trace_softirqs_off(unsigned long ip) -{ -} - -inline void print_irqtrace_events(struct task_struct *curr) -{ -} +#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER) + register_tracer(&preemptirqsoff_tracer); #endif
-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
-void trace_preempt_on(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_enable_rcuidle(a0, a1);
-	tracer_preempt_on(a0, a1);
-}
-
-void trace_preempt_off(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_disable_rcuidle(a0, a1);
-	tracer_preempt_off(a0, a1);
+	return 0;
 }
-#endif
+core_initcall(init_irqsoff_tracer);
+#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
new file mode 100644
index 000000000000..dc01c7f4d326
--- /dev/null
+++ b/kernel/trace/trace_preemptirq.c
@@ -0,0 +1,71 @@
+/*
+ * preemptoff and irqoff tracepoints
+ *
+ * Copyright (C) Joel Fernandes (Google) joel@joelfernandes.org
+ */
+
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/preemptirq.h>
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+/* Per-cpu variable to prevent redundant calls when IRQs already off */
+static DEFINE_PER_CPU(int, tracing_irq_cpu);
+
+void trace_hardirqs_on(void)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+void trace_hardirqs_off(void)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
+
+__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_off_caller);
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
+
+void trace_preempt_on(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_enable_rcuidle(a0, a1);
+}
+
+void trace_preempt_off(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_disable_rcuidle(a0, a1);
+}
+#endif
Peter,
Want to ack this? It touches Lockdep.
Joel,
I got to this patch and I'm still reviewing it. I'll hopefully have my full review done by next week. I'll make it a priority. But I still would like Peter's ack on this one, as he's the maintainer of lockdep.
Thanks,
-- Steve
On Thu, 28 Jun 2018 11:21:47 -0700 Joel Fernandes joel@joelfernandes.org wrote:
From: "Joel Fernandes (Google)" joel@joelfernandes.org
This patch detaches the preemptirq tracepoints from the tracers and keeps them separate.
Advantages:
- Lockdep and irqsoff events can now run in parallel since they no longer
have their own calls.
- This unifies the use case of adding hooks to an irqsoff and irqson
event, and a preemptoff and preempton event. Three users of the events exist:
- Lockdep
- irqsoff and preemptoff tracers
- irqs and preempt trace events
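The three users listed above all hang off the same event, which can be pictured with a small userspace model. This is only a sketch under stated assumptions: `event_model`, `register_probe`, `fire`, `count_probe` and `demo_users_notified` are hypothetical names made up for illustration and are not kernel APIs. Only the probe signature (`void *data`, then two `unsigned long` addresses) mirrors the probes in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of one tracepoint; not kernel code. */
#define MAX_PROBES 4

typedef void (*probe_fn)(void *data, unsigned long ip, unsigned long parent_ip);

struct event_model {
	probe_fn fn[MAX_PROBES];
	void *data[MAX_PROBES];
	size_t nr;
};

/* Attach one more user to the event; returns -1 when the table is full. */
static int register_probe(struct event_model *ev, probe_fn fn, void *data)
{
	if (ev->nr == MAX_PROBES)
		return -1;
	ev->fn[ev->nr] = fn;
	ev->data[ev->nr] = data;
	ev->nr++;
	return 0;
}

/* One call site notifies every registered user in registration order. */
static void fire(const struct event_model *ev, unsigned long ip,
		 unsigned long parent_ip)
{
	for (size_t i = 0; i < ev->nr; i++)
		ev->fn[i](ev->data[i], ip, parent_ip);
}

/* Stand-in probe body: counts how often its user was notified. */
static void count_probe(void *data, unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	++*(int *)data;
}

/* Returns how many of the three users saw all `events` notifications. */
static int demo_users_notified(int events)
{
	int lockdep = 0, tracer = 0, trace_event = 0;
	struct event_model irq_disable = { .nr = 0 };
	int rc = 0;

	rc |= register_probe(&irq_disable, count_probe, &lockdep);
	rc |= register_probe(&irq_disable, count_probe, &tracer);
	rc |= register_probe(&irq_disable, count_probe, &trace_event);
	assert(rc == 0);

	for (int i = 0; i < events; i++)
		fire(&irq_disable, 0x1000, 0x2000);

	return (lockdep == events) + (tracer == events) +
	       (trace_event == events);
}
```

Calling `demo_users_notified(2)` from a `main()` fires the modelled event twice and reports how many users were notified both times. In the model no user needs its own hook at the call site.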
The unification cleans up several ifdefs and makes the code in preempt tracer and irqsoff tracers simpler. It gets rid of all the horrific ifdeferry around PROVE_LOCKING and makes configuration of the different users of the tracepoints easier to understand. It also gets rid of the time_* function calls from the lockdep hooks, which were used to call into the preemptirq tracer and are no longer needed. The negative delta in lines of code in this patch is quite large too.
In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS as a single point for registering probes onto the tracepoints. With this, the web of config options for preempt/irq toggle tracepoints and its users becomes:
 PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
       |                 |     \         |           |
       \    (selects)    /      \        \ (selects) /
      TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                      \                  /
                       \ (depends on)   /
                     PREEMPTIRQ_TRACEPOINTS
One note: I have to check for lockdep recursion in the code that calls the trace events API and bail out if we're in lockdep recursion protection, to prevent something like the following case. A spin_lock is taken. Then lockdep_acquired is called. That does a raw_local_irq_save, sets lockdep_recursion, and then calls __lockdep_acquired. In this function, a call to get_lock_stats happens, which calls preempt_disable, which calls trace IRQS off somewhere, which enters my tracepoint code and sets the tracing_irq_cpu flag to prevent recursion. This flag is then never cleared, so lockdep paths are never entered, causing splats and other bad things.
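The recursion case above can be replayed in a small userspace model. This is a sketch of one reading of the scenario, not kernel code: `cpu_model`, `model_hardirqs_off` and `genuine_events_delivered` are hypothetical names, and the model assumes that the raw irq save/restore inside lockdep sends no notification of its own, so nothing clears the flag once the nested call has set it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; field names only mirror the kernel ones. */
struct cpu_model {
	bool tracing_irq_cpu;	/* stands in for the per-cpu dedup flag */
	int lockdep_recursion;	/* stands in for current->lockdep_recursion */
	int off_events;		/* irq_disable events delivered to probes */
};

/* Models trace_hardirqs_off(), with the recursion check made optional. */
static void model_hardirqs_off(struct cpu_model *c, bool check_recursion)
{
	if ((check_recursion && c->lockdep_recursion) || c->tracing_irq_cpu)
		return;

	c->tracing_irq_cpu = true;
	c->off_events++;
}

/*
 * Replays the scenario: a nested irqs-off notification arrives while
 * lockdep_recursion is set, then lockdep returns without any matching
 * irqs-on notification (assumed: the raw restore is not traced).
 * Returns how many events the next genuine irqs-off delivers.
 */
static int genuine_events_delivered(bool check_recursion)
{
	struct cpu_model c = { 0 };
	int before;

	c.lockdep_recursion = 1;
	model_hardirqs_off(&c, check_recursion);	/* nested call */
	c.lockdep_recursion = 0;

	/* With the check, the nested call must not have touched the flag. */
	assert(!check_recursion || !c.tracing_irq_cpu);

	before = c.off_events;
	model_hardirqs_off(&c, check_recursion);	/* genuine irqs-off */
	return c.off_events - before;
}
```

In this model `genuine_events_delivered(false)` yields 0 because the stale flag swallows the later event, while `genuine_events_delivered(true)` yields 1. That matches the reason given above for bailing out when lockdep is recursing.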
Other than the performance tests mentioned in the previous patch, I also ran the locking API test suite. I verified that all test cases pass.

I also injected issues by not registering lockdep probes onto the tracepoints, and I saw failures, which confirms that the probes are indeed working.
This series + lockdep probes not registered (just to inject errors):
[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/12:FAILED|FAILED| ok |
[ 0.000000] sirq-safe-A => hirqs-on/21:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + irqs-on/12:FAILED|FAILED| ok |
[ 0.000000] soft-safe-A + irqs-on/12:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + irqs-on/21:FAILED|FAILED| ok |
[ 0.000000] soft-safe-A + irqs-on/21:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
With this series + lockdep probes registered, all locking tests pass:
[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/12: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/21: ok | ok | ok |
[ 0.000000] hard-safe-A + irqs-on/12: ok | ok | ok |
[ 0.000000] soft-safe-A + irqs-on/12: ok | ok | ok |
[ 0.000000] hard-safe-A + irqs-on/21: ok | ok | ok |
[ 0.000000] soft-safe-A + irqs-on/21: ok | ok | ok |
[ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
Reviewed-by: Namhyung Kim namhyung@kernel.org
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
---
 include/linux/ftrace.h            |  11 +-
 include/linux/irqflags.h          |  11 +-
 include/linux/lockdep.h           |   8 +-
 include/linux/preempt.h           |   2 +-
 include/trace/events/preemptirq.h |  23 +--
 init/main.c                       |   5 +-
 kernel/locking/lockdep.c          |  35 ++---
 kernel/sched/core.c               |   2 +-
 kernel/trace/Kconfig              |  22 ++-
 kernel/trace/Makefile             |   2 +-
 kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
 kernel/trace/trace_preemptirq.c   |  71 +++++++++
 12 files changed, 194 insertions(+), 229 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 8154f4920fcb..f32e3c81407e 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void) return CALLER_ADDR2; } -#ifdef CONFIG_IRQSOFF_TRACER
- extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
- extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
-#else
- static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
- static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
-#endif
-#if defined(CONFIG_PREEMPT_TRACER) || \
- (defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE extern void trace_preempt_on(unsigned long a0, unsigned long a1); extern void trace_preempt_off(unsigned long a0, unsigned long a1); #else diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 9700f00bbc04..50edb9cbbd26 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -15,9 +15,16 @@ #include <linux/typecheck.h> #include <asm/irqflags.h> -#ifdef CONFIG_TRACE_IRQFLAGS +/* Currently trace_softirqs_on/off is used only by lockdep */ +#ifdef CONFIG_PROVE_LOCKING extern void trace_softirqs_on(unsigned long ip); extern void trace_softirqs_off(unsigned long ip); +#else +# define trace_softirqs_on(ip) do { } while (0) +# define trace_softirqs_off(ip) do { } while (0) +#endif
+#ifdef CONFIG_TRACE_IRQFLAGS extern void trace_hardirqs_on(void); extern void trace_hardirqs_off(void); # define trace_hardirq_context(p) ((p)->hardirq_context) @@ -43,8 +50,6 @@ do { \ #else # define trace_hardirqs_on() do { } while (0) # define trace_hardirqs_off() do { } while (0) -# define trace_softirqs_on(ip) do { } while (0) -# define trace_softirqs_off(ip) do { } while (0) # define trace_hardirq_context(p) 0 # define trace_softirq_context(p) 0 # define trace_hardirqs_enabled(p) 0 diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 6fc77d4dbdcd..a8113357ceeb 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -266,7 +266,8 @@ struct held_lock { /*
  * Initialization, self-test and debugging-output methods:
*/ -extern void lockdep_info(void); +extern void lockdep_init(void); +extern void lockdep_init_early(void); extern void lockdep_reset(void); extern void lockdep_reset_lock(struct lockdep_map *lock); extern void lockdep_free_key_range(void *start, unsigned long size); @@ -406,7 +407,8 @@ static inline void lockdep_on(void) # define lock_downgrade(l, i) do { } while (0) # define lock_set_class(l, n, k, s, i) do { } while (0) # define lock_set_subclass(l, s, i) do { } while (0) -# define lockdep_info() do { } while (0) +# define lockdep_init() do { } while (0) +# define lockdep_init_early() do { } while (0) # define lockdep_init_map(lock, name, key, sub) \ do { (void)(name); (void)(key); } while (0) # define lockdep_set_class(lock, key) do { (void)(key); } while (0) @@ -532,7 +534,7 @@ do { \ #endif /* CONFIG_LOCKDEP */ -#ifdef CONFIG_TRACE_IRQFLAGS +#ifdef CONFIG_PROVE_LOCKING extern void print_irqtrace_events(struct task_struct *curr); #else static inline void print_irqtrace_events(struct task_struct *curr) diff --git a/include/linux/preempt.h b/include/linux/preempt.h index 5bd3f151da78..c01813c3fbe9 100644 --- a/include/linux/preempt.h +++ b/include/linux/preempt.h @@ -150,7 +150,7 @@ */ #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET) -#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER) +#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE) extern void preempt_count_add(int val); extern void preempt_count_sub(int val); #define preempt_count_dec_and_test() \ diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h index 9c4eb33c5a1d..9a0d4ceeb166 100644 --- a/include/trace/events/preemptirq.h +++ b/include/trace/events/preemptirq.h @@ -1,4 +1,4 @@ -#ifdef CONFIG_PREEMPTIRQ_EVENTS +#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS #undef TRACE_SYSTEM #define TRACE_SYSTEM preemptirq @@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template, (void *)((unsigned long)(_stext) + 
__entry->parent_offs)) ); -#ifndef CONFIG_PROVE_LOCKING +#ifdef CONFIG_TRACE_IRQFLAGS DEFINE_EVENT(preemptirq_template, irq_disable, TP_PROTO(unsigned long ip, unsigned long parent_ip), TP_ARGS(ip, parent_ip)); @@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable, DEFINE_EVENT(preemptirq_template, irq_enable, TP_PROTO(unsigned long ip, unsigned long parent_ip), TP_ARGS(ip, parent_ip)); +#else +#define trace_irq_enable(...) +#define trace_irq_disable(...) +#define trace_irq_enable_rcuidle(...) +#define trace_irq_disable_rcuidle(...) #endif -#ifdef CONFIG_DEBUG_PREEMPT +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE DEFINE_EVENT(preemptirq_template, preempt_disable, TP_PROTO(unsigned long ip, unsigned long parent_ip), TP_ARGS(ip, parent_ip)); @@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable, DEFINE_EVENT(preemptirq_template, preempt_enable, TP_PROTO(unsigned long ip, unsigned long parent_ip), TP_ARGS(ip, parent_ip)); +#else +#define trace_preempt_enable(...) +#define trace_preempt_disable(...) +#define trace_preempt_enable_rcuidle(...) +#define trace_preempt_disable_rcuidle(...) #endif #endif /* _TRACE_PREEMPTIRQ_H */ #include <trace/define_trace.h> -#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING) +#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */ #define trace_irq_enable(...) #define trace_irq_disable(...) #define trace_irq_enable_rcuidle(...) #define trace_irq_disable_rcuidle(...) -#endif
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT) #define trace_preempt_enable(...) #define trace_preempt_disable(...) #define trace_preempt_enable_rcuidle(...) diff --git a/init/main.c b/init/main.c index 3b4ada11ed52..44fe43be84c1 100644 --- a/init/main.c +++ b/init/main.c @@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void) profile_init(); call_function_init(); WARN(!irqs_disabled(), "Interrupts were enabled early\n");
- lockdep_init_early();
- early_boot_irqs_disabled = false; local_irq_enable();
@@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void) panic("Too many boot %s vars at `%s'", panic_later, panic_param);
- lockdep_info();
- lockdep_init();
/* * Need to run this when irqs are enabled, because it wants diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 5fa4d3138bf1..b961a1698e98 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -55,6 +55,7 @@ #include "lockdep_internals.h" +#include <trace/events/preemptirq.h> #define CREATE_TRACE_POINTS #include <trace/events/lock.h> @@ -2845,10 +2846,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip) debug_atomic_inc(hardirqs_on_events); } -__visible void trace_hardirqs_on_caller(unsigned long ip) +static void lockdep_hardirqs_on(void *none, unsigned long ignore,
unsigned long ip)
{
- time_hardirqs_on(CALLER_ADDR0, ip);
- if (unlikely(!debug_locks || current->lockdep_recursion)) return;
@@ -2887,23 +2887,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip) __trace_hardirqs_on_caller(ip); current->lockdep_recursion = 0; } -EXPORT_SYMBOL(trace_hardirqs_on_caller);
-void trace_hardirqs_on(void) -{
- trace_hardirqs_on_caller(CALLER_ADDR0);
-} -EXPORT_SYMBOL(trace_hardirqs_on); /*
- Hardirqs were disabled:
*/ -__visible void trace_hardirqs_off_caller(unsigned long ip) +static void lockdep_hardirqs_off(void *none, unsigned long ignore,
unsigned long ip)
{ struct task_struct *curr = current;
- time_hardirqs_off(CALLER_ADDR0, ip);
- if (unlikely(!debug_locks || current->lockdep_recursion)) return;
@@ -2925,13 +2917,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip) } else debug_atomic_inc(redundant_hardirqs_off); } -EXPORT_SYMBOL(trace_hardirqs_off_caller);
-void trace_hardirqs_off(void) -{
- trace_hardirqs_off_caller(CALLER_ADDR0);
-} -EXPORT_SYMBOL(trace_hardirqs_off); /*
- Softirqs will be enabled:
@@ -4338,7 +4323,15 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	raw_local_irq_restore(flags);
 }
 
-void __init lockdep_info(void)
+void __init lockdep_init_early(void)
+{
+#ifdef CONFIG_PROVE_LOCKING
+	register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
+	register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
+#endif
+}
+
+void __init lockdep_init(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78d8facba456..4c956f6849ec 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3192,7 +3192,7 @@ static inline void sched_tick_stop(int cpu) { }
 #endif
 
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
-				defined(CONFIG_PREEMPT_TRACER))
+				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
 /*
  * If the value passed in is equal to the current preempt count
  * then we just disabled preemption. Start timing the latency.
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index dcc0166d1997..8d51351e3149 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP
 	 Allow the use of ring_buffer_swap_cpu.
 	 Adds a very slight overhead to tracing when enabled.
 
+config PREEMPTIRQ_TRACEPOINTS
+	bool
+	depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
+	select TRACING
+	default y
+	help
+	  Create preempt/irq toggle tracepoints if needed, so that other parts
+	  of the kernel can use them to generate or add hooks to them.
+
 # All tracer options should select GENERIC_TRACER. For those options that are
 # enabled by all tracers (context switch and event tracer) they select TRACING.
 # This allows those options to appear when no other tracer is selected. But the
@@ -155,18 +164,20 @@ config FUNCTION_GRAPH_TRACER
 	  the return value. This is done by setting the current return
 	  address on the current task structure into a stack of calls.
 
+config TRACE_PREEMPT_TOGGLE
+	bool
+	help
+	  Enables hooks which will be called when preemption is first disabled,
+	  and last enabled.
 
 config PREEMPTIRQ_EVENTS
 	bool "Enable trace events for preempt and irq disable/enable"
 	select TRACE_IRQFLAGS
-	depends on DEBUG_PREEMPT || !PROVE_LOCKING
-	depends on TRACING
+	select TRACE_PREEMPT_TOGGLE if PREEMPT
+	select GENERIC_TRACER
 	default n
 	help
 	  Enable tracing of disable and enable events for preemption and irqs.
-	  For tracing preempt disable/enable events, DEBUG_PREEMPT must be
-	  enabled. For tracing irq disable/enable events, PROVE_LOCKING must
-	  be disabled.
 
 config IRQSOFF_TRACER
 	bool "Interrupts-off Latency Tracer"
@@ -203,6 +214,7 @@ config PREEMPT_TRACER
 	select RING_BUFFER_ALLOW_SWAP
 	select TRACER_SNAPSHOT
 	select TRACER_SNAPSHOT_PER_CPU_SWAP
+	select TRACE_PREEMPT_TOGGLE
 	help
 	  This option measures the time spent in preemption-off critical
 	  sections, with microsecond accuracy.
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index e2538c7638d4..84a0cb222f20 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -35,7 +35,7 @@ obj-$(CONFIG_TRACING) += trace_printk.o obj-$(CONFIG_TRACING_MAP) += tracing_map.o obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o -obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o +obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index f8daa754cce2..770cd30cda40 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -16,7 +16,6 @@ #include "trace.h" -#define CREATE_TRACE_POINTS #include <trace/events/preemptirq.h> #if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER) @@ -450,66 +449,6 @@ void stop_critical_timings(void) } EXPORT_SYMBOL_GPL(stop_critical_timings); -#ifdef CONFIG_IRQSOFF_TRACER -#ifdef CONFIG_PROVE_LOCKING -void time_hardirqs_on(unsigned long a0, unsigned long a1) -{
- if (!preempt_trace() && irq_trace())
stop_critical_timing(a0, a1);
-}
-void time_hardirqs_off(unsigned long a0, unsigned long a1) -{
- if (!preempt_trace() && irq_trace())
start_critical_timing(a0, a1);
-}
-#else /* !CONFIG_PROVE_LOCKING */
-/*
- We are only interested in hardirq on/off events:
- */
-static inline void tracer_hardirqs_on(void) -{
- if (!preempt_trace() && irq_trace())
stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-static inline void tracer_hardirqs_off(void) -{
- if (!preempt_trace() && irq_trace())
start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) -{
- if (!preempt_trace() && irq_trace())
stop_critical_timing(CALLER_ADDR0, caller_addr);
-}
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) -{
- if (!preempt_trace() && irq_trace())
start_critical_timing(CALLER_ADDR0, caller_addr);
-}
-#endif /* CONFIG_PROVE_LOCKING */ -#endif /* CONFIG_IRQSOFF_TRACER */
-#ifdef CONFIG_PREEMPT_TRACER -static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) -{
- if (preempt_trace() && !irq_trace())
stop_critical_timing(a0, a1);
-}
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) -{
- if (preempt_trace() && !irq_trace())
start_critical_timing(a0, a1);
-} -#endif /* CONFIG_PREEMPT_TRACER */
#ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; @@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr) } #ifdef CONFIG_IRQSOFF_TRACER +/*
- We are only interested in hardirq on/off events:
- */
+static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1) +{
- if (!preempt_trace() && irq_trace())
stop_critical_timing(a0, a1);
+}
+static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1) +{
- if (!preempt_trace() && irq_trace())
start_critical_timing(a0, a1);
+}
static int irqsoff_tracer_init(struct trace_array *tr) { trace_type = TRACER_IRQS_OFF;
- register_trace_irq_disable(tracer_hardirqs_off, NULL);
- register_trace_irq_enable(tracer_hardirqs_on, NULL); return __irqsoff_tracer_init(tr);
} static void irqsoff_tracer_reset(struct trace_array *tr) {
- unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
- unregister_trace_irq_enable(tracer_hardirqs_on, NULL); __irqsoff_tracer_reset(tr);
} @@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly = .allow_instances = true, .use_max_tr = true, }; -# define register_irqsoff(trace) register_tracer(&trace) -#else -# define register_irqsoff(trace) do { } while (0) -#endif +#endif /* CONFIG_IRQSOFF_TRACER */ #ifdef CONFIG_PREEMPT_TRACER +static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1) +{
- if (preempt_trace() && !irq_trace())
stop_critical_timing(a0, a1);
+}
+static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1) +{
- if (preempt_trace() && !irq_trace())
start_critical_timing(a0, a1);
+}
static int preemptoff_tracer_init(struct trace_array *tr) { trace_type = TRACER_PREEMPT_OFF;
- register_trace_preempt_disable(tracer_preempt_off, NULL);
- register_trace_preempt_enable(tracer_preempt_on, NULL); return __irqsoff_tracer_init(tr);
} static void preemptoff_tracer_reset(struct trace_array *tr) {
- unregister_trace_preempt_disable(tracer_preempt_off, NULL);
- unregister_trace_preempt_enable(tracer_preempt_on, NULL); __irqsoff_tracer_reset(tr);
} @@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly = .allow_instances = true, .use_max_tr = true, }; -# define register_preemptoff(trace) register_tracer(&trace) -#else -# define register_preemptoff(trace) do { } while (0) -#endif +#endif /* CONFIG_PREEMPT_TRACER */ -#if defined(CONFIG_IRQSOFF_TRACER) && \
- defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER) static int preemptirqsoff_tracer_init(struct trace_array *tr) { trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
- register_trace_irq_disable(tracer_hardirqs_off, NULL);
- register_trace_irq_enable(tracer_hardirqs_on, NULL);
- register_trace_preempt_disable(tracer_preempt_off, NULL);
- register_trace_preempt_enable(tracer_preempt_on, NULL);
- return __irqsoff_tracer_init(tr);
} static void preemptirqsoff_tracer_reset(struct trace_array *tr) {
- unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
- unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
- unregister_trace_preempt_disable(tracer_preempt_off, NULL);
- unregister_trace_preempt_enable(tracer_preempt_on, NULL);
- __irqsoff_tracer_reset(tr);
} @@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly = .allow_instances = true, .use_max_tr = true, };
-# define register_preemptirqsoff(trace) register_tracer(&trace) -#else -# define register_preemptirqsoff(trace) do { } while (0) #endif __init static int init_irqsoff_tracer(void) {
- register_irqsoff(irqsoff_tracer);
- register_preemptoff(preemptoff_tracer);
- register_preemptirqsoff(preemptirqsoff_tracer);
- return 0;
-} -core_initcall(init_irqsoff_tracer); -#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
-#ifndef CONFIG_IRQSOFF_TRACER -static inline void tracer_hardirqs_on(void) { } -static inline void tracer_hardirqs_off(void) { } -static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { } -static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { } +#ifdef CONFIG_IRQSOFF_TRACER
- register_tracer(&irqsoff_tracer);
#endif
-#ifndef CONFIG_PREEMPT_TRACER -static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { } -static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { } +#ifdef CONFIG_PREEMPT_TRACER
- register_tracer(&preemptoff_tracer);
#endif
-#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING) -/* Per-cpu variable to prevent redundant calls when IRQs already off */ -static DEFINE_PER_CPU(int, tracing_irq_cpu);
-void trace_hardirqs_on(void) -{
- if (!this_cpu_read(tracing_irq_cpu))
return;
- trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
- tracer_hardirqs_on();
- this_cpu_write(tracing_irq_cpu, 0);
-} -EXPORT_SYMBOL(trace_hardirqs_on);
-void trace_hardirqs_off(void) -{
- if (this_cpu_read(tracing_irq_cpu))
return;
- this_cpu_write(tracing_irq_cpu, 1);
- trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
- tracer_hardirqs_off();
-} -EXPORT_SYMBOL(trace_hardirqs_off);
-__visible void trace_hardirqs_on_caller(unsigned long caller_addr) -{
- if (!this_cpu_read(tracing_irq_cpu))
return;
- trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
- tracer_hardirqs_on_caller(caller_addr);
- this_cpu_write(tracing_irq_cpu, 0);
-} -EXPORT_SYMBOL(trace_hardirqs_on_caller);
-__visible void trace_hardirqs_off_caller(unsigned long caller_addr) -{
- if (this_cpu_read(tracing_irq_cpu))
return;
- this_cpu_write(tracing_irq_cpu, 1);
- trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
- tracer_hardirqs_off_caller(caller_addr);
-} -EXPORT_SYMBOL(trace_hardirqs_off_caller);
-/*
- Stubs:
- */
-void trace_softirqs_on(unsigned long ip) -{ -}
-void trace_softirqs_off(unsigned long ip) -{ -}
-inline void print_irqtrace_events(struct task_struct *curr) -{ -} +#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
- register_tracer(&preemptirqsoff_tracer);
#endif -#if defined(CONFIG_PREEMPT_TRACER) || \
- (defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
-void trace_preempt_on(unsigned long a0, unsigned long a1) -{
- trace_preempt_enable_rcuidle(a0, a1);
- tracer_preempt_on(a0, a1);
-}
-void trace_preempt_off(unsigned long a0, unsigned long a1) -{
- trace_preempt_disable_rcuidle(a0, a1);
- tracer_preempt_off(a0, a1);
- return 0;
 }
-#endif
+core_initcall(init_irqsoff_tracer);
+#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
new file mode 100644
index 000000000000..dc01c7f4d326
--- /dev/null
+++ b/kernel/trace/trace_preemptirq.c
@@ -0,0 +1,71 @@
+/*
+ * preemptoff and irqoff tracepoints
+ *
+ * Copyright (C) Joel Fernandes (Google) <joel@joelfernandes.org>
+ */
+
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/preemptirq.h>
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+/* Per-cpu variable to prevent redundant calls when IRQs already off */
+static DEFINE_PER_CPU(int, tracing_irq_cpu);
+
+void trace_hardirqs_on(void)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+void trace_hardirqs_off(void)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
+
+__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_off_caller);
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
+
+void trace_preempt_on(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_enable_rcuidle(a0, a1);
+}
+
+void trace_preempt_off(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_disable_rcuidle(a0, a1);
+}
+#endif
On Fri, Jul 06, 2018 at 06:06:10PM -0400, Steven Rostedt wrote:
Peter,
Want to ack this? It touches Lockdep.
Joel,
I got to this patch and I'm still reviewing it. I'll hopefully have my full review done by next week. I'll make it a priority. But I still would like Peter's ack on this one, as he's the maintainer of lockdep.
Thanks a lot Steven.
Peter, the lockdep changes are just small changes to how the irq on/off hooks are called, plus minor cleanups. I also ran the full locking API selftests, with all tests passing. I hope you are OK with this change. I would appreciate an Ack for the lockdep bits. Thanks.
-Joel
Thanks,
-- Steve
On Thu, 28 Jun 2018 11:21:47 -0700 Joel Fernandes joel@joelfernandes.org wrote:
From: "Joel Fernandes (Google)" joel@joelfernandes.org
This patch detaches the preemptirq tracepoints from the tracers and keeps them separate.
Advantages:
- Lockdep and the irqsoff events can now run in parallel since they no longer
  have their own calls.
- This unifies the use case of adding hooks to an irqsoff and irqson
  event, and a preemptoff and preempton event.
  3 users of the events exist:
  - Lockdep
  - irqsoff and preemptoff tracers
  - irqs and preempt trace events
The unification cleans up several ifdefs and makes the code in the preempt tracer and irqsoff tracers simpler. It gets rid of all the horrific ifdeferry around PROVE_LOCKING and makes configuration of the different users of the tracepoints easier to understand. It also gets rid of the time_* function calls that the lockdep hooks used to call into the preemptirq tracer, which are not needed anymore. The negative delta in lines of code in this patch is quite large too.
In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS as a single point for registering probes onto the tracepoints. With this, the web of config options for preempt/irq toggle tracepoints and its users becomes:
 PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
       |                 |     \         |           |
       \    (selects)    /      \        \ (selects) /
      TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                      \                  /
                       \ (depends on)   /
                     PREEMPTIRQ_TRACEPOINTS
One note: I have to check for lockdep recursion in the code that calls the trace events API and bail out if we are under lockdep recursion protection, to prevent something like the following case. A spin_lock is taken, then lockdep_acquired is called. That does a raw_local_irq_save, sets lockdep_recursion, and then calls __lockdep_acquired. In this function, a call to get_lock_stats happens, which calls preempt_disable, which calls trace IRQS off somewhere, which enters my tracepoint code and sets the tracing_irq_cpu flag to prevent recursion. This flag is then never cleared, so lockdep paths are never entered, causing splats and other bad things.
Other than the performance tests mentioned in the previous patch, I also ran the locking API test suite and verified that all test cases pass.
I also injected issues by not registering the lockdep probes onto the tracepoints, and I see failures, which confirms that the probes are indeed working.
This series + lockdep probes not registered (just to inject errors):
[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/12:FAILED|FAILED| ok |
[ 0.000000] sirq-safe-A => hirqs-on/21:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + irqs-on/12:FAILED|FAILED| ok |
[ 0.000000] soft-safe-A + irqs-on/12:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + irqs-on/21:FAILED|FAILED| ok |
[ 0.000000] soft-safe-A + irqs-on/21:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
With this series + lockdep probes registered, all locking tests pass:
[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/12: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/21: ok | ok | ok |
[ 0.000000] hard-safe-A + irqs-on/12: ok | ok | ok |
[ 0.000000] soft-safe-A + irqs-on/12: ok | ok | ok |
[ 0.000000] hard-safe-A + irqs-on/21: ok | ok | ok |
[ 0.000000] soft-safe-A + irqs-on/21: ok | ok | ok |
[ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
 include/linux/ftrace.h            |  11 +-
 include/linux/irqflags.h          |  11 +-
 include/linux/lockdep.h           |   8 +-
 include/linux/preempt.h           |   2 +-
 include/trace/events/preemptirq.h |  23 +--
 init/main.c                       |   5 +-
 kernel/locking/lockdep.c          |  35 ++---
 kernel/sched/core.c               |   2 +-
 kernel/trace/Kconfig              |  22 ++-
 kernel/trace/Makefile             |   2 +-
 kernel/trace/trace_irqsoff.c      | 231 ++++++++----------------------
 kernel/trace/trace_preemptirq.c   |  71 +++++++++
 12 files changed, 194 insertions(+), 229 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 8154f4920fcb..f32e3c81407e 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void) return CALLER_ADDR2; } -#ifdef CONFIG_IRQSOFF_TRACER
- extern void time_hardirqs_on(unsigned long a0, unsigned long a1);
- extern void time_hardirqs_off(unsigned long a0, unsigned long a1);
-#else
- static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { }
- static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
-#endif
-#if defined(CONFIG_PREEMPT_TRACER) || \
- (defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE extern void trace_preempt_on(unsigned long a0, unsigned long a1); extern void trace_preempt_off(unsigned long a0, unsigned long a1); #else diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 9700f00bbc04..50edb9cbbd26 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -15,9 +15,16 @@ #include <linux/typecheck.h> #include <asm/irqflags.h> -#ifdef CONFIG_TRACE_IRQFLAGS +/* Currently trace_softirqs_on/off is used only by lockdep */ +#ifdef CONFIG_PROVE_LOCKING extern void trace_softirqs_on(unsigned long ip); extern void trace_softirqs_off(unsigned long ip); +#else +# define trace_softirqs_on(ip) do { } while (0) +# define trace_softirqs_off(ip) do { } while (0) +#endif
+#ifdef CONFIG_TRACE_IRQFLAGS extern void trace_hardirqs_on(void); extern void trace_hardirqs_off(void); # define trace_hardirq_context(p) ((p)->hardirq_context) @@ -43,8 +50,6 @@ do { \ #else # define trace_hardirqs_on() do { } while (0) # define trace_hardirqs_off() do { } while (0) -# define trace_softirqs_on(ip) do { } while (0) -# define trace_softirqs_off(ip) do { } while (0) # define trace_hardirq_context(p) 0 # define trace_softirq_context(p) 0 # define trace_hardirqs_enabled(p) 0 diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 6fc77d4dbdcd..a8113357ceeb 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -266,7 +266,8 @@ struct held_lock { /*
- Initialization, self-test and debugging-output methods:
*/ -extern void lockdep_info(void); +extern void lockdep_init(void); +extern void lockdep_init_early(void); extern void lockdep_reset(void); extern void lockdep_reset_lock(struct lockdep_map *lock); extern void lockdep_free_key_range(void *start, unsigned long size); @@ -406,7 +407,8 @@ static inline void lockdep_on(void) # define lock_downgrade(l, i) do { } while (0) # define lock_set_class(l, n, k, s, i) do { } while (0) # define lock_set_subclass(l, s, i) do { } while (0) -# define lockdep_info() do { } while (0) +# define lockdep_init() do { } while (0) +# define lockdep_init_early() do { } while (0) # define lockdep_init_map(lock, name, key, sub) \ do { (void)(name); (void)(key); } while (0) # define lockdep_set_class(lock, key) do { (void)(key); } while (0) @@ -532,7 +534,7 @@ do { \ #endif /* CONFIG_LOCKDEP */ -#ifdef CONFIG_TRACE_IRQFLAGS +#ifdef CONFIG_PROVE_LOCKING extern void print_irqtrace_events(struct task_struct *curr); #else static inline void print_irqtrace_events(struct task_struct *curr) diff --git a/include/linux/preempt.h b/include/linux/preempt.h index 5bd3f151da78..c01813c3fbe9 100644 --- a/include/linux/preempt.h +++ b/include/linux/preempt.h @@ -150,7 +150,7 @@ */ #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET) -#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER) +#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE) extern void preempt_count_add(int val); extern void preempt_count_sub(int val); #define preempt_count_dec_and_test() \ diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h index 9c4eb33c5a1d..9a0d4ceeb166 100644 --- a/include/trace/events/preemptirq.h +++ b/include/trace/events/preemptirq.h @@ -1,4 +1,4 @@ -#ifdef CONFIG_PREEMPTIRQ_EVENTS +#ifdef CONFIG_PREEMPTIRQ_TRACEPOINTS #undef TRACE_SYSTEM #define TRACE_SYSTEM preemptirq @@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(preemptirq_template, (void *)((unsigned long)(_stext) + 
__entry->parent_offs)) ); -#ifndef CONFIG_PROVE_LOCKING +#ifdef CONFIG_TRACE_IRQFLAGS DEFINE_EVENT(preemptirq_template, irq_disable, TP_PROTO(unsigned long ip, unsigned long parent_ip), TP_ARGS(ip, parent_ip)); @@ -40,9 +40,14 @@ DEFINE_EVENT(preemptirq_template, irq_disable, DEFINE_EVENT(preemptirq_template, irq_enable, TP_PROTO(unsigned long ip, unsigned long parent_ip), TP_ARGS(ip, parent_ip)); +#else +#define trace_irq_enable(...) +#define trace_irq_disable(...) +#define trace_irq_enable_rcuidle(...) +#define trace_irq_disable_rcuidle(...) #endif -#ifdef CONFIG_DEBUG_PREEMPT +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE DEFINE_EVENT(preemptirq_template, preempt_disable, TP_PROTO(unsigned long ip, unsigned long parent_ip), TP_ARGS(ip, parent_ip)); @@ -50,22 +55,22 @@ DEFINE_EVENT(preemptirq_template, preempt_disable, DEFINE_EVENT(preemptirq_template, preempt_enable, TP_PROTO(unsigned long ip, unsigned long parent_ip), TP_ARGS(ip, parent_ip)); +#else +#define trace_preempt_enable(...) +#define trace_preempt_disable(...) +#define trace_preempt_enable_rcuidle(...) +#define trace_preempt_disable_rcuidle(...) #endif #endif /* _TRACE_PREEMPTIRQ_H */ #include <trace/define_trace.h> -#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING) +#else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */ #define trace_irq_enable(...) #define trace_irq_disable(...) #define trace_irq_enable_rcuidle(...) #define trace_irq_disable_rcuidle(...) -#endif
-#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT) #define trace_preempt_enable(...) #define trace_preempt_disable(...) #define trace_preempt_enable_rcuidle(...) diff --git a/init/main.c b/init/main.c index 3b4ada11ed52..44fe43be84c1 100644 --- a/init/main.c +++ b/init/main.c @@ -648,6 +648,9 @@ asmlinkage __visible void __init start_kernel(void) profile_init(); call_function_init(); WARN(!irqs_disabled(), "Interrupts were enabled early\n");
- lockdep_init_early();
- early_boot_irqs_disabled = false; local_irq_enable();
@@ -663,7 +666,7 @@ asmlinkage __visible void __init start_kernel(void) panic("Too many boot %s vars at `%s'", panic_later, panic_param);
- lockdep_info();
- lockdep_init();
/* * Need to run this when irqs are enabled, because it wants diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 5fa4d3138bf1..b961a1698e98 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -55,6 +55,7 @@ #include "lockdep_internals.h" +#include <trace/events/preemptirq.h> #define CREATE_TRACE_POINTS #include <trace/events/lock.h> @@ -2845,10 +2846,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip) debug_atomic_inc(hardirqs_on_events); } -__visible void trace_hardirqs_on_caller(unsigned long ip) +static void lockdep_hardirqs_on(void *none, unsigned long ignore,
unsigned long ip)
{
- time_hardirqs_on(CALLER_ADDR0, ip);
- if (unlikely(!debug_locks || current->lockdep_recursion)) return;
@@ -2887,23 +2887,15 @@ __visible void trace_hardirqs_on_caller(unsigned long ip) __trace_hardirqs_on_caller(ip); current->lockdep_recursion = 0; } -EXPORT_SYMBOL(trace_hardirqs_on_caller);
-void trace_hardirqs_on(void) -{
- trace_hardirqs_on_caller(CALLER_ADDR0);
-} -EXPORT_SYMBOL(trace_hardirqs_on); /*
- Hardirqs were disabled:
*/ -__visible void trace_hardirqs_off_caller(unsigned long ip) +static void lockdep_hardirqs_off(void *none, unsigned long ignore,
unsigned long ip)
{ struct task_struct *curr = current;
- time_hardirqs_off(CALLER_ADDR0, ip);
- if (unlikely(!debug_locks || current->lockdep_recursion)) return;
@@ -2925,13 +2917,6 @@ __visible void trace_hardirqs_off_caller(unsigned long ip) } else debug_atomic_inc(redundant_hardirqs_off); } -EXPORT_SYMBOL(trace_hardirqs_off_caller);
-void trace_hardirqs_off(void) -{
- trace_hardirqs_off_caller(CALLER_ADDR0);
-} -EXPORT_SYMBOL(trace_hardirqs_off); /*
- Softirqs will be enabled:
@@ -4338,7 +4323,15 @@ void lockdep_reset_lock(struct lockdep_map *lock) raw_local_irq_restore(flags); } -void __init lockdep_info(void) +void __init lockdep_init_early(void) +{ +#ifdef CONFIG_PROVE_LOCKING
- register_trace_prio_irq_disable(lockdep_hardirqs_off, NULL, INT_MAX);
- register_trace_prio_irq_enable(lockdep_hardirqs_on, NULL, INT_MIN);
+#endif +}
+void __init lockdep_init(void) { printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n"); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 78d8facba456..4c956f6849ec 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3192,7 +3192,7 @@ static inline void sched_tick_stop(int cpu) { } #endif #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
defined(CONFIG_PREEMPT_TRACER))
defined(CONFIG_TRACE_PREEMPT_TOGGLE))
/*
- If the value passed in is equal to the current preempt count
- then we just disabled preemption. Start timing the latency.
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index dcc0166d1997..8d51351e3149 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -82,6 +82,15 @@ config RING_BUFFER_ALLOW_SWAP Allow the use of ring_buffer_swap_cpu. Adds a very slight overhead to tracing when enabled. +config PREEMPTIRQ_TRACEPOINTS
- bool
- depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
- select TRACING
- default y
- help
Create preempt/irq toggle tracepoints if needed, so that other parts
of the kernel can use them to generate or add hooks to them.
# All tracer options should select GENERIC_TRACER. For those options that are # enabled by all tracers (context switch and event tracer) they select TRACING. # This allows those options to appear when no other tracer is selected. But the @@ -155,18 +164,20 @@ config FUNCTION_GRAPH_TRACER the return value. This is done by setting the current return address on the current task structure into a stack of calls. +config TRACE_PREEMPT_TOGGLE
- bool
- help
Enables hooks which will be called when preemption is first disabled,
and last enabled.
config PREEMPTIRQ_EVENTS bool "Enable trace events for preempt and irq disable/enable" select TRACE_IRQFLAGS
- depends on DEBUG_PREEMPT || !PROVE_LOCKING
- depends on TRACING
- select TRACE_PREEMPT_TOGGLE if PREEMPT
- select GENERIC_TRACER default n help Enable tracing of disable and enable events for preemption and irqs.
For tracing preempt disable/enable events, DEBUG_PREEMPT must be
enabled. For tracing irq disable/enable events, PROVE_LOCKING must
be disabled.
config IRQSOFF_TRACER bool "Interrupts-off Latency Tracer" @@ -203,6 +214,7 @@ config PREEMPT_TRACER select RING_BUFFER_ALLOW_SWAP select TRACER_SNAPSHOT select TRACER_SNAPSHOT_PER_CPU_SWAP
- select TRACE_PREEMPT_TOGGLE help This option measures the time spent in preemption-off critical sections, with microsecond accuracy.
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e2538c7638d4..84a0cb222f20 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -35,7 +35,7 @@ obj-$(CONFIG_TRACING) += trace_printk.o
 obj-$(CONFIG_TRACING_MAP) += tracing_map.o
 obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
-obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o
+obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o
 obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
 obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index f8daa754cce2..770cd30cda40 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -16,7 +16,6 @@

 #include "trace.h"

-#define CREATE_TRACE_POINTS
 #include <trace/events/preemptirq.h>

 #if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER)
@@ -450,66 +449,6 @@ void stop_critical_timings(void)
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);

-#ifdef CONFIG_IRQSOFF_TRACER
-#ifdef CONFIG_PROVE_LOCKING
-void time_hardirqs_on(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-void time_hardirqs_off(unsigned long a0, unsigned long a1)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(a0, a1);
-}
-
-#else /* !CONFIG_PROVE_LOCKING */
-
-/*
- * We are only interested in hardirq on/off events:
- */
-static inline void tracer_hardirqs_on(void)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_off(void)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
-}
-
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (!preempt_trace() && irq_trace())
-		start_critical_timing(CALLER_ADDR0, caller_addr);
-}
-
-#endif /* CONFIG_PROVE_LOCKING */
-#endif /* CONFIG_IRQSOFF_TRACER */
-
-#ifdef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		stop_critical_timing(a0, a1);
-}
-
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1)
-{
-	if (preempt_trace() && !irq_trace())
-		start_critical_timing(a0, a1);
-}
-#endif /* CONFIG_PREEMPT_TRACER */
-
 #ifdef CONFIG_FUNCTION_TRACER
 static bool function_enabled;

@@ -659,15 +598,34 @@ static void irqsoff_tracer_stop(struct trace_array *tr)
 }

 #ifdef CONFIG_IRQSOFF_TRACER
+/*
+ * We are only interested in hardirq on/off events:
+ */
+static void tracer_hardirqs_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_hardirqs_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (!preempt_trace() && irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int irqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF;

+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }

 static void irqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }

@@ -690,21 +648,34 @@ static struct tracer irqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr = true,
 };
-# define register_irqsoff(trace) register_tracer(&trace)
-#else
-# define register_irqsoff(trace) do { } while (0)
-#endif
+#endif /* CONFIG_IRQSOFF_TRACER */

 #ifdef CONFIG_PREEMPT_TRACER
+static void tracer_preempt_on(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		stop_critical_timing(a0, a1);
+}
+
+static void tracer_preempt_off(void *none, unsigned long a0, unsigned long a1)
+{
+	if (preempt_trace() && !irq_trace())
+		start_critical_timing(a0, a1);
+}
+
 static int preemptoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_PREEMPT_OFF;

+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
 	return __irqsoff_tracer_init(tr);
 }

 static void preemptoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
 	__irqsoff_tracer_reset(tr);
 }

@@ -727,23 +698,29 @@ static struct tracer preemptoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr = true,
 };
-# define register_preemptoff(trace) register_tracer(&trace)
-#else
-# define register_preemptoff(trace) do { } while (0)
-#endif
+#endif /* CONFIG_PREEMPT_TRACER */

-#if defined(CONFIG_IRQSOFF_TRACER) && \
-	defined(CONFIG_PREEMPT_TRACER)
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)

 static int preemptirqsoff_tracer_init(struct trace_array *tr)
 {
 	trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;

+	register_trace_irq_disable(tracer_hardirqs_off, NULL);
+	register_trace_irq_enable(tracer_hardirqs_on, NULL);
+	register_trace_preempt_disable(tracer_preempt_off, NULL);
+	register_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	return __irqsoff_tracer_init(tr);
 }

 static void preemptirqsoff_tracer_reset(struct trace_array *tr)
 {
+	unregister_trace_irq_disable(tracer_hardirqs_off, NULL);
+	unregister_trace_irq_enable(tracer_hardirqs_on, NULL);
+	unregister_trace_preempt_disable(tracer_preempt_off, NULL);
+	unregister_trace_preempt_enable(tracer_preempt_on, NULL);
+
 	__irqsoff_tracer_reset(tr);
 }

@@ -766,115 +743,21 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
 	.allow_instances = true,
 	.use_max_tr = true,
 };
-
-# define register_preemptirqsoff(trace) register_tracer(&trace)
-#else
-# define register_preemptirqsoff(trace) do { } while (0)
 #endif

 __init static int init_irqsoff_tracer(void)
 {
-	register_irqsoff(irqsoff_tracer);
-	register_preemptoff(preemptoff_tracer);
-	register_preemptirqsoff(preemptirqsoff_tracer);
-
-	return 0;
-}
-core_initcall(init_irqsoff_tracer);
-#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
-
-#ifndef CONFIG_IRQSOFF_TRACER
-static inline void tracer_hardirqs_on(void) { }
-static inline void tracer_hardirqs_off(void) { }
-static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { }
-static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { }
+#ifdef CONFIG_IRQSOFF_TRACER
+	register_tracer(&irqsoff_tracer);
 #endif
-
-#ifndef CONFIG_PREEMPT_TRACER
-static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { }
-static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { }
+#ifdef CONFIG_PREEMPT_TRACER
+	register_tracer(&preemptoff_tracer);
 #endif
-
-#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING)
-/* Per-cpu variable to prevent redundant calls when IRQs already off */
-static DEFINE_PER_CPU(int, tracing_irq_cpu);
-
-void trace_hardirqs_on(void)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_on();
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on);
-
-void trace_hardirqs_off(void)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
-	tracer_hardirqs_off();
-}
-EXPORT_SYMBOL(trace_hardirqs_off);
-
-__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
-{
-	if (!this_cpu_read(tracing_irq_cpu))
-		return;
-
-	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_on_caller(caller_addr);
-
-	this_cpu_write(tracing_irq_cpu, 0);
-}
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-
-__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
-{
-	if (this_cpu_read(tracing_irq_cpu))
-		return;
-
-	this_cpu_write(tracing_irq_cpu, 1);
-
-	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
-	tracer_hardirqs_off_caller(caller_addr);
-}
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-
-/*
- * Stubs:
- */
-
-void trace_softirqs_on(unsigned long ip)
-{
-}
-
-void trace_softirqs_off(unsigned long ip)
-{
-}
-
-inline void print_irqtrace_events(struct task_struct *curr)
-{
-}
+#if defined(CONFIG_IRQSOFF_TRACER) && defined(CONFIG_PREEMPT_TRACER)
+	register_tracer(&preemptirqsoff_tracer);
 #endif

-#if defined(CONFIG_PREEMPT_TRACER) || \
-	(defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
-void trace_preempt_on(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_enable_rcuidle(a0, a1);
-	tracer_preempt_on(a0, a1);
-}
-
-void trace_preempt_off(unsigned long a0, unsigned long a1)
-{
-	trace_preempt_disable_rcuidle(a0, a1);
-	tracer_preempt_off(a0, a1);
+	return 0;
 }
-#endif
+core_initcall(init_irqsoff_tracer);
+#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
new file mode 100644
index 000000000000..dc01c7f4d326
--- /dev/null
+++ b/kernel/trace/trace_preemptirq.c
@@ -0,0 +1,71 @@
+/*
+ * preemptoff and irqoff tracepoints
+ *
+ * Copyright (C) Joel Fernandes (Google) joel@joelfernandes.org
+ */
+
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/preemptirq.h>
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+/* Per-cpu variable to prevent redundant calls when IRQs already off */
+static DEFINE_PER_CPU(int, tracing_irq_cpu);
+
+void trace_hardirqs_on(void)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+void trace_hardirqs_off(void)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
+	trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+	this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
+
+__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
+{
+	if (lockdep_recursing(current) || this_cpu_read(tracing_irq_cpu))
+		return;
+
+	this_cpu_write(tracing_irq_cpu, 1);
+	trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_off_caller);
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
+#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
+
+void trace_preempt_on(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_enable_rcuidle(a0, a1);
+}
+
+void trace_preempt_off(unsigned long a0, unsigned long a1)
+{
+	trace_preempt_disable_rcuidle(a0, a1);
+}
+#endif
-- To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 28 Jun 2018 11:21:47 -0700 Joel Fernandes joel@joelfernandes.org wrote:
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
new file mode 100644
index 000000000000..dc01c7f4d326
--- /dev/null
+++ b/kernel/trace/trace_preemptirq.c
@@ -0,0 +1,71 @@
Can you send a patch on top of this, that adds a SPDX header here. Just add another patch, no need to resend this one.
I need to go through all the files in kernel/trace/* and add SPDX headers. I don't want to add more files that don't have them.
I'm still playing around with this patch, and testing it.
-- Steve
On Tue, Jul 10, 2018 at 10:20:50AM -0400, Steven Rostedt wrote:
On Thu, 28 Jun 2018 11:21:47 -0700 Joel Fernandes joel@joelfernandes.org wrote:
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
new file mode 100644
index 000000000000..dc01c7f4d326
--- /dev/null
+++ b/kernel/trace/trace_preemptirq.c
@@ -0,0 +1,71 @@
Can you send a patch on top of this, that adds a SPDX header here. Just add another patch, no need to resend this one.
I need to go through all the files in kernel/trace/* and add SPDX headers. I don't want to add more files that don't have them.
Sure, I'll send a patch on top of this.
I'm still playing around with this patch, and testing it.
Ok, thanks.
-Joel
On Thu, Jun 28, 2018 at 11:21:47AM -0700, Joel Fernandes wrote:
One note, I have to check for lockdep recursion in the code that calls the trace events API and bail out if we're in lockdep recursion
I'm not seeing any new lockdep_recursion checks...
protection to prevent something like the following case: a spin_lock is taken. Then lockdep_acquired is called. That does a raw_local_irq_save and then sets lockdep_recursion, and then calls __lockdep_acquired. In this function, a call to get_lock_stats happens which calls preempt_disable, which calls trace IRQS off somewhere which enters my tracepoint code and sets the tracing_irq_cpu flag to prevent recursion. This flag is then never cleared causing lockdep paths to never be entered and thus causing splats and other bad things.
Would it not be much easier to avoid that entirely, afaict all get/put_lock_stats() callers already have IRQs disabled, so that (traced) preempt fiddling is entirely superfluous.
---
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5fa4d3138bf1..8f5ce0048d15 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -248,12 +248,7 @@ void clear_lock_stats(struct lock_class *class)

 static struct lock_class_stats *get_lock_stats(struct lock_class *class)
 {
-	return &get_cpu_var(cpu_lock_stats)[class - lock_classes];
-}
-
-static void put_lock_stats(struct lock_class_stats *stats)
-{
-	put_cpu_var(cpu_lock_stats);
+	return &this_cpu_ptr(&cpu_lock_stats)[class - lock_classes];
 }

 static void lock_release_holdtime(struct held_lock *hlock)
@@ -271,7 +266,6 @@ static void lock_release_holdtime(struct held_lock *hlock)
 		lock_time_inc(&stats->read_holdtime, holdtime);
 	else
 		lock_time_inc(&stats->write_holdtime, holdtime);
-	put_lock_stats(stats);
 }
 #else
 static inline void lock_release_holdtime(struct held_lock *hlock)
@@ -4090,7 +4084,6 @@ __lock_contended(struct lockdep_map *lock, unsigned long ip)
 		stats->contending_point[contending_point]++;
 	if (lock->cpu != smp_processor_id())
 		stats->bounces[bounce_contended + !!hlock->read]++;
-	put_lock_stats(stats);
 }

 static void
@@ -4138,7 +4131,6 @@ __lock_acquired(struct lockdep_map *lock, unsigned long ip)
 	}
 	if (lock->cpu != cpu)
 		stats->bounces[bounce_acquired + !!hlock->read]++;
-	put_lock_stats(stats);

 	lock->cpu = cpu;
 	lock->ip = ip;
On Wed, 11 Jul 2018 15:12:56 +0200 Peter Zijlstra peterz@infradead.org wrote:
On Thu, Jun 28, 2018 at 11:21:47AM -0700, Joel Fernandes wrote:
One note, I have to check for lockdep recursion in the code that calls the trace events API and bail out if we're in lockdep recursion
I'm not seeing any new lockdep_recursion checks...
I believe he's talking about this part:
+void trace_hardirqs_on(void)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
+
[etc]
protection to prevent something like the following case: a spin_lock is taken. Then lockdep_acquired is called. That does a raw_local_irq_save and then sets lockdep_recursion, and then calls __lockdep_acquired. In this function, a call to get_lock_stats happens which calls preempt_disable, which calls trace IRQS off somewhere which enters my tracepoint code and sets the tracing_irq_cpu flag to prevent recursion. This flag is then never cleared causing lockdep paths to never be entered and thus causing splats and other bad things.
Would it not be much easier to avoid that entirely, afaict all get/put_lock_stats() callers already have IRQs disabled, so that (traced) preempt fiddling is entirely superfluous.
Agreed. Looks like a good clean up.
-- Steve
On Wed, 11 Jul 2018 09:19:44 -0400 Steven Rostedt rostedt@goodmis.org wrote:
On Wed, 11 Jul 2018 15:12:56 +0200 Peter Zijlstra peterz@infradead.org wrote:
On Thu, Jun 28, 2018 at 11:21:47AM -0700, Joel Fernandes wrote:
One note, I have to check for lockdep recursion in the code that calls the trace events API and bail out if we're in lockdep recursion
I'm not seeing any new lockdep_recursion checks...
I believe he's talking about this part:
+void trace_hardirqs_on(void)
+{
+	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
+		return;
And the reason he said this is new, IIUC, is that with the old way we could still do irqsoff tracing even if lockdep_recursion is set. Now, irqsoff tracing is disabled within lockdep_recursion.
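To make the effect of that guard concrete, here is a small user-space model. It is only a sketch, not kernel code, and every model_* name is a hypothetical stand-in: model_lockdep_recursing plays the role of lockdep_recursing(current), model_tracing_irq_cpu the per-cpu tracing_irq_cpu flag (one CPU only), and model_irq_events counts how often the irq_disable/irq_enable tracepoints would have fired.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; this models a single CPU. */
static bool model_lockdep_recursing;	/* models lockdep_recursing(current) */
static int model_tracing_irq_cpu;	/* models the per-cpu tracing_irq_cpu */
static int model_irq_events;		/* tracepoint hits that would be emitted */

static void model_trace_hardirqs_off(void)
{
	/* Same shape as the guard in the patch: bail out early. */
	if (model_lockdep_recursing || model_tracing_irq_cpu)
		return;

	model_tracing_irq_cpu = 1;
	model_irq_events++;		/* stands for trace_irq_disable_rcuidle() */
}

static void model_trace_hardirqs_on(void)
{
	if (model_lockdep_recursing || !model_tracing_irq_cpu)
		return;

	model_irq_events++;		/* stands for trace_irq_enable_rcuidle() */
	model_tracing_irq_cpu = 0;
}

/* Run one off/off/on sequence and report how many events were emitted. */
static int model_run(bool recursing)
{
	model_lockdep_recursing = recursing;
	model_tracing_irq_cpu = 0;
	model_irq_events = 0;

	model_trace_hardirqs_off();
	model_trace_hardirqs_off();	/* redundant call, filtered by the flag */
	model_trace_hardirqs_on();

	return model_irq_events;
}
```

In this model, model_run(false) emits one disable and one enable event, while model_run(true) emits nothing and leaves the flag untouched, which is the "irqsoff tracing is disabled within lockdep_recursion" behaviour described above.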
-- Steve
On Wed, Jul 11, 2018 at 09:19:44AM -0400, Steven Rostedt wrote:
protection to prevent something like the following case: a spin_lock is taken. Then lockdep_acquired is called. That does a raw_local_irq_save and then sets lockdep_recursion, and then calls __lockdep_acquired. In this function, a call to get_lock_stats happens which calls preempt_disable, which calls trace IRQS off somewhere which enters my tracepoint code and sets the tracing_irq_cpu flag to prevent recursion. This flag is then never cleared causing lockdep paths to never be entered and thus causing splats and other bad things.
Would it not be much easier to avoid that entirely, afaict all get/put_lock_stats() callers already have IRQs disabled, so that (traced) preempt fiddling is entirely superfluous.
Agreed. Looks like a good clean up.
So actually with or without the clean up, I don't see any issues with dropping lockdep_recursing in my tests at the moment. I'm not sure whether something else changed between then and now, causing the issue to go away. I can include Peter's clean up in my series though if he's OK with it, since you guys agree it's a good clean up anyway. Would you prefer I did that, and then also dropped the lockdep_recursing checks? Or should I keep the lockdep_recursing() checks just to be safe? Do you see cases where you want irqsoff tracing while lockdep_recursing() is true?
thanks,
- Joel
On Thu, 12 Jul 2018 01:38:05 -0700 Joel Fernandes joel@joelfernandes.org wrote:
So actually with or without the clean up, I don't see any issues with dropping lockdep_recursing in my tests at the moment. I'm not sure whether something else changed between then and now, causing the issue to go away. I can include Peter's clean up in my series though if he's OK with it, since you guys agree it's a good clean up anyway. Would you prefer I did that, and then also dropped the lockdep_recursing checks? Or should I keep the lockdep_recursing() checks just to be safe? Do you see cases where you want irqsoff tracing while lockdep_recursing() is true?
I say rewrite it as per Peter's suggestion. Perhaps even add credit to Peter like:
Cleaned-up-code-by: Peter Zijlstra peterz@infradead.org
;-)
And yes, I would recommend dropping the lockdep_recursing() checks if you can't trigger issues from within your tests. If it shows up later, we can always add it back.
Thanks!
-- Steve
On Wed, Jul 11, 2018 at 03:12:56PM +0200, Peter Zijlstra wrote:
On Thu, Jun 28, 2018 at 11:21:47AM -0700, Joel Fernandes wrote:
One note, I have to check for lockdep recursion in the code that calls the trace events API and bail out if we're in lockdep recursion
I'm not seeing any new lockdep_recursion checks...
I meant this part:

void trace_hardirqs_on(void)
{
	if (lockdep_recursing(current) || !this_cpu_read(tracing_irq_cpu))
		return;
protection to prevent something like the following case: a spin_lock is taken. Then lockdep_acquired is called. That does a raw_local_irq_save and then sets lockdep_recursion, and then calls __lockdep_acquired. In this function, a call to get_lock_stats happens which calls preempt_disable, which calls trace IRQS off somewhere which enters my tracepoint code and sets the tracing_irq_cpu flag to prevent recursion. This flag is then never cleared causing lockdep paths to never be entered and thus causing splats and other bad things.
Would it not be much easier to avoid that entirely, afaict all get/put_lock_stats() callers already have IRQs disabled, so that (traced) preempt fiddling is entirely superfluous.
Let me try to apply Peter's diff and see whether I can still do without the lockdep recursion checking. I believe it is harmless to keep checking for lockdep recursion just to be safe, but I'll give it a try and let you know how it goes.
thanks!
- Joel
From: "Joel Fernandes (Google)" joel@joelfernandes.org
In this patch we introduce a test module for simulating a long atomic section in the kernel which the preemptoff or irqsoff tracers can detect. This module is to be used only for test purposes and is disabled by default.
The following is the expected output (only briefly shown) that can be parsed to verify that the tracers are working correctly. We will use this from the kselftests in future patches.
For the preemptoff tracer:
echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -1066 2...2 0us@: atomic_sect_run <-atomic_sect_run
preempt -1066 2...2 500002us : atomic_sect_run <-atomic_sect_run
preempt -1066 2...2 500004us : tracer_preempt_on <-atomic_sect_run
preempt -1066 2...2 500012us : <stack trace>
 => kthread
 => ret_from_fork
For the irqsoff tracer:
echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -1069 1d..1 0us@: atomic_sect_run
irq dis -1069 1d..1 500001us : atomic_sect_run
irq dis -1069 1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
irq dis -1069 1d..1 500005us : <stack trace>
 => ret_from_fork
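As an illustration of how such output could be parsed, the sketch below pulls the microsecond timestamp out of one trace line and checks whether any line reaches the requested atomic_time. It is plain user-space C and only assumes the line layout shown in the samples above; trace_line_usecs() and trace_reached() are hypothetical helper names, not part of the kernel or of the planned kselftests.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Return the timestamp (in microseconds) of one latency-trace line, i.e. the
 * first whitespace-separated token of the form "<digits>us...", or -1 if the
 * line has no such token (for example a stack-trace continuation line).
 */
static long trace_line_usecs(const char *line)
{
	const char *p = line;

	while (*p) {
		/* Skip the whitespace in front of the next token. */
		while (*p && isspace((unsigned char)*p))
			p++;

		if (isdigit((unsigned char)*p)) {
			char *end;
			unsigned long val = strtoul(p, &end, 10);

			/* Accept "500002us", "0us@:" and similar suffixes. */
			if (end[0] == 'u' && end[1] == 's')
				return (long)val;
		}

		/* Not a timestamp: skip the rest of this token. */
		while (*p && !isspace((unsigned char)*p))
			p++;
	}

	return -1;
}

/* True if at least one of the n lines has a timestamp >= min_usecs. */
static bool trace_reached(const char *const *lines, size_t n, long min_usecs)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (trace_line_usecs(lines[i]) >= min_usecs)
			return true;
	}

	return false;
}
```

A test script would feed the lines of /d/tracing/trace to trace_reached() with min_usecs set to the atomic_time passed to insmod (500000 in the runs above).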
Co-developed-by: Erick Reyes erickreyes@google.com
Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com
Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
---
 lib/Kconfig.debug          |  8 ++++
 lib/Makefile               |  1 +
 lib/test_atomic_sections.c | 77 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+)
 create mode 100644 lib/test_atomic_sections.c
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 8838d1158d19..622c90e1e066 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1956,6 +1956,14 @@ config TEST_KMOD

 	  If unsure, say N.

+config TEST_ATOMIC_SECTIONS
+	tristate "Simulate atomic sections for tracers to detect"
+	depends on m
+	help
+	  Select this option to build a test module that can help test atomic
+	  sections by simulating them with a duration supplied as a module
+	  parameter. Preempt disable and irq disable modes can be requested.
+
 config TEST_DEBUG_VIRTUAL
 	tristate "Test CONFIG_DEBUG_VIRTUAL feature"
 	depends on DEBUG_VIRTUAL
diff --git a/lib/Makefile b/lib/Makefile
index 90dc5520b784..7831e747bf72 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -44,6 +44,7 @@ obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
+obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
 obj-y += kstrtox.o
 obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
new file mode 100644
index 000000000000..1eef518f0974
--- /dev/null
+++ b/lib/test_atomic_sections.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Atomic section emulation test module
+ *
+ * Emulates atomic sections by disabling IRQs or preemption
+ * and doing a busy wait for a specified amount of time.
+ * This can be used for testing of different atomic section
+ * tracers such as irqsoff tracers.
+ *
+ * (c) 2018. Google LLC
+ */
+
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+static ulong atomic_time = 100;
+static char atomic_mode[10] = "irq";
+
+module_param_named(atomic_time, atomic_time, ulong, S_IRUGO);
+module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
+MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
+MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq (default irq)");
+
+static void busy_wait(ulong time)
+{
+	ktime_t start, end;
+	start = ktime_get();
+	do {
+		end = ktime_get();
+		if (kthread_should_stop())
+			break;
+	} while (ktime_to_ns(ktime_sub(end, start)) < (time * 1000));
+}
+
+int atomic_sect_run(void *data)
+{
+	unsigned long flags;
+
+	if (!strcmp(atomic_mode, "irq")) {
+		local_irq_save(flags);
+		busy_wait(atomic_time);
+		local_irq_restore(flags);
+	} else if (!strcmp(atomic_mode, "preempt")) {
+		preempt_disable();
+		busy_wait(atomic_time);
+		preempt_enable();
+	}
+
+	return 0;
+}
+
+static int __init atomic_sect_init(void)
+{
+	char task_name[50];
+	struct task_struct *test_task;
+
+	snprintf(task_name, sizeof(task_name), "%s_test", atomic_mode);
+
+	test_task = kthread_run(atomic_sect_run, NULL, task_name);
+	return PTR_ERR_OR_ZERO(test_task);
+}
+
+static void __exit atomic_sect_exit(void)
+{
+	return;
+}
+
+module_init(atomic_sect_init)
+module_exit(atomic_sect_exit)
+MODULE_LICENSE("GPL v2");
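For anyone who wants to try the busy-wait logic outside the kernel, a rough user-space analogue is sketched below. It is an assumption-laden stand-in rather than the module's code: clock_gettime(CLOCK_MONOTONIC) replaces ktime_get(), there is no kthread_should_stop() early exit, and now_ns() and busy_wait_us() are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() under strict -std= modes */
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds; user-space stand-in for ktime_get(). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until at least usecs microseconds have elapsed, like the module's
 * busy_wait(), and return the time actually spent spinning in nanoseconds.
 */
static uint64_t busy_wait_us(unsigned long usecs)
{
	uint64_t start = now_ns();
	uint64_t end;

	do {
		end = now_ns();
	} while (end - start < (uint64_t)usecs * 1000);

	return end - start;
}
```

Calling busy_wait_us(100) spins for at least 100 microseconds, matching the module's default atomic_time. Unlike the module, nothing here disables interrupts or preemption, so the thread can still be scheduled out while it spins.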
On Thu, 28 Jun 2018 11:21:48 -0700 Joel Fernandes joel@joelfernandes.org wrote:
From: "Joel Fernandes (Google)" joel@joelfernandes.org
In this patch we introduce a test module for simulating a long atomic section in the kernel which the preemptoff or irqsoff tracers can detect. This module is to be used only for test purposes and is disabled by default.
The following is the expected output (only briefly shown) that can be parsed to verify that the tracers are working correctly. We will use this from the kselftests in future patches.
For the preemptoff tracer:
echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -1066 2...2 0us@: atomic_sect_run <-atomic_sect_run
preempt -1066 2...2 500002us : atomic_sect_run <-atomic_sect_run
preempt -1066 2...2 500004us : tracer_preempt_on <-atomic_sect_run
preempt -1066 2...2 500012us : <stack trace>
 => kthread
 => ret_from_fork
For the irqsoff tracer:
echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -1069 1d..1 0us@: atomic_sect_run
irq dis -1069 1d..1 500001us : atomic_sect_run
irq dis -1069 1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
irq dis -1069 1d..1 500005us : <stack trace>
 => ret_from_fork
Co-developed-by: Erick Reyes erickreyes@google.com
Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com
Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
 lib/Kconfig.debug          |  8 ++++
 lib/Makefile               |  1 +
 lib/test_atomic_sections.c | 77 ++++++++++++++++++++++++++++++++++++++
I think this code should reside in kernel/trace directory. I already have modules there. See the ring_buffer_benchmark code and the test module for mmio tracer.
 3 files changed, 86 insertions(+)
 create mode 100644 lib/test_atomic_sections.c
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 8838d1158d19..622c90e1e066 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1956,6 +1956,14 @@ config TEST_KMOD

 	  If unsure, say N.

+config TEST_ATOMIC_SECTIONS
+	tristate "Simulate atomic sections for tracers to detect"

Hmm, I don't like this title. It's not very obvious to what it is about. What about "Preempt / IRQ disable delay thread to test latency tracers" ? Or something along those lines.

+	depends on m
+	help
+	  Select this option to build a test module that can help test atomic
+	  sections by simulating them with a duration supplied as a module
+	  parameter. Preempt disable and irq disable modes can be requested.

"If unsure say N"
config TEST_DEBUG_VIRTUAL tristate "Test CONFIG_DEBUG_VIRTUAL feature" depends on DEBUG_VIRTUAL diff --git a/lib/Makefile b/lib/Makefile index 90dc5520b784..7831e747bf72 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -44,6 +44,7 @@ obj-y += string_helpers.o obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o obj-y += hexdump.o obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o +obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o obj-y += kstrtox.o obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o obj-$(CONFIG_TEST_BPF) += test_bpf.o diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c new file mode 100644 index 000000000000..1eef518f0974 --- /dev/null +++ b/lib/test_atomic_sections.c @@ -0,0 +1,77 @@ +// SPDX-License-Identifier: GPL-2.0 +/*
+ * Atomic section emulation test module
+ *
+ * Emulates atomic sections by disabling IRQs or preemption
+ * and doing a busy wait for a specified amount of time.
+ * This can be used for testing of different atomic section
+ * tracers such as irqsoff tracers.
+ *
+ * (c) 2018. Google LLC
+ */
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+static ulong atomic_time = 100;
+static char atomic_mode[10] = "irq";
+
+module_param_named(atomic_time, atomic_time, ulong, S_IRUGO);
+module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
+MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
It's not a "Period", it's a delay. "Length of time in critical section"
+MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq (default irq)");
"Mode of the test: preempt or irq disabled (default irq)"
+static void busy_wait(ulong time)
+{
+	ktime_t start, end;
+	start = ktime_get();
+	do {
+		end = ktime_get();
+		if (kthread_should_stop())
+			break;
+	} while (ktime_to_ns(ktime_sub(end, start)) < (time * 1000));
+}
+int atomic_sect_run(void *data)
+{
+	unsigned long flags;
+	if (!strcmp(atomic_mode, "irq")) {
+		local_irq_save(flags);
+		busy_wait(atomic_time);
+		local_irq_restore(flags);
+	} else if (!strcmp(atomic_mode, "preempt")) {
+		preempt_disable();
+		busy_wait(atomic_time);
+		preempt_enable();
+	}
So this is a one shot deal? That should be explained somewhere, probably in the config help message. In fact, I think the config help message should show how to use this.
-- Steve
+	return 0;
+}
+
+static int __init atomic_sect_init(void)
+{
+	char task_name[50];
+	struct task_struct *test_task;
+	snprintf(task_name, sizeof(task_name), "%s_test", atomic_mode);
+	test_task = kthread_run(atomic_sect_run, NULL, task_name);
+	return PTR_ERR_OR_ZERO(test_task);
+}
+
+static void __exit atomic_sect_exit(void)
+{
+	return;
+}
+
+module_init(atomic_sect_init)
+module_exit(atomic_sect_exit)
+MODULE_LICENSE("GPL v2");
-- To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jul 10, 2018 at 08:47:07PM -0400, Steven Rostedt wrote:
On Thu, 28 Jun 2018 11:21:48 -0700 Joel Fernandes joel@joelfernandes.org wrote:
From: "Joel Fernandes (Google)" joel@joelfernandes.org
In this patch we introduce a test module for simulating a long atomic section in the kernel which the preemptoff or irqsoff tracers can detect. This module is to be used only for test purposes and is disabled by default.
Following is the expected output (only briefly shown) that can be parsed to verify that the tracers are working correctly. We will use this from the kselftests in future patches.
For the preemptoff tracer:
echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -1066 2...2 0us@: atomic_sect_run <-atomic_sect_run
preempt -1066 2...2 500002us : atomic_sect_run <-atomic_sect_run
preempt -1066 2...2 500004us : tracer_preempt_on <-atomic_sect_run
preempt -1066 2...2 500012us : <stack trace>
 => kthread
 => ret_from_fork
For the irqsoff tracer:
echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=500000
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -1069 1d..1 0us@: atomic_sect_run
irq dis -1069 1d..1 500001us : atomic_sect_run
irq dis -1069 1d..1 500002us : tracer_hardirqs_on <-atomic_sect_run
irq dis -1069 1d..1 500005us : <stack trace>
 => ret_from_fork
Co-developed-by: Erick Reyes erickreyes@google.com
Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com
Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
 lib/Kconfig.debug          |  8 ++++
 lib/Makefile               |  1 +
 lib/test_atomic_sections.c | 77 ++++++++++++++++++++++++++++++++++++++
I think this code should reside in kernel/trace directory. I already have modules there. See the ring_buffer_benchmark code and the test module for mmio tracer.
Ok, I'll move it to there.
 3 files changed, 86 insertions(+)
 create mode 100644 lib/test_atomic_sections.c
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 8838d1158d19..622c90e1e066 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1956,6 +1956,14 @@ config TEST_KMOD
 	  If unsure, say N.
+config TEST_ATOMIC_SECTIONS
+	tristate "Simulate atomic sections for tracers to detect"
Hmm, I don't like this title. It's not very obvious what it is about. What about "Preempt / IRQ disable delay thread to test latency tracers"? Or something along those lines.
Sure, I agree it's better. I'll change the text to that and call the config TEST_PREEMPT_IRQ_DISABLE_DELAY.
+	depends on m
+	help
+	  Select this option to build a test module that can help test atomic
+	  sections by simulating them with a duration supplied as a module
+	  parameter. Preempt disable and irq disable modes can be requested.
"If unsure say N"
Sure, sounds good.
 config TEST_DEBUG_VIRTUAL
 	tristate "Test CONFIG_DEBUG_VIRTUAL feature"
 	depends on DEBUG_VIRTUAL
diff --git a/lib/Makefile b/lib/Makefile
index 90dc5520b784..7831e747bf72 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -44,6 +44,7 @@ obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
+obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
 obj-y += kstrtox.o
 obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
new file mode 100644
index 000000000000..1eef518f0974
--- /dev/null
+++ b/lib/test_atomic_sections.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Atomic section emulation test module
+ *
+ * Emulates atomic sections by disabling IRQs or preemption
+ * and doing a busy wait for a specified amount of time.
+ * This can be used for testing of different atomic section
+ * tracers such as irqsoff tracers.
+ *
+ * (c) 2018. Google LLC
+ */
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+static ulong atomic_time = 100;
+static char atomic_mode[10] = "irq";
+
+module_param_named(atomic_time, atomic_time, ulong, S_IRUGO);
+module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
+MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
It's not a "Period", it's a delay. "Length of time in critical section"
Sure.
+MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq (default irq)");
"Mode of the test: preempt or irq disabled (default irq)"
Ok.
+static void busy_wait(ulong time)
+{
+	ktime_t start, end;
+	start = ktime_get();
+	do {
+		end = ktime_get();
+		if (kthread_should_stop())
+			break;
+	} while (ktime_to_ns(ktime_sub(end, start)) < (time * 1000));
+}
+int atomic_sect_run(void *data)
+{
+	unsigned long flags;
+	if (!strcmp(atomic_mode, "irq")) {
+		local_irq_save(flags);
+		busy_wait(atomic_time);
+		local_irq_restore(flags);
+	} else if (!strcmp(atomic_mode, "preempt")) {
+		preempt_disable();
+		busy_wait(atomic_time);
+		preempt_enable();
+	}
So this is a one shot deal? That should be explained somewhere, probably in the config help message. In fact, I think the config help message should show how to use this.
Sounds good, I'll clarify it better. Thanks!
- Joel
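For reference, the timing logic of busy_wait() discussed in this review can be tried out in userspace. The following is only a minimal sketch, not the module code: it assumes clock_gettime(CLOCK_MONOTONIC) as a stand-in for ktime_get(), and the names now_ns() and busy_wait_us() are made up for illustration. Like the module, it takes the delay in microseconds, compares it against a nanosecond delta, and runs exactly once per call.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds; userspace stand-in for ktime_get(). */
static uint64_t now_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;	/* keeps -DNDEBUG builds free of unused warnings */
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until at least delay_us microseconds have passed and return the
 * elapsed time in nanoseconds. The delay is given in microseconds and
 * the clock delta is in nanoseconds, hence the multiplication by 1000,
 * as in the module's busy_wait().
 */
static uint64_t busy_wait_us(unsigned long delay_us)
{
	const uint64_t start = now_ns();
	uint64_t elapsed;

	do {
		elapsed = now_ns() - start;
	} while (elapsed < (uint64_t)delay_us * 1000ULL);

	return elapsed;
}
```

Calling busy_wait_us(500000) spins for roughly half a second, which corresponds to the atomic_time=500000 used in the examples. Unlike the module, this sketch does not disable interrupts or preemption, so it says nothing about what the tracers would report.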
From: "Joel Fernandes (Google)" joel@joelfernandes.org
Here we add unit tests for the preemptoff and irqsoff tracer by using a kernel module introduced previously to trigger atomic sections in the kernel.
Reviewed-by: Masami Hiramatsu mhiramat@kernel.org
Acked-by: Masami Hiramatsu mhiramat@kernel.org
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
---
 tools/testing/selftests/ftrace/config         |  3 +
 .../test.d/preemptirq/irqsoff_tracer.tc       | 73 +++++++++++++++++++
 2 files changed, 76 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc

diff --git a/tools/testing/selftests/ftrace/config b/tools/testing/selftests/ftrace/config
index b01924c71c09..29588b328345 100644
--- a/tools/testing/selftests/ftrace/config
+++ b/tools/testing/selftests/ftrace/config
@@ -4,3 +4,6 @@ CONFIG_FUNCTION_PROFILER=y
 CONFIG_TRACER_SNAPSHOT=y
 CONFIG_STACK_TRACER=y
 CONFIG_HIST_TRIGGERS=y
+CONFIG_PREEMPT_TRACER=y
+CONFIG_IRQSOFF_TRACER=y
+CONFIG_TEST_ATOMIC_SECTIONS=m
diff --git a/tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc b/tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc
new file mode 100644
index 000000000000..1806d340035d
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc
@@ -0,0 +1,73 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: test for the preemptirqsoff tracer
+
+MOD=test_atomic_sections
+
+fail() {
+	reset_tracer
+	rmmod $MOD || true
+	exit_fail
+}
+
+unsup() { #msg
+	reset_tracer
+	rmmod $MOD || true
+	echo $1
+	exit_unsupported
+}
+
+modprobe $MOD || unsup "$MOD module not available"
+rmmod $MOD
+
+grep -q "preemptoff" available_tracers || unsup "preemptoff tracer not enabled"
+grep -q "irqsoff" available_tracers || unsup "irqsoff tracer not enabled"
+
+reset_tracer
+
+# Simulate preemptoff section for half a second couple of times
+echo preemptoff > current_tracer
+sleep 1
+modprobe $MOD atomic_mode=preempt atomic_time=500000 || fail
+rmmod $MOD || fail
+modprobe $MOD atomic_mode=preempt atomic_time=500000 || fail
+rmmod $MOD || fail
+modprobe $MOD atomic_mode=preempt atomic_time=500000 || fail
+rmmod $MOD || fail
+
+cat trace
+
+# Confirm which tracer
+grep -q "tracer: preemptoff" trace || fail
+
+# Check the end of the section
+egrep -q "5.....us : <stack trace>" trace || fail
+
+# Check for 500ms of latency
+egrep -q "latency: 5..... us" trace || fail
+
+reset_tracer
+
+# Simulate irqsoff section for half a second couple of times
+echo irqsoff > current_tracer
+sleep 1
+modprobe $MOD atomic_mode=irq atomic_time=500000 || fail
+rmmod $MOD || fail
+modprobe $MOD atomic_mode=irq atomic_time=500000 || fail
+rmmod $MOD || fail
+modprobe $MOD atomic_mode=irq atomic_time=500000 || fail
+rmmod $MOD || fail
+
+cat trace
+
+# Confirm which tracer
+grep -q "tracer: irqsoff" trace || fail
+
+# Check the end of the section
+egrep -q "5.....us : <stack trace>" trace || fail
+
+# Check for 500ms of latency
+egrep -q "latency: 5..... us" trace || fail
+
+reset_tracer
+exit 0
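To see what the latency check in this test accepts, here is a small userspace sketch in C that applies the same extended regular expression through POSIX regcomp()/regexec(). The helper name latency_line_matches() and the sample header lines in the note below are illustrative assumptions, not taken from the patch. Because each '.' matches any single character, the pattern accepts a six-character value starting with 5, which for a numeric latency means 500000 to 599999 us.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* The same extended regular expression the selftest passes to egrep. */
#define LATENCY_PATTERN "latency: 5..... us"

/*
 * Return true if the given line of trace output contains a latency
 * value that the selftest's pattern would accept. The helper name is
 * hypothetical; only the pattern comes from the test script.
 */
static bool latency_line_matches(const char *line)
{
	regex_t re;
	bool matched;

	/* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
	if (regcomp(&re, LATENCY_PATTERN, REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	matched = (regexec(&re, line, 0, NULL, 0) == 0);
	regfree(&re);
	return matched;
}
```

With an assumed header line such as "# latency: 500012 us, #4/4, CPU#2" the helper returns true, while "# latency: 50000 us" and "# latency: 600000 us" are rejected. A run whose measured latency drifted to 600 ms or more would therefore fail the check even though the tracer worked.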
On Thu, 28 Jun 2018 11:21:49 -0700 Joel Fernandes joel@joelfernandes.org wrote:
From: "Joel Fernandes (Google)" joel@joelfernandes.org
Here we add unit tests for the preemptoff and irqsoff tracer by using a kernel module introduced previously to trigger atomic sections in the kernel.
Reviewed-by: Masami Hiramatsu mhiramat@kernel.org
Acked-by: Masami Hiramatsu mhiramat@kernel.org
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
This looks fine. The only patches that need to be resent are patches 6 and 7. Just send 6, and this one again because it depends on patch 6.
I'll go ahead and apply 1-5 and kick off my other tests.
Thanks!
-- Steve
tools/testing/selftests/ftrace/config | 3 + .../test.d/preemptirq/irqsoff_tracer.tc | 73 +++++++++++++++++++ 2 files changed, 76 insertions(+) create mode 100644 tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc
diff --git a/tools/testing/selftests/ftrace/config b/tools/testing/selftests/ftrace/config index b01924c71c09..29588b328345 100644 --- a/tools/testing/selftests/ftrace/config +++ b/tools/testing/selftests/ftrace/config @@ -4,3 +4,6 @@ CONFIG_FUNCTION_PROFILER=y CONFIG_TRACER_SNAPSHOT=y CONFIG_STACK_TRACER=y CONFIG_HIST_TRIGGERS=y +CONFIG_PREEMPT_TRACER=y +CONFIG_IRQSOFF_TRACER=y +CONFIG_TEST_ATOMIC_SECTIONS=m diff --git a/tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc b/tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc new file mode 100644 index 000000000000..1806d340035d --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc @@ -0,0 +1,73 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: test for the preemptirqsoff tracer
+MOD=test_atomic_sections
+fail() {
+	reset_tracer
+	rmmod $MOD || true
+	exit_fail
+}
+
+unsup() { #msg
+	reset_tracer
+	rmmod $MOD || true
+	echo $1
+	exit_unsupported
+}
+modprobe $MOD || unsup "$MOD module not available" +rmmod $MOD
+grep -q "preemptoff" available_tracers || unsup "preemptoff tracer not enabled" +grep -q "irqsoff" available_tracers || unsup "irqsoff tracer not enabled"
+reset_tracer
+# Simulate preemptoff section for half a second couple of times +echo preemptoff > current_tracer +sleep 1 +modprobe $MOD atomic_mode=preempt atomic_time=500000 || fail +rmmod $MOD || fail +modprobe $MOD atomic_mode=preempt atomic_time=500000 || fail +rmmod $MOD || fail +modprobe $MOD atomic_mode=preempt atomic_time=500000 || fail +rmmod $MOD || fail
+cat trace
+# Confirm which tracer +grep -q "tracer: preemptoff" trace || fail
+# Check the end of the section +egrep -q "5.....us : <stack trace>" trace || fail
+# Check for 500ms of latency +egrep -q "latency: 5..... us" trace || fail
+reset_tracer
+# Simulate irqsoff section for half a second couple of times +echo irqsoff > current_tracer +sleep 1 +modprobe $MOD atomic_mode=irq atomic_time=500000 || fail +rmmod $MOD || fail +modprobe $MOD atomic_mode=irq atomic_time=500000 || fail +rmmod $MOD || fail +modprobe $MOD atomic_mode=irq atomic_time=500000 || fail +rmmod $MOD || fail
+cat trace
+# Confirm which tracer +grep -q "tracer: irqsoff" trace || fail
+# Check the end of the section +egrep -q "5.....us : <stack trace>" trace || fail
+# Check for 500ms of latency +egrep -q "latency: 5..... us" trace || fail
+reset_tracer +exit 0
On Tue, Jul 10, 2018 at 08:49:58PM -0400, Steven Rostedt wrote:
On Thu, 28 Jun 2018 11:21:49 -0700 Joel Fernandes joel@joelfernandes.org wrote:
From: "Joel Fernandes (Google)" joel@joelfernandes.org
Here we add unit tests for the preemptoff and irqsoff tracer by using a kernel module introduced previously to trigger atomic sections in the kernel.
Reviewed-by: Masami Hiramatsu mhiramat@kernel.org
Acked-by: Masami Hiramatsu mhiramat@kernel.org
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
This looks fine. The only patches that need to be resent are patches 6 and 7. Just send 6, and this one again because it depends on patch 6.
I'll go ahead and apply 1-5 and kick off my other tests.
Sounds good, I'll resend those shortly with the changes you suggested.
Thanks!
- Joel
On Thu, Jun 28, 2018 at 11:21:42AM -0700, Joel Fernandes wrote:
From: "Joel Fernandes (Google)" joel@joelfernandes.org
This is a posting of v9 preempt/irq tracepoint clean up series rebased onto v4.18-rc2. No changes in the series, just a rebase + repost.
All patches now have Reviewed-by tags from reviewers. This series has been well tested and is a simplification/refactoring of existing code, along with giving a speed-up for tracepoints using the rcu-idle API. With this, our users will find it easier to use tools depending on existing preempt tracepoints since it simplifies the configuration for them.
Steve, all patches of this tracing series have been reviewed and/or acked and there haven't been additional changes for a couple of reposts. Are you Ok with it?
Thanks,
-Joel
Future enhancements/fixes I am developing for preempt-off tracer will depend on these patches, so I suggest prioritizing these well reviewed and tested patches for that reason as well.
Introduction to the series: The preempt/irq tracepoints exist, but not everything in the kernel uses them when it needs to be notified that a preempt disable/enable or an irq disable/enable has occurred. As a result, some features cannot be used simultaneously (for example, only one of lockdep or the irqsoff trace events can be used at a time).
This is particularly painful to deal with, since turning on lockdep breaks tracers that install probes on IRQ events, such as the BCC atomic critical section tracer [1]. This constraint also makes it impossible to use synthetic events to trace irqsoff sections with lockdep simultaneously turned on.
This series solves that, and also results in a nice clean up of relevant parts of the kernel. Several ifdefs are simpler, and the design is more unified. As a further result, all rcuidle tracepoints get faster since their handling is simpler.
[1] https://github.com/iovisor/bcc/blob/master/tools/criticalstat_example.txt
v8->v9:
- Small style changes to tracepoint code (Mathieu)
- Minor style fix to use PTR_ERR_OR_ZERO (0-day bot)
- Minor fix to test_atomic_sections to use unsigned long.
- Added Namhyung's, Mathieu's Reviewed-by to some patches.
- Added Acks from Masami
v7->v8:
- Refactored irqsoff tracer probe defines (Namhyung)
v6->v7:
- Added a module to simulate an atomic section, and a kselftest to load and trigger it, which verifies the preempt-tracer and this series.
- Fixed a new warning in early boot after I rebased: early_boot_irqs_disabled was set too early, so I moved it after the lockdep initialization.
- Added back the softirq fix since it appears it wasn't picked up.
- Ran Ingo's locking API selftest suite, which passes with this series.
- Mathieu suggested ifdef'ing the tracepoint_synchronize_unregister function in case tracepoints aren't enabled; did that.
Joel Fernandes (Google) (6):
  srcu: Add notrace variant of srcu_dereference
  trace/irqsoff: Split reset into separate functions
  tracepoint: Make rcuidle tracepoint callers use SRCU
  tracing: Centralize preemptirq tracepoints and unify their usage
  lib: Add module to simulate atomic sections for testing preemptoff tracers
  kselftests: Add tests for the preemptoff and irqsoff tracers

Paul McKenney (1):
  srcu: Add notrace variants of srcu_read_{lock,unlock}

 include/linux/ftrace.h                  |  11 +-
 include/linux/irqflags.h                |  11 +-
 include/linux/lockdep.h                 |   8 +-
 include/linux/preempt.h                 |   2 +-
 include/linux/srcu.h                    |  22 ++
 include/linux/tracepoint.h              |  49 +++-
 include/trace/events/preemptirq.h       |  23 +-
 init/main.c                             |   5 +-
 kernel/locking/lockdep.c                |  35 +--
 kernel/sched/core.c                     |   2 +-
 kernel/trace/Kconfig                    |  22 +-
 kernel/trace/Makefile                   |   2 +-
 kernel/trace/trace_irqsoff.c            | 253 ++++++------------
 kernel/trace/trace_preemptirq.c         |  71 +++++
 kernel/tracepoint.c                     |  16 +-
 lib/Kconfig.debug                       |   8 +
 lib/Makefile                            |   1 +
 lib/test_atomic_sections.c              |  77 ++++++
 tools/testing/selftests/ftrace/config   |   3 +
 .../test.d/preemptirq/irqsoff_tracer.tc |  73 +++++
 20 files changed, 453 insertions(+), 241 deletions(-)
 create mode 100644 kernel/trace/trace_preemptirq.c
 create mode 100644 lib/test_atomic_sections.c
 create mode 100644 tools/testing/selftests/ftrace/test.d/preemptirq/irqsoff_tracer.tc
-- 2.18.0.rc2.346.g013aa6912e-goog
On Tue, 3 Jul 2018 07:15:37 -0700 Joel Fernandes joel@joelfernandes.org wrote:
On Thu, Jun 28, 2018 at 11:21:42AM -0700, Joel Fernandes wrote:
From: "Joel Fernandes (Google)" joel@joelfernandes.org
This is a posting of v9 preempt/irq tracepoint clean up series rebased onto v4.18-rc2. No changes in the series, just a rebase + repost.
All patches now have Reviewed-by tags from reviewers. This series has been well tested and is a simplification/refactoring of existing code, along with giving a speed-up for tracepoints using the rcu-idle API. With this, our users will find it easier to use tools depending on existing preempt tracepoints since it simplifies the configuration for them.
Steve, all patches of this tracing series have been reviewed and/or acked and there haven't been additional changes for a couple of reposts. Are you Ok with it?
I'm currently chasing down some fires, but I'll try to get to it this week.
Thanks for the patience ;-)
-- Steve