This set of patches captures latency events caused by interrupts and preemption being disabled in the kernel. The patches are based on the hist trigger feature developed by Tom Zanussi.
Git-commit: 7ef224d1d0e3a1ade02d02c01ce1dcffb736d2c3
As mentioned by Daniel, there is also a good write-up in the following blog post by Brendan Gregg: http://www.brendangregg.com/blog/2016-06-08/linux-hist-triggers.html
The perf interface for the same has not been developed yet. Related efforts: https://patchwork.kernel.org/patch/8439401
hwlat_detector tracer: https://lkml.org/lkml/2016/8/4/348 https://lkml.org/lkml/2016/8/4/346
The patch series also contains histogram triggers for missed hrtimer offsets.
Dependencies: CONFIG_IRQSOFF_TRACER CONFIG_PREEMPT_TRACER CONFIG_PROVE_LOCKING CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
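For reference, the corresponding .config fragment would look like this (note that CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG is selected by the architecture rather than set by hand):

CONFIG_IRQSOFF_TRACER=y
CONFIG_PREEMPT_TRACER=y
CONFIG_PROVE_LOCKING=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y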
Usage of triggers to generate histograms:
mount -t debugfs nodev /sys/kernel/debug
echo 'hist:key=latency.log2:val=hitcount:sort=latency' > /sys/kernel/debug/tracing/events/latency/latency_irqs/trigger
echo 'hist:key=common_pid.execname' > /sys/kernel/debug/tracing/events/latency/latency_hrtimer_interrupt/trigger
CPU-specific breakdown of events:
echo 'hist:key=cpu,latency:val=hitcount:sort=latency' > /sys/kernel/debug/tracing/events/latency/latency_preempt/trigger
echo 'hist:key=cpu,latency:val=hitcount:sort=latency if cpu==1' > /sys/kernel/debug/tracing/events/latency/latency_preempt/trigger
Histogram output:

cat /sys/kernel/debug/tracing/events/latency/latency_irqs/hist
cat /sys/kernel/debug/tracing/events/latency/latency_preempt/hist
cat /sys/kernel/debug/tracing/events/latency/latency_critical_timings/hist
cat /sys/kernel/debug/tracing/events/latency/latency_hrtimer_interrupt/hist
Disable a trigger:

echo '!hist:key=latency.log2' > /sys/kernel/debug/tracing/events/latency/latency_irqs/trigger
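Reading one of the hist files produces output roughly like the following (illustrative only; the buckets and counts here are made up):

# event histogram
#
# trigger info: hist:keys=latency.log2:vals=hitcount:sort=latency:size=2048 [active]
#
{ latency: ~ 2^12 } hitcount:        166
{ latency: ~ 2^13 } hitcount:         23
{ latency: ~ 2^14 } hitcount:          1

Totals:
    Hits: 190
    Entries: 3
    Dropped: 0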
Also, I am not reusing the time diff from the tracer, as it keeps only the preempt_timestamp in 'trace_array_cpu'; additional timestamp variables would have to be added for irqs and critical timings. I am also not sure whether it is okay to use the *_enabled() checks this way, splitting the timestamping and the tracepoint into two sections.
Also, Daniel mentioned that Steven was tossing around the idea of introducing a 'variable' to the tracepoints which could be used for timestamping etc., so as to avoid placing two tracepoints and computing the time diff afterwards, which sounds like too much overhead.
TODO:
1. perf interface. Not sure if this is needed.
2. Effective wakeup latencies of individual processes due to irqs/preemption being disabled or due to missed timers.
Excerpt from Carsten Emde's patches: Effective latencies occur during the wakeup of a process. To determine effective latencies, the timestamp at which a process is scheduled to be woken up has to be stored, and the wakeup latency is determined shortly before control is passed over to this process. Note that the apparent latency in user space may be somewhat longer, since the process may be interrupted after control is passed over to it but before execution in user space takes place. Simply measuring the interval between enqueuing and wakeup may also not be appropriate in cases when a process is scheduled as a result of a timer expiration. The timer may have missed its deadline, e.g. due to disabled interrupts, but this latency would not be registered. Therefore, the offsets of missed timers are recorded in a separate histogram. If both wakeup latency and missed timer offsets are configured and enabled, a third histogram may be enabled that records the overall latency as a sum of the timer latency, if any, and the wakeup latency. This histogram is called "timerandwakeup".
I will be working on the above points; in the meantime, someone can review the patches listed below.
Binoy Jayan (2):
  tracing: Add trace_irqsoff tracepoints
  tracing: Histogram for missed timer offsets

Daniel Wagner (1):
  tracing: Dereference pointers without RCU checks
 include/linux/hrtimer.h             |  1 +
 include/linux/rculist.h             | 36 ++++++++++++++++++
 include/linux/tracepoint.h          |  4 +-
 include/trace/events/latency.h      | 73 +++++++++++++++++++++++++++++++++++++
 kernel/time/hrtimer.c               | 27 ++++++++++++++
 kernel/trace/trace_events_trigger.c |  6 +--
 kernel/trace/trace_irqsoff.c        | 48 +++++++++++++++++++++++-
 7 files changed, 189 insertions(+), 6 deletions(-)
 create mode 100644 include/trace/events/latency.h
From: Daniel Wagner daniel.wagner@bmw-carit.de
A tracepoint can't be used in a code section where we are in the middle of a state transition.
For example, if we place a tracepoint inside start/stop_critical_section(), lockdep complains with:
[    0.035589] WARNING: CPU: 0 PID: 3 at kernel/locking/lockdep.c:3560 check_flags.part.36+0x1bc/0x210()
[    0.036000] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
[    0.036000] Kernel panic - not syncing: panic_on_warn set ...
[    0.036000]
[    0.036000] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.0.0-rc7+ #460
[    0.036000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[    0.036000]  ffffffff81f2463a ffff88007c93bb98 ffffffff81afb317 0000000000000001
[    0.036000]  ffffffff81f212b3 ffff88007c93bc18 ffffffff81af7bc2 ffff88007c93bbb8
[    0.036000]  ffffffff00000008 ffff88007c93bc28 ffff88007c93bbc8 000000000093bbd8
[    0.036000] Call Trace:
[    0.036000]  [<ffffffff81afb317>] dump_stack+0x4f/0x7b
[    0.036000]  [<ffffffff81af7bc2>] panic+0xc0/0x1e9
[    0.036000]  [<ffffffff81b07188>] ? _raw_spin_unlock_irqrestore+0x38/0x80
[    0.036000]  [<ffffffff81052a60>] warn_slowpath_common+0xc0/0xc0
[    0.036000]  [<ffffffff81057116>] ? __local_bh_enable+0x36/0x70
[    0.036000]  [<ffffffff81052aa6>] warn_slowpath_fmt+0x46/0x50
[    0.036000]  [<ffffffff810a468c>] check_flags.part.36+0x1bc/0x210
[    0.036000]  [<ffffffff810a5168>] lock_is_held+0x78/0x90
[    0.036000]  [<ffffffff81057116>] ? __local_bh_enable+0x36/0x70
[    0.036000]  [<ffffffff810588ab>] ? __do_softirq+0x3db/0x500
[    0.036000]  [<ffffffff81134495>] trace_preempt_on+0x255/0x260
[    0.036000]  [<ffffffff81081c0b>] preempt_count_sub+0xab/0xf0
[    0.036000]  [<ffffffff81057116>] __local_bh_enable+0x36/0x70
[    0.036000]  [<ffffffff810588ab>] __do_softirq+0x3db/0x500
[    0.036000]  [<ffffffff810589ef>] run_ksoftirqd+0x1f/0x60
[    0.036000]  [<ffffffff81079653>] smpboot_thread_fn+0x193/0x2a0
[    0.036000]  [<ffffffff810794c0>] ? SyS_setgroups+0x150/0x150
[    0.036000]  [<ffffffff81075312>] kthread+0xf2/0x110
[    0.036000]  [<ffffffff81b026b3>] ? wait_for_completion+0xc3/0x120
[    0.036000]  [<ffffffff81081c0b>] ? preempt_count_sub+0xab/0xf0
[    0.036000]  [<ffffffff81075220>] ? kthread_create_on_node+0x240/0x240
[    0.036000]  [<ffffffff81b07cc8>] ret_from_fork+0x58/0x90
[    0.036000]  [<ffffffff81075220>] ? kthread_create_on_node+0x240/0x240
[    0.036000] ---[ end Kernel panic - not syncing: panic_on_warn set ...
PeterZ was so kind as to explain to me what is happening:
"__local_bh_enable() tests if this is the last SOFTIRQ_OFFSET, if so it tells lockdep softirqs are enabled with trace_softirqs_on() after that we go an actually modify the preempt_count with preempt_count_sub(). Then in preempt_count_sub() you call into trace_preempt_on() if this was the last preempt_count increment but you do that _before_ you actually change the preempt_count with __preempt_count_sub() at this point lockdep and preempt_count think the world differs and *boom*"
So the simplest way to avoid this is by disabling the consistency checks.
We also need to take care of the iteration in trace_events_trigger.c to avoid a lockdep splat in conjunction with the hist trigger.
Not for inclusion!
Not-Signed-off-by: Daniel Wagner daniel.wagner@bmw-carit.de
Not-Signed-off-by: Binoy Jayan binoy.jayan@linaro.org
---
 include/linux/rculist.h             | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/tracepoint.h          |  4 ++--
 kernel/trace/trace_events_trigger.c |  6 +++---
 3 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 8beb98d..bee836b 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -279,6 +279,24 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
 	container_of(lockless_dereference(ptr), type, member)
 
 /**
+ * list_entry_rcu_notrace - get the struct for this entry (for tracing)
+ * @ptr:	the &struct list_head pointer.
+ * @type:	the type of the struct this is embedded in.
+ * @member:	the name of the list_head within the struct.
+ *
+ * This primitive may safely run concurrently with the _rcu list-mutation
+ * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
+ *
+ * This is the same as list_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_entry_rcu_notrace(ptr, type, member) \
+({ \
+	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
+	container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), type, member); \
+})
+
+/**
  * Where are list_empty_rcu() and list_first_entry_rcu()?
  *
  * Implementing those functions following their counterparts list_empty() and
@@ -391,6 +409,24 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
 	pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
 
 /**
+ * list_for_each_entry_rcu_notrace - iterate over rcu list of given type (for tracing)
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently with
+ * the _rcu list-mutation primitives such as list_add_rcu()
+ * as long as the traversal is guarded by rcu_read_lock().
+ *
+ * This is the same as list_for_each_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_for_each_entry_rcu_notrace(pos, head, member) \
+	for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \
+		&pos->member != (head); \
+		pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), member))
+
+/**
  * list_for_each_entry_continue_rcu - continue iteration over list of given type
  * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index be586c6..d0e2a82 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -138,7 +138,7 @@ extern void syscall_unregfunc(void);
 			return;						\
 		prercu;							\
 		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+		it_func_ptr = rcu_dereference_raw_notrace((tp)->funcs);	\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -189,7 +189,7 @@ extern void syscall_unregfunc(void);
 				TP_CONDITION(cond),,);			\
 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
 			rcu_read_lock_sched_notrace();			\
-			rcu_dereference_sched(__tracepoint_##name.funcs);\
+			rcu_dereference_raw_notrace(__tracepoint_##name.funcs);\
 			rcu_read_unlock_sched_notrace();		\
 		}							\
 	}								\
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index a975571..4489fb4 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -71,14 +71,14 @@ event_triggers_call(struct trace_event_file *file, void *rec)
 	if (list_empty(&file->triggers))
 		return tt;
 
-	list_for_each_entry_rcu(data, &file->triggers, list) {
+	list_for_each_entry_rcu_notrace(data, &file->triggers, list) {
 		if (data->paused)
 			continue;
 		if (!rec) {
 			data->ops->func(data, rec);
 			continue;
 		}
-		filter = rcu_dereference_sched(data->filter);
+		filter = rcu_dereference_raw_notrace(data->filter);
 		if (filter && !filter_match_preds(filter, rec))
 			continue;
 		if (event_command_post_trigger(data->cmd_ops)) {
@@ -111,7 +111,7 @@ event_triggers_post_call(struct trace_event_file *file,
 {
 	struct event_trigger_data *data;
 
-	list_for_each_entry_rcu(data, &file->triggers, list) {
+	list_for_each_entry_rcu_notrace(data, &file->triggers, list) {
 		if (data->paused)
 			continue;
 		if (data->cmd_ops->trigger_type & tt)
On Thu, Aug 18, 2016 at 12:39:24PM +0530, Binoy Jayan wrote:
> Not for inclusion!
>
> Not-Signed-off-by: Daniel Wagner daniel.wagner@bmw-carit.de
I wrote that for the initial RFC. Since you are now taking care of the patch set, you can drop the 'Not for inclusion!' and here is my
Signed-off-by: Daniel Wagner daniel.wagner@bmw-carit.de
This work is based on work by Daniel Wagner. A few tracepoints are added at the end of the critical sections. With the hist trigger in place, histogram plots can be generated with a per-cpu breakdown of the captured events. It is based on the Linux kernel's event infrastructure.
The following filter(s) may be used:
'hist:key=latency.log2:val=hitcount:sort=latency'
'hist:key=cpu,latency:val=hitcount:sort=latency if cpu==1'
'hist:key=common_pid.execname'
This captures only the latencies introduced by disabled irqs and preemption. Additional per-process data has to be captured to calculate the effective latencies introduced for individual processes.
Initial work - latency.patch
[1] https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?...
Signed-off-by: Binoy Jayan binoy.jayan@linaro.org
---
 include/trace/events/latency.h | 42 ++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_irqsoff.c   | 48 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/latency.h

diff --git a/include/trace/events/latency.h b/include/trace/events/latency.h
new file mode 100644
index 0000000..b279e2d
--- /dev/null
+++ b/include/trace/events/latency.h
@@ -0,0 +1,42 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM latency
+
+#if !defined(_TRACE_HIST_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HIST_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(latency_template,
+	TP_PROTO(int cpu, cycles_t latency),
+
+	TP_ARGS(cpu, latency),
+
+	TP_STRUCT__entry(
+		__field(int,		cpu)
+		__field(cycles_t,	latency)
+	),
+
+	TP_fast_assign(
+		__entry->cpu		= cpu;
+		__entry->latency	= latency;
+	),
+
+	TP_printk("cpu=%d, latency=%lu", __entry->cpu, __entry->latency)
+);
+
+DEFINE_EVENT(latency_template, latency_irqs,
+	    TP_PROTO(int cpu, cycles_t latency),
+	    TP_ARGS(cpu, latency));
+
+DEFINE_EVENT(latency_template, latency_preempt,
+	    TP_PROTO(int cpu, cycles_t latency),
+	    TP_ARGS(cpu, latency));
+
+DEFINE_EVENT(latency_template, latency_critical_timings,
+	    TP_PROTO(int cpu, cycles_t latency),
+	    TP_ARGS(cpu, latency));
+
+#endif /* _TRACE_HIST_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 03cdff8..08d8d05 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -14,13 +14,20 @@
 #include <linux/module.h>
 #include <linux/ftrace.h>
 
+#include <trace/events/sched.h>
+
 #include "trace.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/latency.h>
+
 static struct trace_array *irqsoff_trace __read_mostly;
 static int tracer_enabled __read_mostly;
 
 static DEFINE_PER_CPU(int, tracing_cpu);
-
+static DEFINE_PER_CPU(cycle_t, ts_irqs);
+static DEFINE_PER_CPU(cycle_t, ts_preempt);
+static DEFINE_PER_CPU(cycle_t, ts_critical_timings);
 static DEFINE_RAW_SPINLOCK(max_trace_lock);
 
 enum {
@@ -419,9 +426,22 @@ stop_critical_timing(unsigned long ip, unsigned long parent_ip)
 	atomic_dec(&data->disabled);
 }
 
+static inline void mark_timestamp(cycle_t __percpu *ts)
+{
+	int cpu = raw_smp_processor_id();
+	per_cpu(*ts, cpu) = ftrace_now(cpu);
+}
+
+static inline cycle_t get_delta(int cpu, cycle_t __percpu *ts)
+{
+	return ftrace_now(cpu) - per_cpu(*ts, cpu);
+}
 /* start and stop critical timings used to for stoppage (in idle) */
 void start_critical_timings(void)
 {
+	if (trace_latency_critical_timings_enabled())
+		mark_timestamp(&ts_critical_timings);
+
 	if (preempt_trace() || irq_trace())
 		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
 }
@@ -431,6 +451,13 @@ void stop_critical_timings(void)
 {
 	if (preempt_trace() || irq_trace())
 		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
+
+	if (trace_latency_critical_timings_enabled()) {
+		int cpu = raw_smp_processor_id();
+		trace_latency_critical_timings(cpu,
+			get_delta(cpu, &ts_critical_timings));
+	}
+
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
@@ -438,6 +465,10 @@ EXPORT_SYMBOL_GPL(stop_critical_timings);
 #ifdef CONFIG_PROVE_LOCKING
 void time_hardirqs_on(unsigned long a0, unsigned long a1)
 {
+	if (trace_latency_irqs_enabled()) {
+		int cpu = raw_smp_processor_id();
+		trace_latency_irqs(cpu, get_delta(cpu, &ts_irqs));
+	}
 	if (!preempt_trace() && irq_trace())
 		stop_critical_timing(a0, a1);
 }
@@ -446,6 +477,10 @@ void time_hardirqs_off(unsigned long a0, unsigned long a1)
 {
 	if (!preempt_trace() && irq_trace())
 		start_critical_timing(a0, a1);
+
+	if (trace_latency_irqs_enabled()) {
+		mark_timestamp(&ts_irqs);
+	}
 }
 
 #else /* !CONFIG_PROVE_LOCKING */
@@ -500,9 +535,16 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller);
 #endif /* CONFIG_PROVE_LOCKING */
 #endif /* CONFIG_IRQSOFF_TRACER */
 
+int count = 0;
+
 #ifdef CONFIG_PREEMPT_TRACER
 void trace_preempt_on(unsigned long a0, unsigned long a1)
 {
+	if (trace_latency_preempt_enabled()) {
+		int cpu = raw_smp_processor_id();
+		trace_latency_preempt(cpu, get_delta(cpu, &ts_preempt));
+	}
+
 	if (preempt_trace() && !irq_trace())
 		stop_critical_timing(a0, a1);
 }
@@ -511,6 +553,10 @@ void trace_preempt_off(unsigned long a0, unsigned long a1)
 {
 	if (preempt_trace() && !irq_trace())
 		start_critical_timing(a0, a1);
+
+	if (trace_latency_preempt_enabled()) {
+		mark_timestamp(&ts_preempt);
+	}
 }
 #endif /* CONFIG_PREEMPT_TRACER */
On Thu, Aug 18, 2016 at 12:39:25PM +0530, Binoy Jayan wrote:
> static struct trace_array *irqsoff_trace __read_mostly;
> static int tracer_enabled __read_mostly;
>
> static DEFINE_PER_CPU(int, tracing_cpu);
> +static DEFINE_PER_CPU(cycle_t, ts_irqs);
> +static DEFINE_PER_CPU(cycle_t, ts_preempt);
> +static DEFINE_PER_CPU(cycle_t, ts_critical_timings);
> static DEFINE_RAW_SPINLOCK(max_trace_lock);
What about creating a cycle_t array and introducing only one tracepoint which contains the type? I think it should be possible to get the right plot via hist's filter option:
'hist:key=latency.bucket:val=hitcount:sort=latency if cpu==0 && type==0'
Obviously, there is a performance impact. It would be good to see some numbers.
cheers, daniel
Just as rough idea (not tested):
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 03cdff8..012544c 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -19,7 +19,15 @@
 static struct trace_array *irqsoff_trace __read_mostly;
 static int tracer_enabled __read_mostly;
 
+enum latency_type {
+	LAT_IRQ,
+	LAT_PREEMPT,
+	LAT_CRITTIME,
+	LAT_MAX
+};
+
 static DEFINE_PER_CPU(int, tracing_cpu);
+static DEFINE_PER_CPU(cycle_t[LAT_MAX], lat_data);
 
 static DEFINE_RAW_SPINLOCK(max_trace_lock);
 
@@ -350,6 +358,19 @@ out:
 	__trace_function(tr, CALLER_ADDR0, parent_ip, flags, pc);
 }
 
+static inline void lat_timestamp(enum latency_type type)
+{
+	int cpu = raw_smp_processor_id();
+	per_cpu(lat_data[type], cpu) = ftrace_now(cpu);
+}
+
+static inline void lat_trace(enum latency_type type)
+{
+	int cpu = raw_smp_processor_id();
+	trace_latency_preempt(type,
+		ftrace_now(cpu) - per_cpu(lat_data[type], cpu));
+}
+
 static inline void
 start_critical_timing(unsigned long ip, unsigned long parent_ip)
 {
@@ -422,6 +443,9 @@ stop_critical_timing(unsigned long ip, unsigned long parent_ip)
 /* start and stop critical timings used to for stoppage (in idle) */
 void start_critical_timings(void)
 {
+	if (trace_latency_preempt_enabled())
+		lat_timestamp(LAT_CRITTIME);
+
 	if (preempt_trace() || irq_trace())
 		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
 }
@@ -431,6 +455,9 @@ void stop_critical_timings(void)
 {
 	if (preempt_trace() || irq_trace())
 		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
+
+	if (trace_latency_preempt_enabled())
+		lat_trace(LAT_CRITTIME);
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
@@ -438,6 +465,9 @@ EXPORT_SYMBOL_GPL(stop_critical_timings);
 #ifdef CONFIG_PROVE_LOCKING
 void time_hardirqs_on(unsigned long a0, unsigned long a1)
 {
+	if (trace_latency_preempt_enable())
+		lat_trace(LAT_CRITTIME);
+
 	if (!preempt_trace() && irq_trace())
 		stop_critical_timing(a0, a1);
 }
@@ -446,6 +476,9 @@ void time_hardirqs_off(unsigned long a0, unsigned long a1)
 {
 	if (!preempt_trace() && irq_trace())
 		start_critical_timing(a0, a1);
+
+	if (trace_latency_preempt_enabled())
+		lat_timestamp(LAT_IRQ);
 }
 
 #else /* !CONFIG_PROVE_LOCKING */
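The matching event definition would then record the type instead of defining one event per latency source, roughly as follows (equally untested; the field layout is only a suggestion):

TRACE_EVENT(latency_preempt,

	TP_PROTO(enum latency_type type, cycles_t latency),

	TP_ARGS(type, latency),

	TP_STRUCT__entry(
		__field(int,		type)
		__field(cycles_t,	latency)
	),

	TP_fast_assign(
		__entry->type		= type;
		__entry->latency	= latency;
	),

	TP_printk("type=%d, latency=%llu", __entry->type,
		  (unsigned long long)__entry->latency)
);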
On Thu, 18 Aug 2016 12:39:25 +0530 Binoy Jayan binoy.jayan@linaro.org wrote:
> +static inline void mark_timestamp(cycle_t __percpu *ts)
> +{
> +	int cpu = raw_smp_processor_id();
> +	per_cpu(*ts, cpu) = ftrace_now(cpu);
Just an FYI: ftrace_now() uses whatever clock source is specified by /sys/kernel/tracing/trace_clock, which could be a slower "global" clock, or even cycles. If that's what you want, then this is the right approach. But if you want something more accurate per CPU (it may not be accurate across CPUs), then you may want to use trace_clock_local(), which is the "[local]" clock in the trace_clock file.
Also, you should convert the above to:
this_cpu_write(ts, ftrace_now(raw_smp_processor_id()));
as this_cpu_write() is optimized compared with a per_cpu() access.
> +}
> +static inline cycle_t get_delta(int cpu, cycle_t __percpu *ts)
> +{
> +	return ftrace_now(cpu) - per_cpu(*ts, cpu);
Use this_cpu_read(ts)
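Applying both suggestions, the helpers would end up as something like this (my sketch on top of Binoy's patch; this_cpu_write()/this_cpu_read() take a per-cpu lvalue, so the __percpu pointer is dereferenced):

static inline void mark_timestamp(cycle_t __percpu *ts)
{
	/* write this CPU's slot without an explicit per_cpu() lookup */
	this_cpu_write(*ts, ftrace_now(raw_smp_processor_id()));
}

static inline cycle_t get_delta(cycle_t __percpu *ts)
{
	/* read back this CPU's slot and compute the elapsed time */
	return ftrace_now(raw_smp_processor_id()) - this_cpu_read(*ts);
}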
> +}
>  /* start and stop critical timings used to for stoppage (in idle) */
>  void start_critical_timings(void)
>  {
> +	if (trace_latency_critical_timings_enabled())
> +		mark_timestamp(&ts_critical_timings);
> +
>  	if (preempt_trace() || irq_trace())
>  		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
>  }
> @@ -431,6 +451,13 @@ void stop_critical_timings(void)
>  {
>  	if (preempt_trace() || irq_trace())
>  		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
> +
> +	if (trace_latency_critical_timings_enabled()) {
> +		int cpu = raw_smp_processor_id();
> +		trace_latency_critical_timings(cpu,
> +			get_delta(cpu, &ts_critical_timings));
> +	}
> +
>  }
>  EXPORT_SYMBOL_GPL(stop_critical_timings);
>
> @@ -438,6 +465,10 @@ EXPORT_SYMBOL_GPL(stop_critical_timings);
>  #ifdef CONFIG_PROVE_LOCKING
>  void time_hardirqs_on(unsigned long a0, unsigned long a1)
>  {
> +	if (trace_latency_irqs_enabled()) {
> +		int cpu = raw_smp_processor_id();
> +		trace_latency_irqs(cpu, get_delta(cpu, &ts_irqs));
> +	}
>  	if (!preempt_trace() && irq_trace())
>  		stop_critical_timing(a0, a1);
>  }
> @@ -446,6 +477,10 @@ void time_hardirqs_off(unsigned long a0, unsigned long a1)
>  {
>  	if (!preempt_trace() && irq_trace())
>  		start_critical_timing(a0, a1);
> +
> +	if (trace_latency_irqs_enabled()) {
> +		mark_timestamp(&ts_irqs);
> +	}
>  }
>
>  #else /* !CONFIG_PROVE_LOCKING */
> @@ -500,9 +535,16 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller);
>  #endif /* CONFIG_PROVE_LOCKING */
>  #endif /* CONFIG_IRQSOFF_TRACER */
>
> +int count = 0;
> +
>  #ifdef CONFIG_PREEMPT_TRACER
>  void trace_preempt_on(unsigned long a0, unsigned long a1)
>  {
> +	if (trace_latency_preempt_enabled()) {
> +		int cpu = raw_smp_processor_id();
> +		trace_latency_preempt(cpu, get_delta(cpu, &ts_preempt));
If the cpu passed to the tracepoint will always be smp_processor_id(), you don't need to record it at all; it's already saved by the tracer. (A trimmed event class along those lines is sketched after the quoted patch below.)
-- Steve
> +	}
> +
>  	if (preempt_trace() && !irq_trace())
>  		stop_critical_timing(a0, a1);
>  }
> @@ -511,6 +553,10 @@ void trace_preempt_off(unsigned long a0, unsigned long a1)
>  {
>  	if (preempt_trace() && !irq_trace())
>  		start_critical_timing(a0, a1);
> +
> +	if (trace_latency_preempt_enabled()) {
> +		mark_timestamp(&ts_preempt);
> +	}
>  }
>  #endif /* CONFIG_PREEMPT_TRACER */
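For illustration, dropping the cpu argument as suggested would trim the event class to something like this (a sketch, untested; the CPU that generated each event is still recorded by the per-cpu ring buffer):

DECLARE_EVENT_CLASS(latency_template,
	TP_PROTO(cycles_t latency),

	TP_ARGS(latency),

	TP_STRUCT__entry(
		__field(cycles_t,	latency)
	),

	TP_fast_assign(
		__entry->latency	= latency;
	),

	TP_printk("latency=%llu", (unsigned long long)__entry->latency)
);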
Hi Binoy,
[auto build test WARNING on tip/perf/core]
[also build test WARNING on v4.8-rc3 next-20160819]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Binoy-Jayan/tracing-Deference-point...
config: x86_64-randconfig-s3-08181449 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64
All warnings (new ones prefixed by >>):
In file included from include/trace/define_trace.h:95:0,
                 from include/trace/events/latency.h:42,
                 from kernel/trace/trace_irqsoff.c:22:
include/trace/events/latency.h: In function 'trace_raw_output_latency_template':
include/trace/events/latency.h:24:12: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'cycles_t {aka long long unsigned int}' [-Wformat=]
  TP_printk("cpu=%d, latency=%lu", __entry->cpu, __entry->latency)
            ^
include/trace/trace_events.h:327:22: note: in definition of macro 'DECLARE_EVENT_CLASS'
   trace_seq_printf(s, print);     \
                       ^~~~~
include/trace/events/latency.h:24:2: note: in expansion of macro 'TP_printk'
  TP_printk("cpu=%d, latency=%lu", __entry->cpu, __entry->latency)
  ^~~~~~~~~
In file included from include/asm-generic/percpu.h:6:0,
                 from arch/x86/include/asm/percpu.h:552,
                 from arch/x86/include/asm/preempt.h:5,
                 from include/linux/preempt.h:59,
                 from include/linux/spinlock.h:50,
                 from include/linux/seqlock.h:35,
                 from include/linux/time.h:5,
                 from include/uapi/linux/timex.h:56,
                 from include/linux/timex.h:56,
                 from include/linux/sched.h:19,
                 from include/linux/uaccess.h:4,
                 from kernel/trace/trace_irqsoff.c:13:
At top level:
kernel/trace/trace_irqsoff.c:29:32: warning: 'ts_preempt' defined but not used [-Wunused-variable]
 static DEFINE_PER_CPU(cycle_t, ts_preempt);
                                ^
include/linux/percpu-defs.h:105:19: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
  __typeof__(type) name
                   ^~~~
kernel/trace/trace_irqsoff.c:29:8: note: in expansion of macro 'DEFINE_PER_CPU'
 static DEFINE_PER_CPU(cycle_t, ts_preempt);
        ^~~~~~~~~~~~~~
kernel/trace/trace_irqsoff.c:28:32: warning: 'ts_irqs' defined but not used [-Wunused-variable]
 static DEFINE_PER_CPU(cycle_t, ts_irqs);
                                ^
include/linux/percpu-defs.h:105:19: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
  __typeof__(type) name
                   ^~~~
kernel/trace/trace_irqsoff.c:28:8: note: in expansion of macro 'DEFINE_PER_CPU'
 static DEFINE_PER_CPU(cycle_t, ts_irqs);
        ^~~~~~~~~~~~~~
vim +/ts_preempt +29 kernel/trace/trace_irqsoff.c
     7	 * From code in the latency_tracer, that is:
     8	 *
     9	 *  Copyright (C) 2004-2006 Ingo Molnar
    10	 *  Copyright (C) 2004 Nadia Yvette Chambers
    11	 */
    12	#include <linux/kallsyms.h>
    13	#include <linux/uaccess.h>
    14	#include <linux/module.h>
    15	#include <linux/ftrace.h>
    16	
    17	#include <trace/events/sched.h>
    18	
    19	#include "trace.h"
    20	
    21	#define CREATE_TRACE_POINTS
    22	#include <trace/events/latency.h>
    23	
    24	static struct trace_array *irqsoff_trace __read_mostly;
    25	static int tracer_enabled __read_mostly;
    26	
    27	static DEFINE_PER_CPU(int, tracing_cpu);
  > 28	static DEFINE_PER_CPU(cycle_t, ts_irqs);
  > 29	static DEFINE_PER_CPU(cycle_t, ts_preempt);
    30	static DEFINE_PER_CPU(cycle_t, ts_critical_timings);
    31	static DEFINE_RAW_SPINLOCK(max_trace_lock);
    32	
---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Hi Binoy,
[auto build test WARNING on tip/perf/core]
[also build test WARNING on v4.8-rc3 next-20160819]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Binoy-Jayan/tracing-Deference-point...
config: i386-randconfig-c0-08190953 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.3-14) 4.9.3
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386
All warnings (new ones prefixed by >>):
In file included from include/linux/linkage.h:4:0,
                 from include/linux/kernel.h:6,
                 from include/linux/kallsyms.h:9,
                 from kernel/trace/trace_irqsoff.c:12:
include/trace/events/latency.h: In function 'trace_raw_output_latency_template':
include/linux/compiler.h:152:17: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'cycles_t' [-Wformat=]
  static struct ftrace_branch_data   \
                 ^
include/linux/compiler.h:147:23: note: in expansion of macro '__trace_if'
 #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
                       ^
include/trace/trace_events.h:324:2: note: in expansion of macro 'if'
  if (ret != TRACE_TYPE_HANDLED) \
  ^
include/trace/events/latency.h:9:1: note: in expansion of macro 'DECLARE_EVENT_CLASS'
 DECLARE_EVENT_CLASS(latency_template,
 ^
In file included from include/asm-generic/percpu.h:6:0,
                 from arch/x86/include/asm/percpu.h:552,
                 from arch/x86/include/asm/preempt.h:5,
                 from include/linux/preempt.h:59,
                 from include/linux/spinlock.h:50,
                 from include/linux/seqlock.h:35,
                 from include/linux/time.h:5,
                 from include/uapi/linux/timex.h:56,
                 from include/linux/timex.h:56,
                 from include/linux/sched.h:19,
                 from include/linux/uaccess.h:4,
                 from kernel/trace/trace_irqsoff.c:13:
kernel/trace/trace_irqsoff.c: At top level:
kernel/trace/trace_irqsoff.c:28:32: warning: 'ts_irqs' defined but not used [-Wunused-variable]
 static DEFINE_PER_CPU(cycle_t, ts_irqs);
                                ^
include/linux/percpu-defs.h:105:19: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
  __typeof__(type) name
                   ^
kernel/trace/trace_irqsoff.c:28:8: note: in expansion of macro 'DEFINE_PER_CPU'
 static DEFINE_PER_CPU(cycle_t, ts_irqs);
        ^
kernel/trace/trace_irqsoff.c:29:32: warning: 'ts_preempt' defined but not used [-Wunused-variable]
 static DEFINE_PER_CPU(cycle_t, ts_preempt);
                                ^
include/linux/percpu-defs.h:105:19: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
  __typeof__(type) name
                   ^
kernel/trace/trace_irqsoff.c:29:8: note: in expansion of macro 'DEFINE_PER_CPU'
 static DEFINE_PER_CPU(cycle_t, ts_preempt);
        ^
vim +152 include/linux/compiler.h
45b79749 Steven Rostedt 2008-11-21  136	# define likely(x)	(__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 1))
1f0d69a9 Steven Rostedt 2008-11-12  137	# endif
1f0d69a9 Steven Rostedt 2008-11-12  138	# ifndef unlikely
45b79749 Steven Rostedt 2008-11-21  139	# define unlikely(x)	(__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
1f0d69a9 Steven Rostedt 2008-11-12  140	# endif
2bcd521a Steven Rostedt 2008-11-21  141	
2bcd521a Steven Rostedt 2008-11-21  142	#ifdef CONFIG_PROFILE_ALL_BRANCHES
2bcd521a Steven Rostedt 2008-11-21  143	/*
2bcd521a Steven Rostedt 2008-11-21  144	 * "Define 'is'", Bill Clinton
2bcd521a Steven Rostedt 2008-11-21  145	 * "Define 'if'", Steven Rostedt
2bcd521a Steven Rostedt 2008-11-21  146	 */
ab3c9c68 Linus Torvalds 2009-04-07  147	#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
ab3c9c68 Linus Torvalds 2009-04-07  148	#define __trace_if(cond) \
b33c8ff4 Arnd Bergmann  2016-02-12  149		if (__builtin_constant_p(!!(cond)) ? !!(cond) :	\
2bcd521a Steven Rostedt 2008-11-21  150		({								\
2bcd521a Steven Rostedt 2008-11-21  151		int ______r;						\
2bcd521a Steven Rostedt 2008-11-21 @152		static struct ftrace_branch_data			\
2bcd521a Steven Rostedt 2008-11-21  153			__attribute__((__aligned__(4)))			\
2bcd521a Steven Rostedt 2008-11-21  154			__attribute__((section("_ftrace_branch")))	\
2bcd521a Steven Rostedt 2008-11-21  155			______f = {					\
2bcd521a Steven Rostedt 2008-11-21  156			.func = __func__,				\
2bcd521a Steven Rostedt 2008-11-21  157			.file = __FILE__,				\
2bcd521a Steven Rostedt 2008-11-21  158			.line = __LINE__,				\
2bcd521a Steven Rostedt 2008-11-21  159		};							\
2bcd521a Steven Rostedt 2008-11-21  160		______r = !!(cond);					\
:::::: The code at line 152 was first introduced by commit
:::::: 2bcd521a684cc94befbe2ce7d5b613c841b0d304 trace: profile all if conditionals

:::::: TO: Steven Rostedt srostedt@redhat.com
:::::: CC: Ingo Molnar mingo@elte.hu
---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
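Both robot reports boil down to the same issue: cycles_t is not 'unsigned long' on every configuration, so the TP_printk() format string must not assume it. A likely fix (my sketch, not posted in the thread) is an explicit cast:

	TP_printk("cpu=%d, latency=%llu", __entry->cpu,
		  (unsigned long long)__entry->latency)

The 'defined but not used' warnings come from configs where CONFIG_PROVE_LOCKING and CONFIG_PREEMPT_TRACER are disabled, so the per-cpu timestamp variables would need matching #ifdef guards as well.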
Latencies of missed timer offsets: generate a histogram of missed timer offsets in microseconds. This will be used, along with the irq and preemption latencies, to calculate the effective process wakeup latencies.
The following filter(s) may be used:
'hist:key=common_pid.execname'
'hist:key=common_pid.execname,cpu:val=toffset,hitcount'
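For example, with debugfs mounted as in the cover letter, the trigger can be set and read back like this (usage sketch following the earlier examples):

echo 'hist:key=common_pid.execname,cpu:val=toffset,hitcount' > /sys/kernel/debug/tracing/events/latency/latency_hrtimer_interrupt/trigger
cat /sys/kernel/debug/tracing/events/latency/latency_hrtimer_interrupt/hist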
Signed-off-by: Binoy Jayan binoy.jayan@linaro.org
---
 include/linux/hrtimer.h        |  1 +
 include/trace/events/latency.h | 31 +++++++++++++++++++++++++++++++
 kernel/time/hrtimer.c          | 27 +++++++++++++++++++++++++++
 3 files changed, 59 insertions(+)
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 5e00f80..19cb76c 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -104,6 +104,7 @@ struct hrtimer {
 	struct hrtimer_clock_base	*base;
 	u8				state;
 	u8				is_rel;
+	ktime_t				praecox;
 #ifdef CONFIG_TIMER_STATS
 	int				start_pid;
 	void				*start_site;
diff --git a/include/trace/events/latency.h b/include/trace/events/latency.h
index b279e2d..32317a7 100644
--- a/include/trace/events/latency.h
+++ b/include/trace/events/latency.h
@@ -36,6 +36,37 @@ DEFINE_EVENT(latency_template, latency_critical_timings,
 	    TP_PROTO(int cpu, cycles_t latency),
 	    TP_ARGS(cpu, latency));
 
+TRACE_EVENT(latency_hrtimer_interrupt,
+
+	TP_PROTO(int cpu, long long toffset, struct task_struct *curr,
+		struct task_struct *task),
+
+	TP_ARGS(cpu, toffset, curr, task),
+
+	TP_STRUCT__entry(
+		__field(int,		cpu)
+		__field(long long,	toffset)
+		__array(char,		ccomm,	TASK_COMM_LEN)
+		__field(int,		cprio)
+		__array(char,		tcomm,	TASK_COMM_LEN)
+		__field(int,		tprio)
+	),
+
+	TP_fast_assign(
+		__entry->cpu	 = cpu;
+		__entry->toffset = toffset;
+		memcpy(__entry->ccomm, curr->comm, TASK_COMM_LEN);
+		__entry->cprio   = curr->prio;
+		memcpy(__entry->tcomm, task != NULL ? task->comm : "<none>",
+			task != NULL ? TASK_COMM_LEN : 7);
+		__entry->tprio   = task != NULL ? task->prio : -1;
+	),
+
+	TP_printk("cpu=%d toffset=%lld curr=%s[%d] thread=%s[%d]",
+		__entry->cpu, __entry->toffset, __entry->ccomm,
+		__entry->cprio, __entry->tcomm, __entry->tprio)
+);
+
 #endif /* _TRACE_HIST_H */
 
 /* This part must be outside protection */
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 9ba7c82..71e6fcb 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -53,9 +53,12 @@
 #include <asm/uaccess.h>
 
 #include <trace/events/timer.h>
+#include <trace/events/latency.h>
 
 #include "tick-internal.h"
 
+static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
+
 /*
  * The timer bases:
  *
@@ -992,6 +995,15 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 
 	timer_stats_hrtimer_set_start_info(timer);
 
+	if (trace_latency_hrtimer_interrupt_enabled()) {
+		ktime_t now = new_base->get_time();
+
+		if (ktime_to_ns(tim) < ktime_to_ns(now))
+			timer->praecox = now;
+		else
+			timer->praecox = ktime_set(0, 0);
+	}
+
 	leftmost = enqueue_hrtimer(timer, new_base);
 	if (!leftmost)
 		goto unlock;
@@ -1265,6 +1277,19 @@ static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base,
 	cpu_base->running = NULL;
 }
 
+static inline void trace_missed_hrtimer(struct hrtimer *timer, ktime_t basenow)
+{
+	if (trace_latency_hrtimer_interrupt_enabled())
+		trace_latency_hrtimer_interrupt(raw_smp_processor_id(),
+			ktime_to_ns(ktime_sub(ktime_to_ns(timer->praecox) ?
+				timer->praecox : hrtimer_get_expires(timer),
+				basenow)),
+			current,
+			timer->function == hrtimer_wakeup ?
+				container_of(timer, struct hrtimer_sleeper,
+					timer)->task : NULL);
+
+}
 static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now)
 {
 	struct hrtimer_clock_base *base = cpu_base->clock_base;
@@ -1284,6 +1309,8 @@ static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now)
 
 		timer = container_of(node, struct hrtimer, node);
 
+		trace_missed_hrtimer(timer, basenow);
+
 		/*
 		 * The immediate goal for using the softexpires is
 		 * minimizing wakeups, not running timers at the
Hi Binoy,
[auto build test ERROR on tip/perf/core]
[also build test ERROR on v4.8-rc3 next-20160819]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Binoy-Jayan/tracing-Deference-point...
config: powerpc-c2k_defconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/ma... -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc
All errors (new ones prefixed by >>):
kernel/built-in.o: In function `atomic_read':
arch/powerpc/include/asm/atomic.h:37: undefined reference to `__tracepoint_latency_hrtimer_interrupt'
arch/powerpc/include/asm/atomic.h:37: undefined reference to `__tracepoint_latency_hrtimer_interrupt'
kernel/built-in.o: In function `__hrtimer_run_queues':
kernel/time/hrtimer.c:1298: undefined reference to `__tracepoint_latency_hrtimer_interrupt'
kernel/built-in.o: In function `trace_missed_hrtimer':
kernel/time/hrtimer.c:1283: undefined reference to `__tracepoint_latency_hrtimer_interrupt'
vim +37 arch/powerpc/include/asm/atomic.h
dc53617c arch/powerpc/include/asm/atomic.h Boqun Feng         2016-01-06  31  })
dc53617c arch/powerpc/include/asm/atomic.h Boqun Feng         2016-01-06  32  
9f0cbea0 include/asm-powerpc/atomic.h      Segher Boessenkool 2007-08-11  33  static __inline__ int atomic_read(const atomic_t *v)
9f0cbea0 include/asm-powerpc/atomic.h      Segher Boessenkool 2007-08-11  34  {
9f0cbea0 include/asm-powerpc/atomic.h      Segher Boessenkool 2007-08-11  35  	int t;
9f0cbea0 include/asm-powerpc/atomic.h      Segher Boessenkool 2007-08-11  36  
9f0cbea0 include/asm-powerpc/atomic.h      Segher Boessenkool 2007-08-11 @37  	__asm__ __volatile__("lwz%U1%X1 %0,%1" : "=r"(t) : "m"(v->counter));
9f0cbea0 include/asm-powerpc/atomic.h      Segher Boessenkool 2007-08-11  38  
9f0cbea0 include/asm-powerpc/atomic.h      Segher Boessenkool 2007-08-11  39  	return t;
9f0cbea0 include/asm-powerpc/atomic.h      Segher Boessenkool 2007-08-11  40  }
:::::: The code at line 37 was first introduced by commit
:::::: 9f0cbea0d8cc47801b853d3c61d0e17475b0cc89 [POWERPC] Implement atomic{, 64}_{read, write}() without volatile

:::::: TO: Segher Boessenkool segher@kernel.crashing.org
:::::: CC: Paul Mackerras paulus@samba.org
---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Hi Binoy,
[auto build test ERROR on tip/perf/core]
[also build test ERROR on v4.8-rc3 next-20160819]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Binoy-Jayan/tracing-Deference-point...
config: x86_64-rhel (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64
All errors (new ones prefixed by >>):
kernel/built-in.o: In function `__hrtimer_run_queues':
hrtimer.c:(.text+0x73720): undefined reference to `__tracepoint_latency_hrtimer_interrupt'
kernel/built-in.o:(__jump_table+0x1f18): undefined reference to `__tracepoint_latency_hrtimer_interrupt'
kernel/built-in.o:(__jump_table+0x1f78): undefined reference to `__tracepoint_latency_hrtimer_interrupt'
kernel/built-in.o:(__jump_table+0x1fa8): undefined reference to `__tracepoint_latency_hrtimer_interrupt'
---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
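All of these link failures share one cause: the latency tracepoints are instantiated (via CREATE_TRACE_POINTS) in kernel/trace/trace_irqsoff.c, which is compiled only when the irqsoff/preempt tracers are enabled, while kernel/time/hrtimer.c, which now calls trace_latency_hrtimer_interrupt(), is always built. One possible fix (my sketch, untested) is to instantiate the tracepoints from a file that is always compiled, for example:

/* In kernel/time/hrtimer.c (or another always-built file),
 * instead of in kernel/trace/trace_irqsoff.c:
 */
#define CREATE_TRACE_POINTS
#include <trace/events/latency.h>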