On 26 August 2016 at 07:19, Masami Hiramatsu mhiramat@kernel.org wrote:
On Wed, 24 Aug 2016 16:47:28 +0530
"__local_bh_enable() tests if this is the last SOFTIRQ_OFFSET, if so it tells lockdep softirqs are enabled with trace_softirqs_on() after that we go an actually modify the preempt_count with preempt_count_sub(). Then in preempt_count_sub() you call into trace_preempt_on() if this was the last preempt_count increment but you do that _before_ you actually change the preempt_count with __preempt_count_sub() at this point lockdep and preempt_count think the world differs and *boom*"
So the simplest way to avoid this is by disabling the consistency checks.
We also need to take care of the iterating in trace_events_trigger.c to avoid a splatter in conjunction with the hist trigger.
Special care for lockdep inside tracepoint handler is reasonable.
Reviewed-by: Masami Hiramatsu mhiramat@kernel.org
Steven, since this seems a bugfix, could you pick this from the series?
Thank you,
-- Masami Hiramatsu mhiramat@kernel.org
Hi Daniel/Masami,
I ran into a similar rcu error while using same tracepoint for all three latency types and using a filter like below to trigger only events falling under a specific type.
echo 'hist:key=ltype,cpu:val=latency:sort=ltype,cpu if ltype==0' > \ /sys/kernel/debug/tracing/events/latency/latency_preempt/trigger
The error occurs only when I use the predicate 'if ltype==0' as filter.
It occurs in 'filter_match_preds' during a call to 'rcu_dereference_sched'. kernel/trace/trace_events_filter.c +611 : filter_match_preds()
Surprisingly, this happens only the first time the echo command is used on the trigger file after each boot.
Do you think it is similar to the bug you have fixed? May be i'll try using "rcu_dereference_raw_notrace" instead of 'rcu_dereference_sched'.
Binoy
[ 1029.324257] =============================== [ 1029.324785] [ INFO: suspicious RCU usage. ] [ 1029.328698] 4.7.0+ #49 Not tainted [ 1029.332858] ------------------------------- [ 1029.336334] /local/mnt/workspace/src/korg/linux/kernel/trace/trace_events_filter.c:611 suspicious rcu_dereference_check() usage! [ 1029.340423] [ 1029.340423] other info that might help us debug this: [ 1029.340423] [ 1029.352226] [ 1029.352226] RCU used illegally from idle CPU! [ 1029.352226] rcu_scheduler_active = 1, debug_locks = 0 [ 1029.359953] RCU used illegally from extended quiescent state! [ 1029.371057] no locks held by swapper/0/0. [ 1029.376696] [ 1029.376696] stack backtrace: [ 1029.380693] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0+ #49 [ 1029.385033] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 1029.391111] Call trace: [ 1029.397798] [<ffff000008089ecc>] dump_backtrace+0x0/0x1e0 [ 1029.399967] [<ffff00000808a0d0>] show_stack+0x24/0x2c [ 1029.405523] [<ffff000008431554>] dump_stack+0xb0/0xf0 [ 1029.410557] [<ffff000008115dd0>] lockdep_rcu_suspicious+0xe8/0x120 [ 1029.415595] [<ffff0000081a5d40>] filter_match_preds+0x108/0x118 [ 1029.421669] [<ffff0000081a8308>] event_triggers_call+0x5c/0xc0 [ 1029.427485] [<ffff0000081a2184>] trace_event_buffer_commit+0x11c/0x244 [ 1029.433390] [<ffff00000819a4c0>] trace_event_raw_event_latency_template+0x58/0xa4 [ 1029.439902] [<ffff00000819b5ec>] time_hardirqs_on+0x264/0x290 [ 1029.447450] [<ffff0000081170d0>] trace_hardirqs_on_caller+0x20/0x180 [ 1029.453179] [<ffff000008117240>] trace_hardirqs_on+0x10/0x18 [ 1029.459604] [<ffff000008789290>] cpuidle_enter_state+0xc8/0x2e0 [ 1029.465246] [<ffff00000878951c>] cpuidle_enter+0x34/0x40 [ 1029.470888] [<ffff000008111188>] call_cpuidle+0x3c/0x5c [ 1029.476442] [<ffff0000081114e8>] cpu_startup_entry+0x1c0/0x360 [ 1029.481392] [<ffff000008955bc8>] rest_init+0x150/0x160 [ 1029.487293] [<ffff000008d30b98>] start_kernel+0x3a4/0x3b8 [ 1029.492415] [<ffff000008d301bc>] __primary_switched+0x30/0x74
Binoy