On Tue, 1 Sep 2020 at 17:00, Paul E. McKenney paulmck@kernel.org wrote:
On Tue, Sep 01, 2020 at 08:46:54AM +0200, Ulf Hansson wrote:
- Saravanna, Rafael, Lina
On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney paulmck@kernel.org wrote:
On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
While booting linux mainline kernel on arm64 db410c this kernel warning noticed.
metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b git describe: v5.9-rc3 make_kernelversion: 5.9.0-rc3 kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkf...
Boot log,
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030] [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host) (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils) 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020 [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC <> [ 5.299090] sdhci: Secure Digital Host Controller Interface driver [ 5.299140] sdhci: Copyright(c) Pierre Ossman [ 5.304313] [ 5.307771] Synopsys Designware Multimedia Card Interface Driver [ 5.308588] ============================= [ 5.308593] WARNING: suspicious RCU usage [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.320052] 5.9.0-rc3 #1 Not tainted [ 5.320057] ----------------------------- [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.320068] [ 5.320068] other info that might help us debug this: [ 5.320068] [ 5.320074] [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 [ 5.320078] RCU used illegally from extended quiescent state! [ 5.320084] no locks held by swapper/0/0. [ 5.320089] [ 5.320089] stack backtrace: [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.346452] Call trace: [ 5.346463] dump_backtrace+0x0/0x1f8 [ 5.346471] show_stack+0x2c/0x38 [ 5.346480] dump_stack+0xec/0x15c [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.346499] lock_acquire+0x3d0/0x440 [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.421795] cpuidle_enter_state+0xc8/0x610 [ 5.426392] cpuidle_enter+0x3c/0x50 [ 5.430561] call_cpuidle+0x44/0x80 [ 5.434378] do_idle+0x240/0x2a0
RCU ignores CPUs in the idle loop, which means that you cannot use rcu_read_lock() from the idle loop without use of something like RCU_NONIDLE(). If this is due to event tracing, you should use the _rcuidle() variant of the event trace statement.
In the runtime suspend path, the runtime PM core calls device_links_read_lock() - if the device in question has any links to suppliers (to allow them to be suspended too).
device_links_read_lock() calls srcu_read_lock().
Except that it is perfectly legal to invoke srcu_read_lock() from the idle loop. The problem is instead rcu_read_lock() and similar.
Hmm. Sounds like more debugging is needed then, to narrow down the problem.
It turns out that the device in question (the CPU device that is attached to genpd) didn't have any links before - but that seems to have changed, due to the work done by Saravana (links become created on a per resource basis, parsed from DT during boot).
Note also that Peter Zijlstra (CCed) is working to shrink the portion of the idle loop that RCU ignores. Not sure that it covers your case, but it is worth checking.
Thanks for letting me know. Let's see what Peter thinks about this then.
Apologize for my ignorance, but from a cpuidle point of view, what does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE on bigger code paths?
It means that as far as RCU (and only RCU) is concerned there is an exit from idle state for just long enough to execute RCU_NONIDLE()'s argument. This involves an atomic operation on both entry to and exit from RCU_NONIDLE(), which in most cases won't be noticeable. But in some cases you might (for example) want to enclose a loop in RCU_NONIDLE() rather than doing RCU_NONIDLE() on each pass through the loop.
I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps that's the easiest approach, at least to start with.
Or do you have any other ideas?
Here is the list, though it is early in the morning here:
RCU_NONIDLE().
Peter's patch, if it turns out to hoist your code out of what RCU considers to be the idle loop.
If the problem is trace events, use the _rcuidle() variant of the trace event. Instead of trace_blah(), use trace_blah_rcuidle().
Switch from RCU (as in rcu_read_lock()) to SRCU (as in srcu_read_lock()).
Take Peter's patch a step further, moving the rcu_idle_enter() and rcu_idle_exit() calls as needed. But please keep in mind that these two functions require that irqs be disabled by their callers.
If RCU_NONIDLE() in inconvenient due to early exits and such, you could use the rcu_irq_enter_irqson() and rcu_irq_exit_irqson() functions that it calls.
Do any of those help?
Yes, they will, in one way or the other. Thanks for providing me with all the available options.
BTW, I still don't get what good rcu_idle_enter|exit() does, but I am assuming those need to be called at some point before the CPU goes to sleep.
Kind regards Uffe