+ Saravanna, Rafael, Lina
On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney paulmck@kernel.org wrote:
On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
While booting linux mainline kernel on arm64 db410c this kernel warning noticed.
metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b git describe: v5.9-rc3 make_kernelversion: 5.9.0-rc3 kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkf...
Boot log,
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030] [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host) (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils) 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020 [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC <> [ 5.299090] sdhci: Secure Digital Host Controller Interface driver [ 5.299140] sdhci: Copyright(c) Pierre Ossman [ 5.304313] [ 5.307771] Synopsys Designware Multimedia Card Interface Driver [ 5.308588] ============================= [ 5.308593] WARNING: suspicious RCU usage [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.320052] 5.9.0-rc3 #1 Not tainted [ 5.320057] ----------------------------- [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.320068] [ 5.320068] other info that might help us debug this: [ 5.320068] [ 5.320074] [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 [ 5.320078] RCU used illegally from extended quiescent state! [ 5.320084] no locks held by swapper/0/0. [ 5.320089] [ 5.320089] stack backtrace: [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.346452] Call trace: [ 5.346463] dump_backtrace+0x0/0x1f8 [ 5.346471] show_stack+0x2c/0x38 [ 5.346480] dump_stack+0xec/0x15c [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.346499] lock_acquire+0x3d0/0x440 [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.421795] cpuidle_enter_state+0xc8/0x610 [ 5.426392] cpuidle_enter+0x3c/0x50 [ 5.430561] call_cpuidle+0x44/0x80 [ 5.434378] do_idle+0x240/0x2a0
RCU ignores CPUs in the idle loop, which means that you cannot use rcu_read_lock() from the idle loop without use of something like RCU_NONIDLE(). If this is due to event tracing, you should use the _rcuidle() variant of the event trace statement.
In the runtime suspend path, the runtime PM core calls device_links_read_lock() - if the device in question has any links to suppliers (to allow them to be suspended too).
device_links_read_lock() calls srcu_read_lock().
It turns out that the device in question (the CPU device that is attached to genpd) didn't have any links before - but that seems to have changed, due to the work done by Saravana (links become created on a per resource basis, parsed from DT during boot).
Note also that Peter Zijlstra (CCed) is working to shrink the portion of the idle loop that RCU ignores. Not sure that it covers your case, but it is worth checking.
Thanks for letting me know. Let's see what Peter thinks about this then.
Apologize for my ignorance, but from a cpuidle point of view, what does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE on bigger code paths?
I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps that's the easiest approach, at least to start with.
Or do you have any other ideas?
[...]
Kind regards Uffe