Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

2 Sep 2020

      On Tue, 1 Sep 2020 at 17:00, Paul E. McKenney paulmck@kernel.org wrote:
...
On Tue, Sep 01, 2020 at 08:46:54AM +0200, Ulf Hansson wrote:
...

Saravanna, Rafael, Lina

On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney paulmck@kernel.org wrote:
...
On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
...
While booting linux mainline kernel on arm64 db410c this kernel warning
noticed.
metadata:
  git branch: master
  git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b
  git describe: v5.9-rc3
  make_kernelversion: 5.9.0-rc3
  kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkf...
Boot log,
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030]
[    0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host)
(aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils)
2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020
[    0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC
<>
[    5.299090] sdhci: Secure Digital Host Controller Interface driver
[    5.299140] sdhci: Copyright(c) Pierre Ossman
[    5.304313]
[    5.307771] Synopsys Designware Multimedia Card Interface Driver
[    5.308588] =============================
[    5.308593] WARNING: suspicious RCU usage
[    5.316628] sdhci-pltfm: SDHCI platform and OF driver helper
[    5.320052] 5.9.0-rc3 #1 Not tainted
[    5.320057] -----------------------------
[    5.320063] /usr/src/kernel/include/trace/events/lock.h:37
suspicious rcu_dereference_check() usage!
[    5.320068]
[    5.320068] other info that might help us debug this:
[    5.320068]
[    5.320074]
[    5.320074] rcu_scheduler_active = 2, debug_locks = 1
[    5.320078] RCU used illegally from extended quiescent state!
[    5.320084] no locks held by swapper/0/0.
[    5.320089]
[    5.320089] stack backtrace:
[    5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
[    5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO
[    5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[    5.346452] Call trace:
[    5.346463]  dump_backtrace+0x0/0x1f8
[    5.346471]  show_stack+0x2c/0x38
[    5.346480]  dump_stack+0xec/0x15c
[    5.346490]  lockdep_rcu_suspicious+0xd4/0xf8
[    5.346499]  lock_acquire+0x3d0/0x440
[    5.346510]  _raw_spin_lock_irqsave+0x80/0xb0
[    5.413118]  __pm_runtime_suspend+0x34/0x1d0
[    5.417457]  psci_enter_domain_idle_state+0x4c/0xb0
[    5.421795]  cpuidle_enter_state+0xc8/0x610
[    5.426392]  cpuidle_enter+0x3c/0x50
[    5.430561]  call_cpuidle+0x44/0x80
[    5.434378]  do_idle+0x240/0x2a0
RCU ignores CPUs in the idle loop, which means that you cannot use
rcu_read_lock() from the idle loop without use of something like
RCU_NONIDLE().  If this is due to event tracing, you should use the
_rcuidle() variant of the event trace statement.
In the runtime suspend path, the runtime PM core calls
device_links_read_lock() - if the device in question has any links to
suppliers (to allow them to be suspended too).
device_links_read_lock() calls srcu_read_lock().
Except that it is perfectly legal to invoke srcu_read_lock() from the
idle loop.  The problem is instead rcu_read_lock() and similar.
Hmm. Sounds like more debugging is needed then, to narrow down the problem.
...
...
It turns out that the device in question (the CPU device that is
attached to genpd) didn't have any links before - but that seems to
have changed, due to the work done by Saravana (links become created
on a per resource basis, parsed from DT during boot).
...
Note also that Peter Zijlstra (CCed) is working to shrink the portion
of the idle loop that RCU ignores.  Not sure that it covers your
case, but it is worth checking.
Thanks for letting me know. Let's see what Peter thinks about this then.
Apologize for my ignorance, but from a cpuidle point of view, what
does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE
on bigger code paths?
It means that as far as RCU (and only RCU) is concerned there is an
exit from idle state for just long enough to execute RCU_NONIDLE()'s
argument.  This involves an atomic operation on both entry to and exit
from RCU_NONIDLE(), which in most cases won't be noticeable.  But in some
cases you might (for example) want to enclose a loop in RCU_NONIDLE()
rather than doing RCU_NONIDLE() on each pass through the loop.
...
I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
that's the easiest approach, at least to start with.
Or do you have any other ideas?
Here is the list, though it is early in the morning here:

 RCU_NONIDLE().

 Peter's patch, if it turns out to hoist your code out of what
 RCU considers to be the idle loop.

 If the problem is trace events, use the _rcuidle() variant of the
 trace event.  Instead of trace_blah(), use trace_blah_rcuidle().

 Switch from RCU (as in rcu_read_lock()) to SRCU (as in
 srcu_read_lock()).

 Take Peter's patch a step further, moving the rcu_idle_enter()
 and rcu_idle_exit() calls as needed.  But please keep in mind
 that these two functions require that irqs be disabled by their
 callers.

 If RCU_NONIDLE() in inconvenient due to early exits and such,
 you could use the rcu_irq_enter_irqson() and rcu_irq_exit_irqson()
 functions that it calls.

Do any of those help?
Yes, they will, in one way or the other. Thanks for providing me with
all the available options.
BTW, I still don't get what good rcu_idle_enter|exit() does, but I am
assuming those need to be called at some point before the CPU goes to
sleep.
Kind regards
Uffe

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper