While booting linux mainline kernel on arm64 db410c this kernel warning noticed.
metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b git describe: v5.9-rc3 make_kernelversion: 5.9.0-rc3 kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkf...
Boot log,
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030] [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host) (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils) 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020 [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC <> [ 5.299090] sdhci: Secure Digital Host Controller Interface driver [ 5.299140] sdhci: Copyright(c) Pierre Ossman [ 5.304313] [ 5.307771] Synopsys Designware Multimedia Card Interface Driver [ 5.308588] ============================= [ 5.308593] WARNING: suspicious RCU usage [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.320052] 5.9.0-rc3 #1 Not tainted [ 5.320057] ----------------------------- [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.320068] [ 5.320068] other info that might help us debug this: [ 5.320068] [ 5.320074] [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 [ 5.320078] RCU used illegally from extended quiescent state! [ 5.320084] no locks held by swapper/0/0. [ 5.320089] [ 5.320089] stack backtrace: [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.346452] Call trace: [ 5.346463] dump_backtrace+0x0/0x1f8 [ 5.346471] show_stack+0x2c/0x38 [ 5.346480] dump_stack+0xec/0x15c [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.346499] lock_acquire+0x3d0/0x440 [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.421795] cpuidle_enter_state+0xc8/0x610 [ 5.426392] cpuidle_enter+0x3c/0x50 [ 5.430561] call_cpuidle+0x44/0x80 [ 5.434378] do_idle+0x240/0x2a0 [ 5.437589] cpu_startup_entry+0x2c/0x78 [ 5.441063] rest_init+0x1ac/0x280 [ 5.444970] arch_call_rest_init+0x14/0x1c [ 5.448180] start_kernel+0x50c/0x544 [ 5.452395] [ 5.452399] [ 5.452403] ============================= [ 5.452406] WARNING: suspicious RCU usage [ 5.452409] 5.9.0-rc3 #1 Not tainted [ 5.452412] ----------------------------- [ 5.452417] /usr/src/kernel/include/trace/events/ipi.h:36 suspicious rcu_dereference_check() usage! [ 5.452420] [ 5.452424] other info that might help us debug this: [ 5.452426] [ 5.452429] [ 5.452432] rcu_scheduler_active = 2, debug_locks = 1 [ 5.452436] RCU used illegally from extended quiescent state! [ 5.452440] 1 lock held by swapper/0/0: [ 5.452443] #0: ffff8000127408f8 (logbuf_lock){-...}-{2:2}, at: vprintk_emit+0xb0/0x358 [ 5.452458] [ 5.452461] stack backtrace: [ 5.452465] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.452469] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.452472] Call trace: [ 5.452476] dump_backtrace+0x0/0x1f8 [ 5.452479] show_stack+0x2c/0x38 [ 5.452481] dump_stack+0xec/0x15c [ 5.452485] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.452489] arch_irq_work_raise+0x208/0x210 [ 5.452493] __irq_work_queue_local+0x64/0x88 [ 5.452495] irq_work_queue+0x3c/0x88 [ 5.452499] printk_safe_log_store+0x148/0x178 [ 5.452502] vprintk_func+0x1cc/0x2b8 [ 5.452506] printk+0x74/0x94 [ 5.452509] lockdep_rcu_suspicious+0x28/0xf8 [ 5.452512] lock_release+0x338/0x360 [ 5.452516] _raw_spin_unlock+0x3c/0xa0 [ 5.452519] vprintk_emit+0xf8/0x358 [ 5.452522] vprintk_default+0x48/0x58 [ 5.452526] vprintk_func+0xec/0x2b8 [ 5.452528] printk+0x74/0x94 [ 5.452532] lockdep_rcu_suspicious+0x28/0xf8 [ 5.452535] lock_acquire+0x3d0/0x440 [ 5.452538] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.452542] __pm_runtime_suspend+0x34/0x1d0 [ 5.452545] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.452549] cpuidle_enter_state+0xc8/0x610 [ 5.452552] cpuidle_enter+0x3c/0x50 [ 5.452555] call_cpuidle+0x44/0x80 [ 5.452559] do_idle+0x240/0x2a0 [ 5.452562] cpu_startup_entry+0x2c/0x78 [ 5.452564] rest_init+0x1ac/0x280 [ 5.452568] arch_call_rest_init+0x14/0x1c [ 5.452571] start_kernel+0x50c/0x544 [ 5.452575] ============================= [ 5.452578] WARNING: suspicious RCU usage [ 5.452582] 5.9.0-rc3 #1 Not tainted [ 5.452585] ----------------------------- [ 5.452590] /usr/src/kernel/include/trace/events/lock.h:63 suspicious rcu_dereference_check() usage! [ 5.452593] [ 5.452596] other info that might help us debug this: [ 5.452599] [ 5.452601] [ 5.452605] rcu_scheduler_active = 2, debug_locks = 1 [ 5.452609] RCU used illegally from extended quiescent state! [ 5.452612] 1 lock held by swapper/0/0: [ 5.452615] #0: ffff8000127408f8 (logbuf_lock){-...}-{2:2}, at: vprintk_emit+0xb0/0x358 [ 5.452630] [ 5.452633] stack backtrace: [ 5.452636] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.452640] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.452643] Call trace: [ 5.452646] dump_backtrace+0x0/0x1f8 [ 5.452649] show_stack+0x2c/0x38 [ 5.452652] dump_stack+0xec/0x15c [ 5.452656] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.452659] lock_release+0x338/0x360 [ 5.452662] _raw_spin_unlock+0x3c/0xa0 [ 5.452665] vprintk_emit+0xf8/0x358 [ 5.452669] vprintk_default+0x48/0x58 [ 5.452671] vprintk_func+0xec/0x2b8 [ 5.452674] printk+0x74/0x94 [ 5.452677] lockdep_rcu_suspicious+0x28/0xf8 [ 5.452680] lock_acquire+0x3d0/0x440 [ 5.452683] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.452686] __pm_runtime_suspend+0x34/0x1d0 [ 5.452690] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.452693] cpuidle_enter_state+0xc8/0x610 [ 5.452696] cpuidle_enter+0x3c/0x50 [ 5.452698] call_cpuidle+0x44/0x80 [ 5.452701] do_idle+0x240/0x2a0 [ 5.452704] cpu_startup_entry+0x2c/0x78 [ 5.452708] rest_init+0x1ac/0x280 [ 5.452711] arch_call_rest_init+0x14/0x1c [ 5.452714] start_kernel+0x50c/0x544
full test log link, https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.9-rc3/testrun/...
On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
While booting linux mainline kernel on arm64 db410c this kernel warning noticed.
metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b git describe: v5.9-rc3 make_kernelversion: 5.9.0-rc3 kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkf...
Boot log,
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030] [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host) (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils) 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020 [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC <> [ 5.299090] sdhci: Secure Digital Host Controller Interface driver [ 5.299140] sdhci: Copyright(c) Pierre Ossman [ 5.304313] [ 5.307771] Synopsys Designware Multimedia Card Interface Driver [ 5.308588] ============================= [ 5.308593] WARNING: suspicious RCU usage [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.320052] 5.9.0-rc3 #1 Not tainted [ 5.320057] ----------------------------- [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.320068] [ 5.320068] other info that might help us debug this: [ 5.320068] [ 5.320074] [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 [ 5.320078] RCU used illegally from extended quiescent state! [ 5.320084] no locks held by swapper/0/0. [ 5.320089] [ 5.320089] stack backtrace: [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.346452] Call trace: [ 5.346463] dump_backtrace+0x0/0x1f8 [ 5.346471] show_stack+0x2c/0x38 [ 5.346480] dump_stack+0xec/0x15c [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.346499] lock_acquire+0x3d0/0x440 [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.421795] cpuidle_enter_state+0xc8/0x610 [ 5.426392] cpuidle_enter+0x3c/0x50 [ 5.430561] call_cpuidle+0x44/0x80 [ 5.434378] do_idle+0x240/0x2a0
RCU ignores CPUs in the idle loop, which means that you cannot use rcu_read_lock() from the idle loop without use of something like RCU_NONIDLE(). If this is due to event tracing, you should use the _rcuidle() variant of the event trace statement.
Note also that Peter Zijlstra (CCed) is working to shrink the portion of the idle loop that RCU ignores. Not sure that it covers your case, but it is worth checking.
Thanx, Paul
[ 5.437589] cpu_startup_entry+0x2c/0x78 [ 5.441063] rest_init+0x1ac/0x280 [ 5.444970] arch_call_rest_init+0x14/0x1c [ 5.448180] start_kernel+0x50c/0x544 [ 5.452395] [ 5.452399] [ 5.452403] ============================= [ 5.452406] WARNING: suspicious RCU usage [ 5.452409] 5.9.0-rc3 #1 Not tainted [ 5.452412] ----------------------------- [ 5.452417] /usr/src/kernel/include/trace/events/ipi.h:36 suspicious rcu_dereference_check() usage! [ 5.452420] [ 5.452424] other info that might help us debug this: [ 5.452426] [ 5.452429] [ 5.452432] rcu_scheduler_active = 2, debug_locks = 1 [ 5.452436] RCU used illegally from extended quiescent state! [ 5.452440] 1 lock held by swapper/0/0: [ 5.452443] #0: ffff8000127408f8 (logbuf_lock){-...}-{2:2}, at: vprintk_emit+0xb0/0x358 [ 5.452458] [ 5.452461] stack backtrace: [ 5.452465] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.452469] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.452472] Call trace: [ 5.452476] dump_backtrace+0x0/0x1f8 [ 5.452479] show_stack+0x2c/0x38 [ 5.452481] dump_stack+0xec/0x15c [ 5.452485] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.452489] arch_irq_work_raise+0x208/0x210 [ 5.452493] __irq_work_queue_local+0x64/0x88 [ 5.452495] irq_work_queue+0x3c/0x88 [ 5.452499] printk_safe_log_store+0x148/0x178 [ 5.452502] vprintk_func+0x1cc/0x2b8 [ 5.452506] printk+0x74/0x94 [ 5.452509] lockdep_rcu_suspicious+0x28/0xf8 [ 5.452512] lock_release+0x338/0x360 [ 5.452516] _raw_spin_unlock+0x3c/0xa0 [ 5.452519] vprintk_emit+0xf8/0x358 [ 5.452522] vprintk_default+0x48/0x58 [ 5.452526] vprintk_func+0xec/0x2b8 [ 5.452528] printk+0x74/0x94 [ 5.452532] lockdep_rcu_suspicious+0x28/0xf8 [ 5.452535] lock_acquire+0x3d0/0x440 [ 5.452538] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.452542] __pm_runtime_suspend+0x34/0x1d0 [ 5.452545] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.452549] cpuidle_enter_state+0xc8/0x610 [ 5.452552] cpuidle_enter+0x3c/0x50 [ 5.452555] call_cpuidle+0x44/0x80 [ 5.452559] do_idle+0x240/0x2a0 [ 5.452562] cpu_startup_entry+0x2c/0x78 [ 5.452564] rest_init+0x1ac/0x280 [ 5.452568] arch_call_rest_init+0x14/0x1c [ 5.452571] start_kernel+0x50c/0x544 [ 5.452575] ============================= [ 5.452578] WARNING: suspicious RCU usage [ 5.452582] 5.9.0-rc3 #1 Not tainted [ 5.452585] ----------------------------- [ 5.452590] /usr/src/kernel/include/trace/events/lock.h:63 suspicious rcu_dereference_check() usage! [ 5.452593] [ 5.452596] other info that might help us debug this: [ 5.452599] [ 5.452601] [ 5.452605] rcu_scheduler_active = 2, debug_locks = 1 [ 5.452609] RCU used illegally from extended quiescent state! [ 5.452612] 1 lock held by swapper/0/0: [ 5.452615] #0: ffff8000127408f8 (logbuf_lock){-...}-{2:2}, at: vprintk_emit+0xb0/0x358 [ 5.452630] [ 5.452633] stack backtrace: [ 5.452636] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.452640] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.452643] Call trace: [ 5.452646] dump_backtrace+0x0/0x1f8 [ 5.452649] show_stack+0x2c/0x38 [ 5.452652] dump_stack+0xec/0x15c [ 5.452656] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.452659] lock_release+0x338/0x360 [ 5.452662] _raw_spin_unlock+0x3c/0xa0 [ 5.452665] vprintk_emit+0xf8/0x358 [ 5.452669] vprintk_default+0x48/0x58 [ 5.452671] vprintk_func+0xec/0x2b8 [ 5.452674] printk+0x74/0x94 [ 5.452677] lockdep_rcu_suspicious+0x28/0xf8 [ 5.452680] lock_acquire+0x3d0/0x440 [ 5.452683] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.452686] __pm_runtime_suspend+0x34/0x1d0 [ 5.452690] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.452693] cpuidle_enter_state+0xc8/0x610 [ 5.452696] cpuidle_enter+0x3c/0x50 [ 5.452698] call_cpuidle+0x44/0x80 [ 5.452701] do_idle+0x240/0x2a0 [ 5.452704] cpu_startup_entry+0x2c/0x78 [ 5.452708] rest_init+0x1ac/0x280 [ 5.452711] arch_call_rest_init+0x14/0x1c [ 5.452714] start_kernel+0x50c/0x544
full test log link, https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.9-rc3/testrun/...
-- Linaro LKFT https://lkft.linaro.org
+ Saravanna, Rafael, Lina
On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney paulmck@kernel.org wrote:
On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
While booting linux mainline kernel on arm64 db410c this kernel warning noticed.
metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b git describe: v5.9-rc3 make_kernelversion: 5.9.0-rc3 kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkf...
Boot log,
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030] [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host) (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils) 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020 [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC <> [ 5.299090] sdhci: Secure Digital Host Controller Interface driver [ 5.299140] sdhci: Copyright(c) Pierre Ossman [ 5.304313] [ 5.307771] Synopsys Designware Multimedia Card Interface Driver [ 5.308588] ============================= [ 5.308593] WARNING: suspicious RCU usage [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.320052] 5.9.0-rc3 #1 Not tainted [ 5.320057] ----------------------------- [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.320068] [ 5.320068] other info that might help us debug this: [ 5.320068] [ 5.320074] [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 [ 5.320078] RCU used illegally from extended quiescent state! [ 5.320084] no locks held by swapper/0/0. [ 5.320089] [ 5.320089] stack backtrace: [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.346452] Call trace: [ 5.346463] dump_backtrace+0x0/0x1f8 [ 5.346471] show_stack+0x2c/0x38 [ 5.346480] dump_stack+0xec/0x15c [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.346499] lock_acquire+0x3d0/0x440 [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.421795] cpuidle_enter_state+0xc8/0x610 [ 5.426392] cpuidle_enter+0x3c/0x50 [ 5.430561] call_cpuidle+0x44/0x80 [ 5.434378] do_idle+0x240/0x2a0
RCU ignores CPUs in the idle loop, which means that you cannot use rcu_read_lock() from the idle loop without use of something like RCU_NONIDLE(). If this is due to event tracing, you should use the _rcuidle() variant of the event trace statement.
In the runtime suspend path, the runtime PM core calls device_links_read_lock() - if the device in question has any links to suppliers (to allow them to be suspended too).
device_links_read_lock() calls srcu_read_lock().
It turns out that the device in question (the CPU device that is attached to genpd) didn't have any links before - but that seems to have changed, due to the work done by Saravana (links become created on a per resource basis, parsed from DT during boot).
Note also that Peter Zijlstra (CCed) is working to shrink the portion of the idle loop that RCU ignores. Not sure that it covers your case, but it is worth checking.
Thanks for letting me know. Let's see what Peter thinks about this then.
Apologize for my ignorance, but from a cpuidle point of view, what does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE on bigger code paths?
I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps that's the easiest approach, at least to start with.
Or do you have any other ideas?
[...]
Kind regards Uffe
+ Re-adding Peter (seems like the original address was wrong)
On Tue, 1 Sep 2020 at 08:46, Ulf Hansson ulf.hansson@linaro.org wrote:
- Saravanna, Rafael, Lina
On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney paulmck@kernel.org wrote:
On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
While booting linux mainline kernel on arm64 db410c this kernel warning noticed.
metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b git describe: v5.9-rc3 make_kernelversion: 5.9.0-rc3 kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkf...
Boot log,
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030] [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host) (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils) 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020 [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC <> [ 5.299090] sdhci: Secure Digital Host Controller Interface driver [ 5.299140] sdhci: Copyright(c) Pierre Ossman [ 5.304313] [ 5.307771] Synopsys Designware Multimedia Card Interface Driver [ 5.308588] ============================= [ 5.308593] WARNING: suspicious RCU usage [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.320052] 5.9.0-rc3 #1 Not tainted [ 5.320057] ----------------------------- [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.320068] [ 5.320068] other info that might help us debug this: [ 5.320068] [ 5.320074] [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 [ 5.320078] RCU used illegally from extended quiescent state! [ 5.320084] no locks held by swapper/0/0. [ 5.320089] [ 5.320089] stack backtrace: [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.346452] Call trace: [ 5.346463] dump_backtrace+0x0/0x1f8 [ 5.346471] show_stack+0x2c/0x38 [ 5.346480] dump_stack+0xec/0x15c [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.346499] lock_acquire+0x3d0/0x440 [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.421795] cpuidle_enter_state+0xc8/0x610 [ 5.426392] cpuidle_enter+0x3c/0x50 [ 5.430561] call_cpuidle+0x44/0x80 [ 5.434378] do_idle+0x240/0x2a0
RCU ignores CPUs in the idle loop, which means that you cannot use rcu_read_lock() from the idle loop without use of something like RCU_NONIDLE(). If this is due to event tracing, you should use the _rcuidle() variant of the event trace statement.
In the runtime suspend path, the runtime PM core calls device_links_read_lock() - if the device in question has any links to suppliers (to allow them to be suspended too).
device_links_read_lock() calls srcu_read_lock().
It turns out that the device in question (the CPU device that is attached to genpd) didn't have any links before - but that seems to have changed, due to the work done by Saravana (links become created on a per resource basis, parsed from DT during boot).
Note also that Peter Zijlstra (CCed) is working to shrink the portion of the idle loop that RCU ignores. Not sure that it covers your case, but it is worth checking.
Thanks for letting me know. Let's see what Peter thinks about this then.
Apologize for my ignorance, but from a cpuidle point of view, what does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE on bigger code paths?
I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps that's the easiest approach, at least to start with.
Or do you have any other ideas?
[...]
Kind regards Uffe
On Tue, Sep 01, 2020 at 08:50:57AM +0200, Ulf Hansson wrote:
On Tue, 1 Sep 2020 at 08:46, Ulf Hansson ulf.hansson@linaro.org wrote:
On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney paulmck@kernel.org wrote:
[ 5.308588] ============================= [ 5.308593] WARNING: suspicious RCU usage [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.320052] 5.9.0-rc3 #1 Not tainted [ 5.320057] ----------------------------- [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.320068] [ 5.320068] other info that might help us debug this: [ 5.320068] [ 5.320074] [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 [ 5.320078] RCU used illegally from extended quiescent state! [ 5.320084] no locks held by swapper/0/0. [ 5.320089] [ 5.320089] stack backtrace: [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.346452] Call trace: [ 5.346463] dump_backtrace+0x0/0x1f8 [ 5.346471] show_stack+0x2c/0x38 [ 5.346480] dump_stack+0xec/0x15c [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.346499] lock_acquire+0x3d0/0x440 [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.421795] cpuidle_enter_state+0xc8/0x610 [ 5.426392] cpuidle_enter+0x3c/0x50 [ 5.430561] call_cpuidle+0x44/0x80 [ 5.434378] do_idle+0x240/0x2a0
Note also that Peter Zijlstra (CCed) is working to shrink the portion of the idle loop that RCU ignores. Not sure that it covers your case, but it is worth checking.
Right, so I think I 'caused' this by making the lock tracepoints visible. That is, the error always existed, now we actually warn about it.
Thanks for letting me know. Let's see what Peter thinks about this then.
Apologize for my ignorance, but from a cpuidle point of view, what does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE on bigger code paths?
I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps that's the easiest approach, at least to start with.
Or do you have any other ideas?
So IMO trace_*_rcuidle() and RCU_NONIDLE() are bugs, they just mean we got the ordering wrong and are papering over it. That said, that's been the modus operandi for a while now, just make it shut up and don't think about it :-/
That said; I pushed the rcu_idle_enter() about as deep as it goes into generic code in commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path")
I suppose the next step is pushing it into individual driver when needed, something like the below perhaps. I realize the coupled idle state stuff is more complicated that most, but it's also not an area I've looked at in detail, so perhaps I've just made a bigger mess, but it ought to give you enough to get going I think.
Rafael?
--- diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c index 74463841805f..617bbef316e6 100644 --- a/drivers/cpuidle/cpuidle-psci.c +++ b/drivers/cpuidle/cpuidle-psci.c @@ -49,6 +49,9 @@ static inline u32 psci_get_domain_state(void)
static inline int psci_enter_state(int idx, u32 state) { + /* + * XXX push rcu_idle_enter into the coupled code + */ return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state); }
@@ -72,7 +75,9 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, if (!state) state = states[idx];
+ rcu_idle_enter(); ret = psci_cpu_suspend_enter(state) ? -1 : idx; + rcu_idle_exit();
pm_runtime_get_sync(pd_dev);
@@ -125,8 +130,13 @@ static int psci_enter_idle_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, int idx) { u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states); + int ret;
- return psci_enter_state(idx, state[idx]); + rcu_idle_enter(); + ret = psci_enter_state(idx, state[idx]); + rcu_idle_exit(); + + return ret; }
static const struct of_device_id psci_idle_state_match[] = { @@ -170,6 +180,7 @@ static int psci_dt_cpu_init_topology(struct cpuidle_driver *drv, * deeper states. */ drv->states[state_count - 1].enter = psci_enter_domain_idle_state; + drv->states[state_count - 1].flags = CPUIDLE_FLAG_RCU_IDLE; psci_cpuidle_use_cpuhp = true;
return 0; @@ -285,6 +296,7 @@ static int psci_idle_init_cpu(struct device *dev, int cpu) * state index 0. */ drv->states[0].enter = psci_enter_idle_state; + drv->states[0].flags = CPUIDLE_FLAG_RCU_IDLE; drv->states[0].exit_latency = 1; drv->states[0].target_residency = 1; drv->states[0].power_usage = UINT_MAX; diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 04becd70cc41..3dbac3bb761b 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -239,9 +239,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, time_start = ns_to_ktime(local_clock());
stop_critical_timings(); - rcu_idle_enter(); + if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) + rcu_idle_enter(); entered_state = target_state->enter(dev, drv, index); - rcu_idle_exit(); + if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) + rcu_idle_exit(); start_critical_timings();
sched_clock_idle_wakeup_event(); diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index 75895e6363b8..47f686131a54 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -82,6 +82,7 @@ struct cpuidle_state { #define CPUIDLE_FLAG_UNUSABLE BIT(3) /* avoid using this state */ #define CPUIDLE_FLAG_OFF BIT(4) /* disable this state by default */ #define CPUIDLE_FLAG_TLB_FLUSHED BIT(5) /* idle-state flushes TLBs */ +#define CPUIDLE_FLAG_RCU_IDLE BIT(6) /* driver will do RCU-idle */
struct cpuidle_device_kobj; struct cpuidle_state_kobj;
On Tue, 1 Sep 2020 at 12:42, peterz@infradead.org wrote:
On Tue, Sep 01, 2020 at 08:50:57AM +0200, Ulf Hansson wrote:
On Tue, 1 Sep 2020 at 08:46, Ulf Hansson ulf.hansson@linaro.org wrote:
On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney paulmck@kernel.org wrote:
[ 5.308588] ============================= [ 5.308593] WARNING: suspicious RCU usage [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.320052] 5.9.0-rc3 #1 Not tainted [ 5.320057] ----------------------------- [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.320068] [ 5.320068] other info that might help us debug this: [ 5.320068] [ 5.320074] [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 [ 5.320078] RCU used illegally from extended quiescent state! [ 5.320084] no locks held by swapper/0/0. [ 5.320089] [ 5.320089] stack backtrace: [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.346452] Call trace: [ 5.346463] dump_backtrace+0x0/0x1f8 [ 5.346471] show_stack+0x2c/0x38 [ 5.346480] dump_stack+0xec/0x15c [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.346499] lock_acquire+0x3d0/0x440 [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.421795] cpuidle_enter_state+0xc8/0x610 [ 5.426392] cpuidle_enter+0x3c/0x50 [ 5.430561] call_cpuidle+0x44/0x80 [ 5.434378] do_idle+0x240/0x2a0
Note also that Peter Zijlstra (CCed) is working to shrink the portion of the idle loop that RCU ignores. Not sure that it covers your case, but it is worth checking.
Right, so I think I 'caused' this by making the lock tracepoints visible. That is, the error always existed, now we actually warn about it.
Thanks for letting me know. Let's see what Peter thinks about this then.
Apologize for my ignorance, but from a cpuidle point of view, what does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE on bigger code paths?
I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps that's the easiest approach, at least to start with.
Or do you have any other ideas?
So IMO trace_*_rcuidle() and RCU_NONIDLE() are bugs, they just mean we got the ordering wrong and are papering over it. That said, that's been the modus operandi for a while now, just make it shut up and don't think about it :-/
That said; I pushed the rcu_idle_enter() about as deep as it goes into generic code in commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path")
Aha, that commit should fix this problem, I think. Looks like that commit was sent as a fix and included in the recent v5.9-rc3.
Naresh, can you try with the above commit?
I suppose the next step is pushing it into individual driver when needed, something like the below perhaps. I realize the coupled idle state stuff is more complicated that most, but it's also not an area I've looked at in detail, so perhaps I've just made a bigger mess, but it ought to give you enough to get going I think.
These aren't coupled states. Instead, in cpuidle-psci, we are using PM domains through genpd and runtime PM to manage "shared idle states" between CPUs.
Kind regards Uffe
Rafael?
diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c index 74463841805f..617bbef316e6 100644 --- a/drivers/cpuidle/cpuidle-psci.c +++ b/drivers/cpuidle/cpuidle-psci.c @@ -49,6 +49,9 @@ static inline u32 psci_get_domain_state(void)
static inline int psci_enter_state(int idx, u32 state) {
/*
* XXX push rcu_idle_enter into the coupled code
*/ return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
}
@@ -72,7 +75,9 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, if (!state) state = states[idx];
rcu_idle_enter(); ret = psci_cpu_suspend_enter(state) ? -1 : idx;
rcu_idle_exit(); pm_runtime_get_sync(pd_dev);
@@ -125,8 +130,13 @@ static int psci_enter_idle_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, int idx) { u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
int ret;
return psci_enter_state(idx, state[idx]);
rcu_idle_enter();
ret = psci_enter_state(idx, state[idx]);
rcu_idle_exit();
return ret;
}
static const struct of_device_id psci_idle_state_match[] = { @@ -170,6 +180,7 @@ static int psci_dt_cpu_init_topology(struct cpuidle_driver *drv, * deeper states. */ drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
drv->states[state_count - 1].flags = CPUIDLE_FLAG_RCU_IDLE; psci_cpuidle_use_cpuhp = true; return 0;
@@ -285,6 +296,7 @@ static int psci_idle_init_cpu(struct device *dev, int cpu) * state index 0. */ drv->states[0].enter = psci_enter_idle_state;
drv->states[0].flags = CPUIDLE_FLAG_RCU_IDLE; drv->states[0].exit_latency = 1; drv->states[0].target_residency = 1; drv->states[0].power_usage = UINT_MAX;
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 04becd70cc41..3dbac3bb761b 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -239,9 +239,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, time_start = ns_to_ktime(local_clock());
stop_critical_timings();
rcu_idle_enter();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_enter(); entered_state = target_state->enter(dev, drv, index);
rcu_idle_exit();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_exit(); start_critical_timings(); sched_clock_idle_wakeup_event();
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index 75895e6363b8..47f686131a54 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -82,6 +82,7 @@ struct cpuidle_state { #define CPUIDLE_FLAG_UNUSABLE BIT(3) /* avoid using this state */ #define CPUIDLE_FLAG_OFF BIT(4) /* disable this state by default */ #define CPUIDLE_FLAG_TLB_FLUSHED BIT(5) /* idle-state flushes TLBs */ +#define CPUIDLE_FLAG_RCU_IDLE BIT(6) /* driver will do RCU-idle */
struct cpuidle_device_kobj; struct cpuidle_state_kobj;
On Tue, 1 Sep 2020 at 14:35, Ulf Hansson ulf.hansson@linaro.org wrote:
On Tue, 1 Sep 2020 at 12:42, peterz@infradead.org wrote:
On Tue, Sep 01, 2020 at 08:50:57AM +0200, Ulf Hansson wrote:
On Tue, 1 Sep 2020 at 08:46, Ulf Hansson ulf.hansson@linaro.org wrote:
On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney paulmck@kernel.org wrote:
[ 5.308588] ============================= [ 5.308593] WARNING: suspicious RCU usage [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.320052] 5.9.0-rc3 #1 Not tainted [ 5.320057] ----------------------------- [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.320068] [ 5.320068] other info that might help us debug this: [ 5.320068] [ 5.320074] [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 [ 5.320078] RCU used illegally from extended quiescent state! [ 5.320084] no locks held by swapper/0/0. [ 5.320089] [ 5.320089] stack backtrace: [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.346452] Call trace: [ 5.346463] dump_backtrace+0x0/0x1f8 [ 5.346471] show_stack+0x2c/0x38 [ 5.346480] dump_stack+0xec/0x15c [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.346499] lock_acquire+0x3d0/0x440 [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.421795] cpuidle_enter_state+0xc8/0x610 [ 5.426392] cpuidle_enter+0x3c/0x50 [ 5.430561] call_cpuidle+0x44/0x80 [ 5.434378] do_idle+0x240/0x2a0
Note also that Peter Zijlstra (CCed) is working to shrink the portion of the idle loop that RCU ignores. Not sure that it covers your case, but it is worth checking.
Right, so I think I 'caused' this by making the lock tracepoints visible. That is, the error always existed, now we actually warn about it.
Thanks for letting me know. Let's see what Peter thinks about this then.
Apologize for my ignorance, but from a cpuidle point of view, what does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE on bigger code paths?
I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps that's the easiest approach, at least to start with.
Or do you have any other ideas?
So IMO trace_*_rcuidle() and RCU_NONIDLE() are bugs, they just mean we got the ordering wrong and are papering over it. That said, that's been the modus operandi for a while now, just make it shut up and don't think about it :-/
That said; I pushed the rcu_idle_enter() about as deep as it goes into generic code in commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path")
Aha, that commit should fix this problem, I think. Looks like that commit was sent as a fix and included in the recent v5.9-rc3.
Naresh, can you try with the above commit?
Ah, just realized that I misread the patch. It doesn't fix it.
We would still need a RCU_NONIDLE() in psci_enter_domain_idle_state() - or something along the lines of what you suggest below.
Apologies for the noise.
Kind regards Uffe
I suppose the next step is pushing it into individual driver when needed, something like the below perhaps. I realize the coupled idle state stuff is more complicated that most, but it's also not an area I've looked at in detail, so perhaps I've just made a bigger mess, but it ought to give you enough to get going I think.
These aren't coupled states. Instead, in cpuidle-psci, we are using PM domains through genpd and runtime PM to manage "shared idle states" between CPUs.
Kind regards Uffe
Rafael?
diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c index 74463841805f..617bbef316e6 100644 --- a/drivers/cpuidle/cpuidle-psci.c +++ b/drivers/cpuidle/cpuidle-psci.c @@ -49,6 +49,9 @@ static inline u32 psci_get_domain_state(void)
static inline int psci_enter_state(int idx, u32 state) {
/*
* XXX push rcu_idle_enter into the coupled code
*/ return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
}
@@ -72,7 +75,9 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, if (!state) state = states[idx];
rcu_idle_enter(); ret = psci_cpu_suspend_enter(state) ? -1 : idx;
rcu_idle_exit(); pm_runtime_get_sync(pd_dev);
@@ -125,8 +130,13 @@ static int psci_enter_idle_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, int idx) { u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
int ret;
return psci_enter_state(idx, state[idx]);
rcu_idle_enter();
ret = psci_enter_state(idx, state[idx]);
rcu_idle_exit();
return ret;
}
static const struct of_device_id psci_idle_state_match[] = { @@ -170,6 +180,7 @@ static int psci_dt_cpu_init_topology(struct cpuidle_driver *drv, * deeper states. */ drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
drv->states[state_count - 1].flags = CPUIDLE_FLAG_RCU_IDLE; psci_cpuidle_use_cpuhp = true; return 0;
@@ -285,6 +296,7 @@ static int psci_idle_init_cpu(struct device *dev, int cpu) * state index 0. */ drv->states[0].enter = psci_enter_idle_state;
drv->states[0].flags = CPUIDLE_FLAG_RCU_IDLE; drv->states[0].exit_latency = 1; drv->states[0].target_residency = 1; drv->states[0].power_usage = UINT_MAX;
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 04becd70cc41..3dbac3bb761b 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -239,9 +239,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, time_start = ns_to_ktime(local_clock());
stop_critical_timings();
rcu_idle_enter();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_enter(); entered_state = target_state->enter(dev, drv, index);
rcu_idle_exit();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_exit(); start_critical_timings(); sched_clock_idle_wakeup_event();
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index 75895e6363b8..47f686131a54 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -82,6 +82,7 @@ struct cpuidle_state { #define CPUIDLE_FLAG_UNUSABLE BIT(3) /* avoid using this state */ #define CPUIDLE_FLAG_OFF BIT(4) /* disable this state by default */ #define CPUIDLE_FLAG_TLB_FLUSHED BIT(5) /* idle-state flushes TLBs */ +#define CPUIDLE_FLAG_RCU_IDLE BIT(6) /* driver will do RCU-idle */
struct cpuidle_device_kobj; struct cpuidle_state_kobj;
Hi Ulf,
On Tue, Sep 01 2020 at 06:41 -0600, Ulf Hansson wrote:
On Tue, 1 Sep 2020 at 14:35, Ulf Hansson ulf.hansson@linaro.org wrote:
On Tue, 1 Sep 2020 at 12:42, peterz@infradead.org wrote:
On Tue, Sep 01, 2020 at 08:50:57AM +0200, Ulf Hansson wrote:
On Tue, 1 Sep 2020 at 08:46, Ulf Hansson ulf.hansson@linaro.org wrote:
On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney paulmck@kernel.org wrote:
> [ 5.308588] ============================= > [ 5.308593] WARNING: suspicious RCU usage > [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper > [ 5.320052] 5.9.0-rc3 #1 Not tainted > [ 5.320057] ----------------------------- > [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! > [ 5.320068] > [ 5.320068] other info that might help us debug this: > [ 5.320068] > [ 5.320074] > [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 > [ 5.320078] RCU used illegally from extended quiescent state! > [ 5.320084] no locks held by swapper/0/0. > [ 5.320089] > [ 5.320089] stack backtrace: > [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 > [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO > [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > [ 5.346452] Call trace: > [ 5.346463] dump_backtrace+0x0/0x1f8 > [ 5.346471] show_stack+0x2c/0x38 > [ 5.346480] dump_stack+0xec/0x15c > [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 > [ 5.346499] lock_acquire+0x3d0/0x440 > [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 > [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 > [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 > [ 5.421795] cpuidle_enter_state+0xc8/0x610 > [ 5.426392] cpuidle_enter+0x3c/0x50 > [ 5.430561] call_cpuidle+0x44/0x80 > [ 5.434378] do_idle+0x240/0x2a0
Note also that Peter Zijlstra (CCed) is working to shrink the portion of the idle loop that RCU ignores. Not sure that it covers your case, but it is worth checking.
Right, so I think I 'caused' this by making the lock tracepoints visible. That is, the error always existed, now we actually warn about it.
Thanks for letting me know. Let's see what Peter thinks about this then.
Apologize for my ignorance, but from a cpuidle point of view, what does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE on bigger code paths?
I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps that's the easiest approach, at least to start with.
I think this would be nice. This should also cover the case, where PM domain power off notification callbacks call trace function internally. Right?
--Lina
Or do you have any other ideas?
So IMO trace_*_rcuidle() and RCU_NONIDLE() are bugs, they just mean we got the ordering wrong and are papering over it. That said, that's been the modus operandi for a while now, just make it shut up and don't think about it :-/
That said; I pushed the rcu_idle_enter() about as deep as it goes into generic code in commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path")
Aha, that commit should fix this problem, I think. Looks like that commit was sent as a fix and included in the recent v5.9-rc3.
Naresh, can you try with the above commit?
Ah, just realized that I misread the patch. It doesn't fix it.
We would still need a RCU_NONIDLE() in psci_enter_domain_idle_state()
- or something along the lines of what you suggest below.
Apologies for the noise.
Kind regards Uffe
[1]. https://lkml.org/lkml/2020/8/26/81
I suppose the next step is pushing it into individual driver when needed, something like the below perhaps. I realize the coupled idle state stuff is more complicated that most, but it's also not an area I've looked at in detail, so perhaps I've just made a bigger mess, but it ought to give you enough to get going I think.
These aren't coupled states. Instead, in cpuidle-psci, we are using PM domains through genpd and runtime PM to manage "shared idle states" between CPUs.
Kind regards Uffe
Rafael?
diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c index 74463841805f..617bbef316e6 100644 --- a/drivers/cpuidle/cpuidle-psci.c +++ b/drivers/cpuidle/cpuidle-psci.c @@ -49,6 +49,9 @@ static inline u32 psci_get_domain_state(void)
static inline int psci_enter_state(int idx, u32 state) {
/*
* XXX push rcu_idle_enter into the coupled code
*/ return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
}
@@ -72,7 +75,9 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, if (!state) state = states[idx];
rcu_idle_enter(); ret = psci_cpu_suspend_enter(state) ? -1 : idx;
rcu_idle_exit(); pm_runtime_get_sync(pd_dev);
@@ -125,8 +130,13 @@ static int psci_enter_idle_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, int idx) { u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
int ret;
return psci_enter_state(idx, state[idx]);
rcu_idle_enter();
ret = psci_enter_state(idx, state[idx]);
rcu_idle_exit();
return ret;
}
static const struct of_device_id psci_idle_state_match[] = { @@ -170,6 +180,7 @@ static int psci_dt_cpu_init_topology(struct cpuidle_driver *drv, * deeper states. */ drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
drv->states[state_count - 1].flags = CPUIDLE_FLAG_RCU_IDLE; psci_cpuidle_use_cpuhp = true; return 0;
@@ -285,6 +296,7 @@ static int psci_idle_init_cpu(struct device *dev, int cpu) * state index 0. */ drv->states[0].enter = psci_enter_idle_state;
drv->states[0].flags = CPUIDLE_FLAG_RCU_IDLE; drv->states[0].exit_latency = 1; drv->states[0].target_residency = 1; drv->states[0].power_usage = UINT_MAX;
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 04becd70cc41..3dbac3bb761b 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -239,9 +239,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, time_start = ns_to_ktime(local_clock());
stop_critical_timings();
rcu_idle_enter();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_enter(); entered_state = target_state->enter(dev, drv, index);
rcu_idle_exit();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_exit(); start_critical_timings(); sched_clock_idle_wakeup_event();
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index 75895e6363b8..47f686131a54 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -82,6 +82,7 @@ struct cpuidle_state { #define CPUIDLE_FLAG_UNUSABLE BIT(3) /* avoid using this state */ #define CPUIDLE_FLAG_OFF BIT(4) /* disable this state by default */ #define CPUIDLE_FLAG_TLB_FLUSHED BIT(5) /* idle-state flushes TLBs */ +#define CPUIDLE_FLAG_RCU_IDLE BIT(6) /* driver will do RCU-idle */
struct cpuidle_device_kobj; struct cpuidle_state_kobj;
On Tue, Sep 01, 2020 at 09:44:17AM -0600, Lina Iyer wrote:
I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps that's the easiest approach, at least to start with.
I think this would be nice. This should also cover the case, where PM domain power off notification callbacks call trace function internally. Right?
That's just more crap for me to clean up later :-(
trace_*_rcuidle() and RCU_NONIDLE() need to die, not proliferate.
On Tue, Sep 01, 2020 at 05:50:14PM +0200, peterz@infradead.org wrote:
On Tue, Sep 01, 2020 at 09:44:17AM -0600, Lina Iyer wrote:
> I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps > that's the easiest approach, at least to start with.
I think this would be nice. This should also cover the case, where PM domain power off notification callbacks call trace function internally. Right?
That's just more crap for me to clean up later :-(
trace_*_rcuidle() and RCU_NONIDLE() need to die, not proliferate.
Moving the idle-entry boundary further in is good in any number of ways. But experience indicates that no matter how far you move it, there will be something complex further in. Unless you are pushing it all the way into all the arch-specific code down as far as it can possibly go?
Thanx, Paul
On Tue, Sep 01, 2020 at 09:13:40AM -0700, Paul E. McKenney wrote:
On Tue, Sep 01, 2020 at 05:50:14PM +0200, peterz@infradead.org wrote:
On Tue, Sep 01, 2020 at 09:44:17AM -0600, Lina Iyer wrote:
> > I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() > > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps > > that's the easiest approach, at least to start with.
I think this would be nice. This should also cover the case, where PM domain power off notification callbacks call trace function internally. Right?
That's just more crap for me to clean up later :-(
trace_*_rcuidle() and RCU_NONIDLE() need to die, not proliferate.
Moving the idle-entry boundary further in is good in any number of ways. But experience indicates that no matter how far you move it, there will be something complex further in. Unless you are pushing it all the way into all the arch-specific code down as far as it can possibly go?
Not all; the simple cpuidle drivers should be good already. The more complicated ones need some help.
The patch provided earlier:
https://lkml.kernel.org/r/20200901104206.GU1362448@hirez.programming.kicks-a...
should allow the complicated drivers to take over and DTRT.
On Tue, 1 Sep 2020 at 19:42, Peter Zijlstra peterz@infradead.org wrote:
On Tue, Sep 01, 2020 at 09:13:40AM -0700, Paul E. McKenney wrote:
On Tue, Sep 01, 2020 at 05:50:14PM +0200, peterz@infradead.org wrote:
On Tue, Sep 01, 2020 at 09:44:17AM -0600, Lina Iyer wrote:
> > > I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() > > > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps > > > that's the easiest approach, at least to start with.
I think this would be nice. This should also cover the case, where PM domain power off notification callbacks call trace function internally. Right?
That's just more crap for me to clean up later :-(
trace_*_rcuidle() and RCU_NONIDLE() need to die, not proliferate.
Moving the idle-entry boundary further in is good in any number of ways. But experience indicates that no matter how far you move it, there will be something complex further in. Unless you are pushing it all the way into all the arch-specific code down as far as it can possibly go?
Not all; the simple cpuidle drivers should be good already. The more complicated ones need some help.
The patch provided earlier:
https://lkml.kernel.org/r/20200901104206.GU1362448@hirez.programming.kicks-a...
should allow the complicated drivers to take over and DTRT.
Don't get me wrong, I fully support your approach by moving the rcu_idle_enter() down as far as possible, but it seems to require more work than just adding a simple flag for the idle states.
Lots of cpuidle drivers are using CPU_PM notifiers (grep for cpu_pm_enter and you will see) from their idlestates ->enter() callbacks. And for those we are already calling rcu_irq_enter_irqson|off() in cpu_pm_notify() when firing them.
Kind regards Uffe
On Wed, Sep 02, 2020 at 09:03:37AM +0200, Ulf Hansson wrote:
Lots of cpuidle drivers are using CPU_PM notifiers (grep for cpu_pm_enter and you will see) from their idlestates ->enter() callbacks. And for those we are already calling rcu_irq_enter_irqson|off() in cpu_pm_notify() when firing them.
Yeah, that particular trainwreck is on my todo list already ... then again, that list is forever overflowing.
I'm thinking cpu_pm_unregister_notifier() is not a common thing? The few I looked at seem to suggest 'never' is a good approximation.
It would be fairly trivial to replace the atomic_notifier usage with a raw_notifier a lock and either stop-machine or IPIs. Better still would be if we can get rid of it entirely, but I can't tell in a hurry if that is possible.
On Wed, 2 Sep 2020 at 14:14, peterz@infradead.org wrote:
On Wed, Sep 02, 2020 at 09:03:37AM +0200, Ulf Hansson wrote:
Lots of cpuidle drivers are using CPU_PM notifiers (grep for cpu_pm_enter and you will see) from their idlestates ->enter() callbacks. And for those we are already calling rcu_irq_enter_irqson|off() in cpu_pm_notify() when firing them.
Yeah, that particular trainwreck is on my todo list already ... then again, that list is forever overflowing.
I'm thinking cpu_pm_unregister_notifier() is not a common thing? The few I looked at seem to suggest 'never' is a good approximation.
The trend is that drivers are turning into regular modules that may also need to manage "->remove()", which may mean unregistering the notifier. Of course, I don't know for sure whether that becomes a problem, but it seems quite limiting.
It would be fairly trivial to replace the atomic_notifier usage with a raw_notifier a lock and either stop-machine or IPIs. Better still would be if we can get rid of it entirely, but I can't tell in a hurry if that is possible.
Okay, let's see.
In any case, I was thinking that the patch with CPU idle flag, for letting CPU idle drivers deal with RCU, that you proposed, seems like a good first step.
At least it should enable us to solve the problem for runtime PM in psci_enter_domain_idle_state(). Let me update the patch and send it out, then we can continue the discussion over there.
Kind regards Uffe
On Wed, Sep 02, 2020 at 05:58:55PM +0200, Ulf Hansson wrote:
On Wed, 2 Sep 2020 at 14:14, peterz@infradead.org wrote:
On Wed, Sep 02, 2020 at 09:03:37AM +0200, Ulf Hansson wrote:
Lots of cpuidle drivers are using CPU_PM notifiers (grep for cpu_pm_enter and you will see) from their idlestates ->enter() callbacks. And for those we are already calling rcu_irq_enter_irqson|off() in cpu_pm_notify() when firing them.
Yeah, that particular trainwreck is on my todo list already ... then again, that list is forever overflowing.
I'm thinking cpu_pm_unregister_notifier() is not a common thing? The few I looked at seem to suggest 'never' is a good approximation.
The trend is that drivers are turning into regular modules that may also need to manage "->remove()", which may mean unregistering the notifier. Of course, I don't know for sure whether that becomes a problem, but it seems quite limiting.
You can pin modules, once they're loaded they can never be removed again.
Anyway, the below should 'work', I think.
--- diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c index f7e1d0eccdbc..72804e0883d5 100644 --- a/kernel/cpu_pm.c +++ b/kernel/cpu_pm.c @@ -12,21 +12,18 @@ #include <linux/notifier.h> #include <linux/spinlock.h> #include <linux/syscore_ops.h> +#include <linux/cpu.h> +#include <linux/smp.h>
-static ATOMIC_NOTIFIER_HEAD(cpu_pm_notifier_chain); +static RAW_NOTIFIER_HEAD(cpu_pm_notifier_chain); +static DEFINE_SPINLOCK(cpu_pm_lock);
static int cpu_pm_notify(enum cpu_pm_event event) { int ret;
- /* - * atomic_notifier_call_chain has a RCU read critical section, which - * could be disfunctional in cpu idle. Copy RCU_NONIDLE code to let - * RCU know this. - */ - rcu_irq_enter_irqson(); - ret = atomic_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL); - rcu_irq_exit_irqson(); + lockdep_assert_irqs_disabled(); + ret = raw_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL);
return notifier_to_errno(ret); } @@ -35,9 +32,8 @@ static int cpu_pm_notify_robust(enum cpu_pm_event event_up, enum cpu_pm_event ev { int ret;
- rcu_irq_enter_irqson(); - ret = atomic_notifier_call_chain_robust(&cpu_pm_notifier_chain, event_up, event_down, NULL); - rcu_irq_exit_irqson(); + lockdep_assert_irqs_disabled(); + ret = raw_notifier_call_chain_robust(&cpu_pm_notifier_chain, event_up, event_down, NULL);
return notifier_to_errno(ret); } @@ -54,10 +50,28 @@ static int cpu_pm_notify_robust(enum cpu_pm_event event_up, enum cpu_pm_event ev */ int cpu_pm_register_notifier(struct notifier_block *nb) { - return atomic_notifier_chain_register(&cpu_pm_notifier_chain, nb); + unsigned long flags; + int ret; + + spin_lock_irqsave(&cpu_pm_lock, flags); + ret = raw_notifier_chain_register(&cpu_pm_notifier_chain, nb); + spin_unlock_irqrestore(&cpu_pm_lock, flags); + + return ret; } EXPORT_SYMBOL_GPL(cpu_pm_register_notifier);
+static bool __is_idle_cpu(int cpu, void *info) +{ + /* + * Racy as heck, however if we fail to see an idle task, it must be + * after we removed our element, so all is fine. + */ + return is_idle_task(curr_task(cpu)); +} + +static void __nop_func(void *arg) { } + /** * cpu_pm_unregister_notifier - unregister a driver with cpu_pm * @nb: notifier block to be unregistered @@ -69,7 +83,30 @@ EXPORT_SYMBOL_GPL(cpu_pm_register_notifier); */ int cpu_pm_unregister_notifier(struct notifier_block *nb) { - return atomic_notifier_chain_unregister(&cpu_pm_notifier_chain, nb); + unsigned long flags; + int ret, cpu; + + spin_lock_irqsave(&cpu_pm_lock, flags); + ret = raw_notifier_chain_unregister(&cpu_pm_notifier_chain, nb); + spin_unlock_irqrestore(&cpu_pm_lock, flags); + + /* + * Orders the removal above vs the __is_idle_cpu() test below. Matches + * schedule() switching to the idle task. + * + * Ensures that if we miss an idle task, it must be after the removal. + */ + smp_mb(); + + /* + * IPI all idle CPUs, this guarantees that no CPU is currently + * iterating the notifier list. + */ + cpus_read_lock(); + on_each_cpu_cond(__is_idle_cpu, __nop_func, NULL, 1); + cpus_read_unlock(); + + return ret; } EXPORT_SYMBOL_GPL(cpu_pm_unregister_notifier);
On Thu, 3 Sep 2020 at 15:53, peterz@infradead.org wrote:
On Wed, Sep 02, 2020 at 05:58:55PM +0200, Ulf Hansson wrote:
On Wed, 2 Sep 2020 at 14:14, peterz@infradead.org wrote:
On Wed, Sep 02, 2020 at 09:03:37AM +0200, Ulf Hansson wrote:
Lots of cpuidle drivers are using CPU_PM notifiers (grep for cpu_pm_enter and you will see) from their idlestates ->enter() callbacks. And for those we are already calling rcu_irq_enter_irqson|off() in cpu_pm_notify() when firing them.
Yeah, that particular trainwreck is on my todo list already ... then again, that list is forever overflowing.
I'm thinking cpu_pm_unregister_notifier() is not a common thing? The few I looked at seem to suggest 'never' is a good approximation.
The trend is that drivers are turning into regular modules that may also need to manage "->remove()", which may mean unregistering the notifier. Of course, I don't know for sure whether that becomes a problem, but it seems quite limiting.
You can pin modules, once they're loaded they can never be removed again.
Anyway, the below should 'work', I think.
diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c index f7e1d0eccdbc..72804e0883d5 100644 --- a/kernel/cpu_pm.c +++ b/kernel/cpu_pm.c @@ -12,21 +12,18 @@ #include <linux/notifier.h> #include <linux/spinlock.h> #include <linux/syscore_ops.h> +#include <linux/cpu.h> +#include <linux/smp.h>
-static ATOMIC_NOTIFIER_HEAD(cpu_pm_notifier_chain); +static RAW_NOTIFIER_HEAD(cpu_pm_notifier_chain); +static DEFINE_SPINLOCK(cpu_pm_lock);
static int cpu_pm_notify(enum cpu_pm_event event) { int ret;
/*
* atomic_notifier_call_chain has a RCU read critical section, which
* could be disfunctional in cpu idle. Copy RCU_NONIDLE code to let
* RCU know this.
*/
rcu_irq_enter_irqson();
ret = atomic_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL);
rcu_irq_exit_irqson();
lockdep_assert_irqs_disabled();
Nitpick, maybe the lockdep should be moved to a separate patch.
ret = raw_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL);
Converting to raw_notifiers seems reasonable - if we need to avoid the RCU usage.
My point is, I wonder about if the notifier callbacks themselves are safe from RCU usage. For example, I would not be surprised if tracing is happening behind them.
Moreover, I am not sure that we really need to prevent and limit tracing from happening. Instead we could push rcu_idle_enter|exit() further down to the arch specific code in the cpuidle drivers, as you kind of all proposed earlier.
In this way, we can step by step, move to a new "version" of cpu_pm_enter() that doesn't have to deal with rcu_irq_enter_irqson(), because RCU hasn't been pushed to idle yet.
[...]
Kind regards Uffe
On Thu, Sep 03, 2020 at 04:36:35PM +0200, Ulf Hansson wrote:
On Thu, 3 Sep 2020 at 15:53, peterz@infradead.org wrote:
static int cpu_pm_notify(enum cpu_pm_event event) { int ret;
lockdep_assert_irqs_disabled();
Nitpick, maybe the lockdep should be moved to a separate patch.
Well, the unregister relies on IRQs being disabled here, so I figured asserting this was a good thing ;-)
Starting the audit below, this might not in fact be true, which then invalidates the unregister implementation. In particular the notifier in arch/arm/kernel/hw_breakpoint.c seems to unconditionally enable IRQs.
ret = raw_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL);
Converting to raw_notifiers seems reasonable - if we need to avoid the RCU usage.
My point is, I wonder about if the notifier callbacks themselves are safe from RCU usage. For example, I would not be surprised if tracing is happening behind them.
A bunch of them seem to call into the clk domain stuff, and I think there's tracepoints in that.
Moreover, I am not sure that we really need to prevent and limit tracing from happening. Instead we could push rcu_idle_enter|exit() further down to the arch specific code in the cpuidle drivers, as you kind of all proposed earlier.
Well, at some point the CPU is in a really dodgy state, ISTR there being ARM platforms where you have to manually leave the cache coherency fabric and all sorts of insanity. There should be a definite cut-off on tracing before that.
Also, what is the point of all this clock and power domain callbacks, if not to put the CPU into an extremely low power state, surely you want to limit the amount of code that's ran when the CPU is in such a state.
In this way, we can step by step, move to a new "version" of cpu_pm_enter() that doesn't have to deal with rcu_irq_enter_irqson(), because RCU hasn't been pushed to idle yet.
That should be easy enough to audit. The thing is that mainline is now generating (debug) splats, and some people are upset with this.
If you're ok with ARM not being lockdep clean while this is being reworked I'm perfectly fine with that.
(There used to be a separate CONFIG for RCU-lockdep, but that seems to have been removed)
On Thu, Sep 03, 2020 at 05:08:19PM +0200, peterz@infradead.org wrote:
On Thu, Sep 03, 2020 at 04:36:35PM +0200, Ulf Hansson wrote:
On Thu, 3 Sep 2020 at 15:53, peterz@infradead.org wrote:
static int cpu_pm_notify(enum cpu_pm_event event) { int ret;
lockdep_assert_irqs_disabled();
Nitpick, maybe the lockdep should be moved to a separate patch.
Well, the unregister relies on IRQs being disabled here, so I figured asserting this was a good thing ;-)
Starting the audit below, this might not in fact be true, which then invalidates the unregister implementation. In particular the notifier in arch/arm/kernel/hw_breakpoint.c seems to unconditionally enable IRQs.
ret = raw_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL);
Converting to raw_notifiers seems reasonable - if we need to avoid the RCU usage.
My point is, I wonder about if the notifier callbacks themselves are safe from RCU usage. For example, I would not be surprised if tracing is happening behind them.
A bunch of them seem to call into the clk domain stuff, and I think there's tracepoints in that.
Moreover, I am not sure that we really need to prevent and limit tracing from happening. Instead we could push rcu_idle_enter|exit() further down to the arch specific code in the cpuidle drivers, as you kind of all proposed earlier.
Well, at some point the CPU is in a really dodgy state, ISTR there being ARM platforms where you have to manually leave the cache coherency fabric and all sorts of insanity. There should be a definite cut-off on tracing before that.
Also, what is the point of all this clock and power domain callbacks, if not to put the CPU into an extremely low power state, surely you want to limit the amount of code that's ran when the CPU is in such a state.
In this way, we can step by step, move to a new "version" of cpu_pm_enter() that doesn't have to deal with rcu_irq_enter_irqson(), because RCU hasn't been pushed to idle yet.
That should be easy enough to audit. The thing is that mainline is now generating (debug) splats, and some people are upset with this.
If you're ok with ARM not being lockdep clean while this is being reworked I'm perfectly fine with that.
(There used to be a separate CONFIG for RCU-lockdep, but that seems to have been removed)
CONFIG_PROVE_RCU still gates RCU_LOCKDEP_WARN(), but it is now a def_bool that follows CONFIG_PROVE_LOCKING.
It would not be hard to make CONFIG_PROVE_RCU separately settable only for arm, if that would help.
Thanx, Paul
On Thu, 3 Sep 2020 at 17:08, peterz@infradead.org wrote:
On Thu, Sep 03, 2020 at 04:36:35PM +0200, Ulf Hansson wrote:
On Thu, 3 Sep 2020 at 15:53, peterz@infradead.org wrote:
static int cpu_pm_notify(enum cpu_pm_event event) { int ret;
lockdep_assert_irqs_disabled();
Nitpick, maybe the lockdep should be moved to a separate patch.
Well, the unregister relies on IRQs being disabled here, so I figured asserting this was a good thing ;-)
Okay, make sense then.
Starting the audit below, this might not in fact be true, which then invalidates the unregister implementation. In particular the notifier in arch/arm/kernel/hw_breakpoint.c seems to unconditionally enable IRQs.
I see.
ret = raw_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL);
Converting to raw_notifiers seems reasonable - if we need to avoid the RCU usage.
My point is, I wonder about if the notifier callbacks themselves are safe from RCU usage. For example, I would not be surprised if tracing is happening behind them.
A bunch of them seem to call into the clk domain stuff, and I think there's tracepoints in that.
Moreover, I am not sure that we really need to prevent and limit tracing from happening. Instead we could push rcu_idle_enter|exit() further down to the arch specific code in the cpuidle drivers, as you kind of all proposed earlier.
Well, at some point the CPU is in a really dodgy state, ISTR there being ARM platforms where you have to manually leave the cache coherency fabric and all sorts of insanity. There should be a definite cut-off on tracing before that.
That's probably the case for some platforms, but I don't see why we need to make everybody "suffer".
Also, what is the point of all this clock and power domain callbacks, if not to put the CPU into an extremely low power state, surely you want to limit the amount of code that's ran when the CPU is in such a state.
In this way, we can step by step, move to a new "version" of cpu_pm_enter() that doesn't have to deal with rcu_irq_enter_irqson(), because RCU hasn't been pushed to idle yet.
That should be easy enough to audit. The thing is that mainline is now generating (debug) splats, and some people are upset with this.
If you're ok with ARM not being lockdep clean while this is being reworked I'm perfectly fine with that.
I think the splats can easily be fixed. Short term.
Adding RCU_NONIDLE (or similar) around pm_runtime calls in psci_enter_domain_idle_state() does the trick. I have a patch for that, it's tested and ready. Let me send it out.
Perhaps we should just accept that this is needed, as to allow us to move step by step into a better situation, while also avoiding the current splats.
(There used to be a separate CONFIG for RCU-lockdep, but that seems to have been removed)
I don't think that would help. Infrastructure for testing will just enable that Kconfig anyway still complains to us.
Kind regards Uffe
On Tue, Sep 01, 2020 at 02:35:52PM +0200, Ulf Hansson wrote:
On Tue, 1 Sep 2020 at 12:42, peterz@infradead.org wrote:
That said; I pushed the rcu_idle_enter() about as deep as it goes into generic code in commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path")
Aha, that commit should fix this problem, I think. Looks like that commit was sent as a fix and included in the recent v5.9-rc3.
AFAICT psci_enter_domain_idle_state() is still buggered. All that pm_runtime_*() stuff is using locks.
Look at this:
psci_enter_domain_idle_state() pm_runtime_put_sync_suspend() __pm_runtime_suspend() spin_lock_irqsave(&dev->power.lock, flags);
That's a definite fail after we've done rcu_idle_enter().
I suppose the next step is pushing it into individual driver when needed, something like the below perhaps. I realize the coupled idle state stuff is more complicated that most, but it's also not an area I've looked at in detail, so perhaps I've just made a bigger mess, but it ought to give you enough to get going I think.
These aren't coupled states. Instead, in cpuidle-psci, we are using PM domains through genpd and runtime PM to manage "shared idle states" between CPUs.
Similar problem I'm thinking, 'complicated' stuff.
On Tue, Sep 01, 2020 at 08:46:54AM +0200, Ulf Hansson wrote:
- Saravanna, Rafael, Lina
On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney paulmck@kernel.org wrote:
On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
While booting linux mainline kernel on arm64 db410c this kernel warning noticed.
metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b git describe: v5.9-rc3 make_kernelversion: 5.9.0-rc3 kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkf...
Boot log,
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030] [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host) (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils) 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020 [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC <> [ 5.299090] sdhci: Secure Digital Host Controller Interface driver [ 5.299140] sdhci: Copyright(c) Pierre Ossman [ 5.304313] [ 5.307771] Synopsys Designware Multimedia Card Interface Driver [ 5.308588] ============================= [ 5.308593] WARNING: suspicious RCU usage [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.320052] 5.9.0-rc3 #1 Not tainted [ 5.320057] ----------------------------- [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.320068] [ 5.320068] other info that might help us debug this: [ 5.320068] [ 5.320074] [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 [ 5.320078] RCU used illegally from extended quiescent state! [ 5.320084] no locks held by swapper/0/0. [ 5.320089] [ 5.320089] stack backtrace: [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.346452] Call trace: [ 5.346463] dump_backtrace+0x0/0x1f8 [ 5.346471] show_stack+0x2c/0x38 [ 5.346480] dump_stack+0xec/0x15c [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.346499] lock_acquire+0x3d0/0x440 [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.421795] cpuidle_enter_state+0xc8/0x610 [ 5.426392] cpuidle_enter+0x3c/0x50 [ 5.430561] call_cpuidle+0x44/0x80 [ 5.434378] do_idle+0x240/0x2a0
RCU ignores CPUs in the idle loop, which means that you cannot use rcu_read_lock() from the idle loop without use of something like RCU_NONIDLE(). If this is due to event tracing, you should use the _rcuidle() variant of the event trace statement.
In the runtime suspend path, the runtime PM core calls device_links_read_lock() - if the device in question has any links to suppliers (to allow them to be suspended too).
device_links_read_lock() calls srcu_read_lock().
Except that it is perfectly legal to invoke srcu_read_lock() from the idle loop. The problem is instead rcu_read_lock() and similar.
It turns out that the device in question (the CPU device that is attached to genpd) didn't have any links before - but that seems to have changed, due to the work done by Saravana (links become created on a per resource basis, parsed from DT during boot).
Note also that Peter Zijlstra (CCed) is working to shrink the portion of the idle loop that RCU ignores. Not sure that it covers your case, but it is worth checking.
Thanks for letting me know. Let's see what Peter thinks about this then.
Apologize for my ignorance, but from a cpuidle point of view, what does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE on bigger code paths?
It means that as far as RCU (and only RCU) is concerned there is an exit from idle state for just long enough to execute RCU_NONIDLE()'s argument. This involves an atomic operation on both entry to and exit from RCU_NONIDLE(), which in most cases won't be noticeable. But in some cases you might (for example) want to enclose a loop in RCU_NONIDLE() rather than doing RCU_NONIDLE() on each pass through the loop.
I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps that's the easiest approach, at least to start with.
Or do you have any other ideas?
Here is the list, though it is early in the morning here:
1. RCU_NONIDLE().
2. Peter's patch, if it turns out to hoist your code out of what RCU considers to be the idle loop.
3. If the problem is trace events, use the _rcuidle() variant of the trace event. Instead of trace_blah(), use trace_blah_rcuidle().
4. Switch from RCU (as in rcu_read_lock()) to SRCU (as in srcu_read_lock()).
5. Take Peter's patch a step further, moving the rcu_idle_enter() and rcu_idle_exit() calls as needed. But please keep in mind that these two functions require that irqs be disabled by their callers.
6. If RCU_NONIDLE() in inconvenient due to early exits and such, you could use the rcu_irq_enter_irqson() and rcu_irq_exit_irqson() functions that it calls.
Do any of those help?
Thanx, Paul
On Tue, 1 Sep 2020 at 17:00, Paul E. McKenney paulmck@kernel.org wrote:
On Tue, Sep 01, 2020 at 08:46:54AM +0200, Ulf Hansson wrote:
- Saravanna, Rafael, Lina
On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney paulmck@kernel.org wrote:
On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
While booting linux mainline kernel on arm64 db410c this kernel warning noticed.
metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b git describe: v5.9-rc3 make_kernelversion: 5.9.0-rc3 kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkf...
Boot log,
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030] [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host) (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils) 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020 [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC <> [ 5.299090] sdhci: Secure Digital Host Controller Interface driver [ 5.299140] sdhci: Copyright(c) Pierre Ossman [ 5.304313] [ 5.307771] Synopsys Designware Multimedia Card Interface Driver [ 5.308588] ============================= [ 5.308593] WARNING: suspicious RCU usage [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.320052] 5.9.0-rc3 #1 Not tainted [ 5.320057] ----------------------------- [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.320068] [ 5.320068] other info that might help us debug this: [ 5.320068] [ 5.320074] [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 [ 5.320078] RCU used illegally from extended quiescent state! [ 5.320084] no locks held by swapper/0/0. [ 5.320089] [ 5.320089] stack backtrace: [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.346452] Call trace: [ 5.346463] dump_backtrace+0x0/0x1f8 [ 5.346471] show_stack+0x2c/0x38 [ 5.346480] dump_stack+0xec/0x15c [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.346499] lock_acquire+0x3d0/0x440 [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 [ 5.421795] cpuidle_enter_state+0xc8/0x610 [ 5.426392] cpuidle_enter+0x3c/0x50 [ 5.430561] call_cpuidle+0x44/0x80 [ 5.434378] do_idle+0x240/0x2a0
RCU ignores CPUs in the idle loop, which means that you cannot use rcu_read_lock() from the idle loop without use of something like RCU_NONIDLE(). If this is due to event tracing, you should use the _rcuidle() variant of the event trace statement.
In the runtime suspend path, the runtime PM core calls device_links_read_lock() - if the device in question has any links to suppliers (to allow them to be suspended too).
device_links_read_lock() calls srcu_read_lock().
Except that it is perfectly legal to invoke srcu_read_lock() from the idle loop. The problem is instead rcu_read_lock() and similar.
Hmm. Sounds like more debugging is needed then, to narrow down the problem.
It turns out that the device in question (the CPU device that is attached to genpd) didn't have any links before - but that seems to have changed, due to the work done by Saravana (links become created on a per resource basis, parsed from DT during boot).
Note also that Peter Zijlstra (CCed) is working to shrink the portion of the idle loop that RCU ignores. Not sure that it covers your case, but it is worth checking.
Thanks for letting me know. Let's see what Peter thinks about this then.
Apologize for my ignorance, but from a cpuidle point of view, what does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE on bigger code paths?
It means that as far as RCU (and only RCU) is concerned there is an exit from idle state for just long enough to execute RCU_NONIDLE()'s argument. This involves an atomic operation on both entry to and exit from RCU_NONIDLE(), which in most cases won't be noticeable. But in some cases you might (for example) want to enclose a loop in RCU_NONIDLE() rather than doing RCU_NONIDLE() on each pass through the loop.
I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps that's the easiest approach, at least to start with.
Or do you have any other ideas?
Here is the list, though it is early in the morning here:
RCU_NONIDLE().
Peter's patch, if it turns out to hoist your code out of what RCU considers to be the idle loop.
If the problem is trace events, use the _rcuidle() variant of the trace event. Instead of trace_blah(), use trace_blah_rcuidle().
Switch from RCU (as in rcu_read_lock()) to SRCU (as in srcu_read_lock()).
Take Peter's patch a step further, moving the rcu_idle_enter() and rcu_idle_exit() calls as needed. But please keep in mind that these two functions require that irqs be disabled by their callers.
If RCU_NONIDLE() in inconvenient due to early exits and such, you could use the rcu_irq_enter_irqson() and rcu_irq_exit_irqson() functions that it calls.
Do any of those help?
Yes, they will, in one way or the other. Thanks for providing me with all the available options.
BTW, I still don't get what good rcu_idle_enter|exit() does, but I am assuming those need to be called at some point before the CPU goes to sleep.
Kind regards Uffe
On Wed, Sep 02, 2020 at 08:49:11AM +0200, Ulf Hansson wrote:
On Tue, 1 Sep 2020 at 17:00, Paul E. McKenney paulmck@kernel.org wrote:
[ . . . ]
Here is the list, though it is early in the morning here:
RCU_NONIDLE().
Peter's patch, if it turns out to hoist your code out of what RCU considers to be the idle loop.
If the problem is trace events, use the _rcuidle() variant of the trace event. Instead of trace_blah(), use trace_blah_rcuidle().
Switch from RCU (as in rcu_read_lock()) to SRCU (as in srcu_read_lock()).
Take Peter's patch a step further, moving the rcu_idle_enter() and rcu_idle_exit() calls as needed. But please keep in mind that these two functions require that irqs be disabled by their callers.
If RCU_NONIDLE() in inconvenient due to early exits and such, you could use the rcu_irq_enter_irqson() and rcu_irq_exit_irqson() functions that it calls.
Do any of those help?
Yes, they will, in one way or the other. Thanks for providing me with all the available options.
BTW, I still don't get what good rcu_idle_enter|exit() does, but I am assuming those need to be called at some point before the CPU goes to sleep.
These functions allow RCU to leave idle CPUs undisturbed. If they were not invoked, RCU would periodically IPI idle CPUs to verify that there were no RCU readers running on them. This would be quite bad for battery lifetime, among other things. So the call to rcu_idle_enter() tells RCU that it may safely completely ignore this CPU until its next call to rcu_idle_exit().
Thanx, Paul
On Wed, 2 Sep 2020 at 15:52, Paul E. McKenney paulmck@kernel.org wrote:
On Wed, Sep 02, 2020 at 08:49:11AM +0200, Ulf Hansson wrote:
On Tue, 1 Sep 2020 at 17:00, Paul E. McKenney paulmck@kernel.org wrote:
[ . . . ]
Here is the list, though it is early in the morning here:
RCU_NONIDLE().
Peter's patch, if it turns out to hoist your code out of what RCU considers to be the idle loop.
If the problem is trace events, use the _rcuidle() variant of the trace event. Instead of trace_blah(), use trace_blah_rcuidle().
Switch from RCU (as in rcu_read_lock()) to SRCU (as in srcu_read_lock()).
Take Peter's patch a step further, moving the rcu_idle_enter() and rcu_idle_exit() calls as needed. But please keep in mind that these two functions require that irqs be disabled by their callers.
If RCU_NONIDLE() in inconvenient due to early exits and such, you could use the rcu_irq_enter_irqson() and rcu_irq_exit_irqson() functions that it calls.
Do any of those help?
Yes, they will, in one way or the other. Thanks for providing me with all the available options.
BTW, I still don't get what good rcu_idle_enter|exit() does, but I am assuming those need to be called at some point before the CPU goes to sleep.
These functions allow RCU to leave idle CPUs undisturbed. If they were not invoked, RCU would periodically IPI idle CPUs to verify that there were no RCU readers running on them. This would be quite bad for battery lifetime, among other things. So the call to rcu_idle_enter() tells RCU that it may safely completely ignore this CPU until its next call to rcu_idle_exit().
Alright, thanks for explaining this, much appreciated.
So in one way, we would also like to call rcu_idle_enter(), as soon as we know there is no need for the RCU to be active. To prevent unnecessary IPIs I mean. :-)
Kind regards Uffe
On Wed, Sep 02, 2020 at 06:07:05PM +0200, Ulf Hansson wrote:
On Wed, 2 Sep 2020 at 15:52, Paul E. McKenney paulmck@kernel.org wrote:
On Wed, Sep 02, 2020 at 08:49:11AM +0200, Ulf Hansson wrote:
On Tue, 1 Sep 2020 at 17:00, Paul E. McKenney paulmck@kernel.org wrote:
[ . . . ]
Here is the list, though it is early in the morning here:
RCU_NONIDLE().
Peter's patch, if it turns out to hoist your code out of what RCU considers to be the idle loop.
If the problem is trace events, use the _rcuidle() variant of the trace event. Instead of trace_blah(), use trace_blah_rcuidle().
Switch from RCU (as in rcu_read_lock()) to SRCU (as in srcu_read_lock()).
Take Peter's patch a step further, moving the rcu_idle_enter() and rcu_idle_exit() calls as needed. But please keep in mind that these two functions require that irqs be disabled by their callers.
If RCU_NONIDLE() in inconvenient due to early exits and such, you could use the rcu_irq_enter_irqson() and rcu_irq_exit_irqson() functions that it calls.
Do any of those help?
Yes, they will, in one way or the other. Thanks for providing me with all the available options.
BTW, I still don't get what good rcu_idle_enter|exit() does, but I am assuming those need to be called at some point before the CPU goes to sleep.
These functions allow RCU to leave idle CPUs undisturbed. If they were not invoked, RCU would periodically IPI idle CPUs to verify that there were no RCU readers running on them. This would be quite bad for battery lifetime, among other things. So the call to rcu_idle_enter() tells RCU that it may safely completely ignore this CPU until its next call to rcu_idle_exit().
Alright, thanks for explaining this, much appreciated.
So in one way, we would also like to call rcu_idle_enter(), as soon as we know there is no need for the RCU to be active. To prevent unnecessary IPIs I mean. :-)
Well, the IPIs don't happen until the better part of a second into the grace period. So delaying an rcu_idle_enter() a few microseconds, as Peter Zijlstra is proposing, is absolutely no problem whatsoever. And once the rcu_idle_enter() happens, the RCU grace-period kthread's next scan of the CPUs will see that this CPU needs to be ignored, so no more IPIs for it until it does the next rcu_idle_exit(), rcu_irq_enter(), or any of a number of other things that cause RCU to once again pay attention to that CPU.
Thanx, Paul