On Tue, Jul 26, 2022 at 01:50:06PM +0100, Mark Rutland wrote:
On Tue, Jul 26, 2022 at 01:40:40PM +0100, Mark Rutland wrote:
[Adding Peter; I suspect this is due to the cpuidle rework]
Looking again I see the cpuidle rework isn't in next, so evidently not...
Sorry for the noise!
I'd like to loop in Mike.L and the CoreSight ML for the CTI PM callbacks. Please see below a comment on the CTI spinlock usage.
I'll go give next a spin in a VM, but I suspect I might need real HW to see this due to the way PSCI idle states work.
Mark.
On Tue, Jul 26, 2022 at 11:41:34AM +0100, Sudeep Holla wrote:
I was seeing the below lockdep warnings on my arm64 Juno development platform with -next almost 2 weeks back. I wanted to check for similar reports before posting and then forgot.
--->8
 DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
 hardirqs last enabled at (46157): cpuidle_enter_state+0x174/0x2b4
 WARNING: CPU: 5 PID: 0 at kernel/locking/lockdep.c:5506 check_flags+0x90/0x1e8
 hardirqs last disabled at (46158): el1_interrupt+0x2c/0xc8
 Modules linked in:
 softirqs last enabled at (46154): __do_softirq+0x2c0/0x388
 softirqs last disabled at (46139): __irq_exit_rcu+0x118/0x18c
 CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.19.0-rc6-next-20220714 #9
 pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : check_flags+0x90/0x1e8
 lr : check_flags+0x90/0x1e8
 Call trace:
  check_flags+0x90/0x1e8
  lock_is_held_type+0x80/0x164
  rcu_read_lock_sched_held+0x40/0x7c
  trace_rcu_dyntick+0x5c/0x140
  ct_kernel_enter+0x78/0xd4
  ct_idle_exit+0x1c/0x44
  cpu_idle_poll+0x74/0xb8
  do_idle+0x90/0x2c4
  cpu_startup_entry+0x30/0x34
  secondary_start_kernel+0x130/0x144
  __secondary_switched+0xb0/0xb4
 irq event stamp: 64229
 hardirqs last enabled at (64229): cpu_idle_poll+0x40/0xb8
 hardirqs last disabled at (64228): do_idle+0xbc/0x2c4
 softirqs last enabled at (64190): __do_softirq+0x2c0/0x388
 softirqs last disabled at (64185): __irq_exit_rcu+0x118/0x18c
 ---[ end trace 0000000000000000 ]---
 possible reason: unannotated irqs-off.
 irq event stamp: 64229
 hardirqs last enabled at (64229): cpu_idle_poll+0x40/0xb8
 hardirqs last disabled at (64228): do_idle+0xbc/0x2c4
 softirqs last enabled at (64190): __do_softirq+0x2c0/0x388
 softirqs last disabled at (64185): __irq_exit_rcu+0x118/0x18c
However, I don't see the above warning with the latest -next. When I tried yesterday's -next just now, I saw a different warning instead. Not sure if they are related; I haven't tried to bisect.
--->8
 [ BUG: Invalid wait context ]
 5.19.0-rc8-next-20220725 #38 Not tainted
 swapper/0/0 is trying to lock:
 (&drvdata->spinlock){....}-{3:3}, at: cti_cpu_pm_notify+0x54/0x114
 other info that might help us debug this:
 context-{5:5}
 1 lock held by swapper/0/0:
  #0: (cpu_pm_notifier.lock){....}-{2:2}, at: cpu_pm_enter+0x2c/0x80
 stack backtrace:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc8-next-20220725-00004-g599e6691ed8c #38
 Call trace:
  dump_backtrace+0xe8/0x108
  show_stack+0x18/0x4c
  dump_stack_lvl+0x90/0xc8
  dump_stack+0x18/0x54
  __lock_acquire+0xa70/0x32d0
  lock_acquire+0x160/0x308
  _raw_spin_lock+0x60/0xa0
  cti_cpu_pm_notify+0x54/0x114
  raw_notifier_call_chain_robust+0x50/0xd4
  cpu_pm_enter+0x48/0x80
  psci_enter_idle_state+0x34/0x74
  cpuidle_enter_state+0x120/0x2a8
  cpuidle_enter+0x38/0x50
  do_idle+0x1e8/0x2b8
  cpu_startup_entry+0x24/0x28
  kernel_init+0x0/0x1a0
  start_kernel+0x0/0x470
  start_kernel+0x34c/0x470
  __primary_switched+0xbc/0xc4
If we look at how this callback is invoked, we can see the locking sequence is:
  cpu_pm_enter()
    `> cpu_pm_notify_robust():
         `> raw_spin_lock_irqsave(cpu_pm_notifier.lock, flag)  -> a raw spinlock
         `> cti_cpu_pm_notify()
              `> spin_lock(&drvdata->spinlock)                 -> a normal spinlock
A raw spinlock is never a sleepable lock, whereas a normal spinlock can be (e.g. it becomes a sleeping lock when PREEMPT_RT is enabled), so it must not be acquired inside a raw spinlock's critical section.
One solution is to convert the spinlock in the CTI driver to a raw spinlock, which would get rid of this lockdep warning.
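For illustration, below is a minimal sketch of that conversion. The struct layout, the *_sketch names and the state handling are placeholders invented for the example (only cti_cpu_pm_notify and drvdata->spinlock come from the backtrace above), so treat it as the shape of the change rather than the actual coresight-cti code:

  /*
   * Sketch only: illustrates the spinlock_t -> raw_spinlock_t conversion.
   * Every locking site in the driver would need the same
   * spin_lock() -> raw_spin_lock() change, not just this callback.
   */
  #include <linux/cpu_pm.h>
  #include <linux/spinlock.h>

  struct cti_drvdata_sketch {
  	raw_spinlock_t spinlock;	/* was: spinlock_t spinlock; */
  	bool config_enabled;		/* placeholder for saved CTI state */
  };

  static struct cti_drvdata_sketch sketch_drvdata = {
  	.spinlock = __RAW_SPIN_LOCK_UNLOCKED(sketch_drvdata.spinlock),
  };

  static int cti_cpu_pm_notify_sketch(struct notifier_block *nb,
  				      unsigned long cmd, void *v)
  {
  	struct cti_drvdata_sketch *drvdata = &sketch_drvdata;

  	/*
  	 * A raw spinlock never sleeps, so nesting it inside the raw
  	 * cpu_pm_notifier.lock is a valid wait context for lockdep.
  	 */
  	raw_spin_lock(&drvdata->spinlock);	/* was: spin_lock() */
  	switch (cmd) {
  	case CPU_PM_ENTER:
  		/* save this CPU's CTI state before power down */
  		drvdata->config_enabled = false;
  		break;
  	case CPU_PM_EXIT:
  	case CPU_PM_ENTER_FAILED:
  		/* restore this CPU's CTI state on resume */
  		drvdata->config_enabled = true;
  		break;
  	}
  	raw_spin_unlock(&drvdata->spinlock);

  	return NOTIFY_OK;
  }

Note that raw_spin_lock() keeps the critical section non-preemptible even on PREEMPT_RT, so the section has to stay short; whether the CTI save/restore path satisfies that is something to confirm.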
Actually, I rather doubt it is necessary to take a spinlock in the CTI PM callbacks at all: the CPU idle flow runs in the idle thread context with local IRQs disabled, so there is likely no race with either thread context or an interrupt handler, and the locking in the PM callbacks could simply be removed.
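If that reasoning holds (I have not verified it for the CTI driver), the callback could drop the lock entirely. A lock-free sketch with the same placeholder names, documenting the assumption with a warning instead of a lock:

  /*
   * Sketch only: the same notifier without any lock, relying on the
   * assumption that it is only invoked from the idle thread with local
   * IRQs disabled, so nothing else can race on this CPU's CTI state.
   */
  #include <linux/bug.h>
  #include <linux/cpu_pm.h>
  #include <linux/irqflags.h>

  static int cti_cpu_pm_notify_lockless_sketch(struct notifier_block *nb,
  					       unsigned long cmd, void *v)
  {
  	/* Document the assumption instead of taking a lock. */
  	WARN_ON_ONCE(!irqs_disabled());

  	switch (cmd) {
  	case CPU_PM_ENTER:
  		/* save this CPU's CTI state */
  		break;
  	case CPU_PM_EXIT:
  	case CPU_PM_ENTER_FAILED:
  		/* restore this CPU's CTI state */
  		break;
  	}

  	return NOTIFY_OK;
  }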
Mike, could you check this? Thanks!
Leo