On 11/07/2023 16:45, James Clark wrote:
On 11/07/2023 15:05, Greg Kroah-Hartman wrote:
On Tue, Jul 11, 2023 at 03:05:36PM +0800, quanyang.wang@windriver.com wrote:
From: Quanyang Wang <quanyang.wang@windriver.com>
On PREEMPT_RT kernels, spinlock_t locks become sleepable. The functions etm_dying_cpu() and etm_starting_cpu(), which call spin_lock()/spin_unlock(), run in an irq-disabled context, which triggers the following calltrace:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 25, name: migration/1
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
1 lock held by migration/1/25:
 #0: 82a7587c (&drvdata->spinlock){....}-{2:2}, at: etm_dying_cpu+0x28/0x54
Preemption disabled at:
[<801ec760>] cpu_stopper_thread+0x94/0x120
CPU: 1 PID: 25 Comm: migration/1 Not tainted 6.1.35-rt10-yocto-preempt-rt #30
Hardware name: Xilinx Zynq Platform
Stopper: multi_cpu_stop+0x0/0x174 <- __stop_cpus.constprop.0+0x48/0x88
 unwind_backtrace from show_stack+0x18/0x1c
 show_stack from dump_stack_lvl+0x58/0x70
 dump_stack_lvl from __might_resched+0x14c/0x1c0
 __might_resched from rt_spin_lock+0x4c/0x84
 rt_spin_lock from etm_dying_cpu+0x28/0x54
 etm_dying_cpu from cpuhp_invoke_callback+0x140/0x33c
 cpuhp_invoke_callback from __cpuhp_invoke_callback_range+0xa4/0x104
 __cpuhp_invoke_callback_range from take_cpu_down+0x7c/0xa8
 take_cpu_down from multi_cpu_stop+0x15c/0x174
 multi_cpu_stop from cpu_stopper_thread+0x9c/0x120
 cpu_stopper_thread from smpboot_thread_fn+0x31c/0x360
 smpboot_thread_fn from kthread+0x100/0x124
 kthread from ret_from_fork+0x14/0x2c
Convert struct etm_drvdata's spinlock to raw_spinlock to fix it.
Wait, why will a raw_spinlock fix this? Why not fix the root problem here, that of calling these locks improperly in irq context?
How is changing to a raw_spinlock going to fix the above splat?
thanks,
greg k-h
If it's just etm_starting_cpu() and etm_dying_cpu() as mentioned in the commit message then can those spinlocks be removed?
Surely there can't be any concurrent access to the per-cpu data when the hotplug callbacks are called?
Accessing the per-cpu data is not a problem. The spinlocks are there to protect accesses to drvdata->mode. etm_starting_cpu() tries to enable the ETM (i.e., start tracing) if the mode is not DISABLED. In SYSFS mode especially, tracing can be controlled from a different CPU, which changes the mode. I think we may still be able to avoid this lock by allowing modifications to the mode only via enable_hw/disable_hw, performed on the CPU the ETM is bound to. That way, there cannot be concurrent modifications to the mode for a given ETM bound to the CPU.
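To illustrate the idea, here is a rough userspace model, not driver code. Every name in it (struct etm_model, run_on_cpu(), current_cpu, sysfs_enable(), and so on) is hypothetical, and run_on_cpu() only pretends to switch CPUs. It shows the invariant being proposed: the mode is written only while "running on" the ETM's own CPU, so the hotplug callbacks, which also run on that CPU, can read it without taking a lock.

```c
/* Userspace model only: all names here are hypothetical, not the coresight API. */
#include <assert.h>
#include <stdbool.h>

enum etm_mode { MODE_DISABLED, MODE_SYSFS };

struct etm_model {
	int cpu;            /* CPU this ETM is bound to */
	enum etm_mode mode; /* invariant: only written while running on 'cpu' */
	bool hw_enabled;    /* stands in for the trace unit's hardware state */
};

/* Stand-in for "the CPU we are executing on". */
static int current_cpu;

typedef void (*cpu_fn)(struct etm_model *etm);

/* Stand-in for a synchronous cross-CPU call: run fn as if on 'cpu'. */
static void run_on_cpu(int cpu, cpu_fn fn, struct etm_model *etm)
{
	int saved = current_cpu;

	current_cpu = cpu;
	fn(etm);
	current_cpu = saved;
}

/* The only two places that modify the mode; both must run on etm->cpu. */
static void enable_hw(struct etm_model *etm)
{
	assert(current_cpu == etm->cpu);
	etm->mode = MODE_SYSFS;
	etm->hw_enabled = true;
}

static void disable_hw(struct etm_model *etm)
{
	assert(current_cpu == etm->cpu);
	etm->hw_enabled = false;
	etm->mode = MODE_DISABLED;
}

/* sysfs requests may arrive on any CPU; they are forwarded to the owner. */
static void sysfs_enable(struct etm_model *etm)
{
	run_on_cpu(etm->cpu, enable_hw, etm);
}

static void sysfs_disable(struct etm_model *etm)
{
	run_on_cpu(etm->cpu, disable_hw, etm);
}

/* Hotplug callbacks run on etm->cpu and read the mode without a lock. */
static void starting_cpu(struct etm_model *etm)
{
	assert(current_cpu == etm->cpu);
	if (etm->mode != MODE_DISABLED)
		etm->hw_enabled = true;
}

static void dying_cpu(struct etm_model *etm)
{
	assert(current_cpu == etm->cpu);
	if (etm->mode != MODE_DISABLED)
		etm->hw_enabled = false;
}
```

In the kernel, the cross-CPU step would presumably be something like smp_call_function_single() with wait set. Whether the driver's existing enable/disable paths already satisfy this invariant in every mode would still need checking.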
Suzuki