Hi Mike,
I was running CPU hotplug tests today and encountered some CTI issues. I'd like to know whether these are known issues that someone is already working on. If no one is, I can provide some patches later.

1. Deadlock

[ 988.335937] CPU: 6 PID: 10258 Comm: sh Tainted: G W L 5.8.0-rc6-mainline-16783-gc38daa79b26b-dirty #1
[ 988.346364] Hardware name: Thundercomm Dragonboard 845c (DT)
[ 988.352073] pstate: 20400005 (nzCv daif +PAN -UAO BTYPE=--)
[ 988.357689] pc : smp_call_function_single+0x158/0x1b8
[ 988.362782] lr : smp_call_function_single+0x124/0x1b8
...
[ 988.451638] Call trace:
[ 988.454119] smp_call_function_single+0x158/0x1b8
[ 988.458866] cti_enable+0xb4/0xf8 [coresight_cti]
[ 988.463618] coresight_control_assoc_ectdev+0x6c/0x128 [coresight]
[ 988.469855] coresight_enable+0x1f0/0x364 [coresight]
[ 988.474957] enable_source_store+0x5c/0x9c [coresight]
[ 988.480140] dev_attr_store+0x14/0x28
[ 988.483839] sysfs_kf_write+0x38/0x4c
[ 988.487532] kernfs_fop_write+0x1c0/0x2b0
[ 988.491585] vfs_write+0xfc/0x300
[ 988.494931] ksys_write+0x78/0xe0
[ 988.498283] __arm64_sys_write+0x18/0x20
[ 988.502240] el0_svc_common+0x98/0x160
[ 988.506024] do_el0_svc+0x78/0x80
[ 988.509377] el0_sync_handler+0xd4/0x270
[ 988.513337] el0_sync+0x164/0x180

Root cause:
CPU6: Grabs drvdata->spinlock in cti_enable(), then calls
smp_call_function_single(drvdata->ctidev.cpu, cti_enable_hw_smp_call, drvdata, 1);
and waits for CPU2 to write the CTI HW.
CPU2: Is in cti_cpu_pm_notify() with interrupts disabled, spinning on drvdata->spinlock, so it cannot service the cross-call that CPU6 is waiting on.
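
To make the wait-for cycle concrete, here is a minimal user-space sketch in C. It is only a model, not driver code: struct cti_model and every function name in it are hypothetical, and a trylock stands in for "would spin forever". The second variant shows one possible ordering (dropping the lock before the synchronous cross-call); whether that is the right fix for the driver is an assumption, not something established above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CTI driver data; not the kernel struct. */
struct cti_model {
	atomic_flag lock;	/* models drvdata->spinlock */
	bool hw_written;	/* set once the target CPU has written the CTI HW */
};

static void cti_model_init(struct cti_model *m)
{
	atomic_flag_clear(&m->lock);
	m->hw_written = false;
}

/* Try to take the lock once; returns true on success. */
static bool model_trylock(struct cti_model *m)
{
	return !atomic_flag_test_and_set(&m->lock);
}

static void model_unlock(struct cti_model *m)
{
	atomic_flag_clear(&m->lock);
}

/*
 * Target CPU (CPU2 in the report): its PM notifier needs the lock before
 * the CPU can go on to service the cross-call. Returns false where the
 * real CPU would keep spinning with interrupts disabled.
 */
static bool target_cpu_runs_call(struct cti_model *m)
{
	if (!model_trylock(m))
		return false;
	model_unlock(m);	/* notifier finished */
	m->hw_written = true;	/* cross-call handler runs */
	return true;
}

/* Caller (CPU6 in the report) keeps the lock across the synchronous call. */
static bool enable_lock_held(struct cti_model *m)
{
	bool done;

	if (!model_trylock(m))
		return false;
	done = target_cpu_runs_call(m);	/* false: the real code waits forever */
	model_unlock(m);
	return done;
}

/* Alternative ordering: release the lock before issuing the cross-call. */
static bool enable_lock_dropped(struct cti_model *m)
{
	if (!model_trylock(m))
		return false;
	/* ... update software state under the lock ... */
	model_unlock(m);
	return target_cpu_runs_call(m);
}
```

In this model, enable_lock_held() can never complete because the target needs the same lock the caller is holding, while enable_lock_dropped() completes.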

2. Warning

[ 121.436987] WARNING: CPU: 1 PID: 15 at drivers/hwtracing/coresight/coresight-core.c:227 coresight_disclaim_device+0x30/0x44 [coresight]
[ 121.438144] Hardware name: Thundercomm Dragonboard 845c (DT)
[ 121.438156] pstate: 80c00085 (Nzcv daIf +PAN +UAO BTYPE=--)
[ 121.438167] pc : coresight_disclaim_device+0x30/0x44 [coresight]
[ 121.438203] lr : cti_dying_cpu+0x34/0x4c [coresight_cti]
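
Root cause: cti_dying_cpu() calls coresight_disclaim_device() unconditionally, while the device is claimed only when the CTI is enabled. If the CTI was never enabled, the disclaim has no matching claim and triggers the warning.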
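
The claim/disclaim mismatch can be illustrated with a small user-space model in C. This is a sketch under assumptions: struct claim_model and all function names are hypothetical, and the warnings counter stands in for the WARN in the disclaim path. The guarded variant is one possible shape for a fix, not a tested patch.

```c
#include <stdbool.h>

/* Hypothetical model of the claim state; not a kernel structure. */
struct claim_model {
	bool hw_enabled;	/* CTI hardware currently enabled */
	bool claimed;		/* claim tag currently held */
	int warnings;		/* counts disclaims without a matching claim */
};

static void model_claim(struct claim_model *m)
{
	m->claimed = true;
}

/* Disclaiming a device that was never claimed is what triggers the WARN. */
static void model_disclaim(struct claim_model *m)
{
	if (!m->claimed)
		m->warnings++;
	m->claimed = false;
}

/* The claim is only taken on the enable path. */
static void model_enable(struct claim_model *m)
{
	model_claim(m);
	m->hw_enabled = true;
}

/* Behaviour described in the report: disclaim regardless of state. */
static void dying_unconditional(struct claim_model *m)
{
	model_disclaim(m);
	m->hw_enabled = false;
}

/* Possible alternative: disclaim only if the enable path claimed it. */
static void dying_guarded(struct claim_model *m)
{
	if (m->hw_enabled)
		model_disclaim(m);
	m->hw_enabled = false;
}
```

Taking a never-enabled CTI through dying_unconditional() bumps the warning counter; dying_guarded() does not, and still releases the claim when the CTI was enabled.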

3. While checking the code, I think there is an issue with pm_runtime_get_sync() as well. It is called in cti_starting_cpu(), but no matching put() is called in the dying callback, so the runtime PM usage count could become unbalanced.
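
A toy counter shows how the imbalance would accumulate over repeated hotplug cycles. This is a user-space sketch in C with hypothetical names (struct pm_model, hotplug_cycles); it models only the usage count, not the real runtime PM state machine, and the initial count of 1 in the usage note is an assumption.

```c
#include <stdbool.h>

/* Hypothetical model of a runtime PM usage counter. */
struct pm_model {
	int usage_count;
};

static void model_pm_get(struct pm_model *p)
{
	p->usage_count++;
}

static void model_pm_put(struct pm_model *p)
{
	p->usage_count--;
}

/*
 * Run 'cycles' offline/online transitions of one CPU and return the
 * resulting usage count. put_on_dying selects whether the dying
 * callback drops the reference taken by the starting callback.
 */
static int hotplug_cycles(struct pm_model *p, int cycles, bool put_on_dying)
{
	for (int i = 0; i < cycles; i++) {
		if (put_on_dying)
			model_pm_put(p);	/* CPU going offline */
		model_pm_get(p);		/* CPU coming back online */
	}
	return p->usage_count;
}
```

Starting from a count of 1, two cycles without the put leave the count at 3, while two cycles with the put leave it at 1.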

Test script:
adb wait-for-device root
adb wait-for-device
:loop
adb shell "echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink"
adb shell "echo 1 > /sys/bus/coresight/devices/etm2/enable_source"
adb shell "echo 0 > /sys/devices/system/cpu/cpu2/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu2/online"
adb shell "echo 0 > /sys/devices/system/cpu/cpu2/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu2/online"
adb shell "echo 0 > /sys/bus/coresight/devices/etm2/enable_source"
goto loop

Thanks,
Tingwei