On 2020-07-24 21:28, Suzuki K Poulose wrote:
On 07/24/2020 04:38 PM, Mike Leach wrote:
Hi Sai,
On Fri, 24 Jul 2020 at 12:44, Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> wrote:
Hi Mike,
On 2020-07-24 16:35, Mike Leach wrote:
Hi Sai,
On Fri, 24 Jul 2020 at 08:48, Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> wrote:
Hi Mike,
Since commit 9b6a3f3633a5cc9 ("coresight: etmv4: Fix CPU power management setup in probe() function"), the ETM probe fails consistently, as shown below:
localhost ~ # dmesg | grep -i etm
[ 6.460602] coresight-etm4x: probe of 7040000.etm failed with error -16
[ 6.524756] coresight etm1: CPU1: ETM v4.2 initialized
[ 6.531152] coresight etm2: CPU2: ETM v4.2 initialized
[ 6.538495] coresight etm3: CPU3: ETM v4.2 initialized
[ 6.545124] coresight etm4: CPU4: ETM v4.2 initialized
[ 6.552904] coresight etm5: CPU5: ETM v4.2 initialized
[ 6.559714] coresight etm6: CPU6: ETM v4.2 initialized
[ 6.569596] coresight etm7: CPU7: ETM v4.2 initialized
localhost ~ #
Most of the time it's ETM0, but I occasionally see ETM1 and other ETMs fail to probe; some ETM probe failure is always present. I'm using an SC7180-based platform on a 5.4 kernel with all the coresight patches backported.
If I revert that commit, I don't see the issue at all. If you can identify what might be causing this, please let me know. I'm planning to look into it in the meantime as well.
I'm not seeing any issues using a DB410 with a 5.8 kernel.
The patch is a clean-up that fixes an issue where the goto skipped an unlock on the error path; it is not meant to be a functional change.
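The bug class described here (an error-path goto that jumps past the unlock) can be illustrated with a minimal userspace sketch. This is not the coresight code: a pthread mutex stands in for the lock held during probe, and register_step_a()/register_step_b() are hypothetical registration steps. The point is that every error label unwinds in reverse order and then falls through to the single unlock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock taken around the PM/hotplug setup. */
static pthread_mutex_t setup_lock = PTHREAD_MUTEX_INITIALIZER;
static bool fail_step_b;              /* test hook: force step B to fail */
static bool step_a_done, step_b_done;

static int register_step_a(void) { step_a_done = true; return 0; }
static void unregister_step_a(void) { step_a_done = false; }

static int register_step_b(void)
{
	if (fail_step_b)
		return -EBUSY;
	step_b_done = true;
	return 0;
}

static int setup_locked_steps(void)
{
	int ret;

	pthread_mutex_lock(&setup_lock);

	ret = register_step_a();
	if (ret)
		goto out_unlock;

	ret = register_step_b();
	if (ret)
		goto undo_step_a;

	pthread_mutex_unlock(&setup_lock);
	return 0;

	/* Error labels undo work in reverse order, then all reach the unlock. */
undo_step_a:
	unregister_step_a();
out_unlock:
	pthread_mutex_unlock(&setup_lock);
	return ret;
}
```

After a failure in step B, step A is rolled back and the mutex is free again, so a later attempt can proceed.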
Yes, I tested on another platform based on SDM845, and the issue is not seen there.
The only differences I can see are:
1) The cpu_pm_register_notifier() call is earlier in the sequence. This shouldn't make a difference, as CPU PM and CPU hotplug are separate subsystems.
2) The error from cpuhp_setup_state_nocalls_cpuslocked() is no longer ignored.
I would suggest that 2) may be the issue on your system: are you now seeing an error that was not being handled before?
Yes, the error is from cpuhp_setup_state_nocalls_cpuslocked().
Looking at the code, this error (-16, i.e. -EBUSY) seems to come from the internal cpuhp_store_callbacks() function in kernel/cpu.c, which prevents multiple registrations of callbacks for a given state. It could be that your system has a race on the etm4_count variable that allows two calls to cpuhp_setup_state_nocalls_cpuslocked(), with one of them hitting the error. I would try protecting the count either with a mutex or by turning it into an atomic to see if that fixes your problem.
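A minimal userspace model of the suspected race and the mutex fix might look like this. It is a sketch under stated assumptions, not kernel code: register_callbacks() plays the role of cpuhp_setup_state_nocalls_cpuslocked() for a fixed state and, like cpuhp_store_callbacks(), rejects a second registration with -EBUSY; probe_count is a hypothetical analogue of etm4_count, and each thread simulates one asynchronous probe.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Models a fixed hotplug state: a second registration fails with -EBUSY. */
static int callbacks_registered;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

static int register_callbacks(void)
{
	int ret = 0;

	pthread_mutex_lock(&state_lock);
	if (callbacks_registered)
		ret = -EBUSY;
	else
		callbacks_registered = 1;
	pthread_mutex_unlock(&state_lock);
	return ret;
}

/*
 * An unprotected check such as "if (probe_count++) return 0;" lets two
 * concurrent probes both read 0 and both call register_callbacks(), so one
 * of them gets -EBUSY. Holding count_lock makes check-and-register atomic.
 */
static int probe_count;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

static int pm_setup(void)
{
	int ret = 0;

	pthread_mutex_lock(&count_lock);
	if (probe_count++ == 0) {
		ret = register_callbacks();
		if (ret)
			probe_count--;	/* roll back so a later probe can retry */
	}
	pthread_mutex_unlock(&count_lock);
	return ret;
}

/* Thread body simulating one asynchronous probe; stores its result. */
static void *probe_thread(void *arg)
{
	*(int *)arg = pm_setup();
	return NULL;
}
```

An atomic counter (for example C11 atomic_fetch_add()) would also stop the double registration, but later callers could then return before the first caller has finished registering, so the mutex is the simpler choice in this model.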
That looks quite possible. We rely on etm4_count to detect whether we have registered/unregistered the notifiers. We could instead register the notifiers once at driver registration time, leave them registered, and remove them only when the driver is removed. It is fine for the callbacks to run on a CPU that has no ETM, and this is not a fast path anyway.
This would also help us solve the CPU hotplug issue: if a CPU is not online when the etm4 driver probes, we can never enable the ETM on that CPU afterwards. You can trigger this by booting a system with maxcpus=1 and later bringing the other CPUs online manually.
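The "register once at driver registration" proposal can be modeled in userspace roughly as follows. Every name here is hypothetical and the model is a sketch, not the etm4x driver: etm_per_cpu[] stands in for the per-CPU driver data, model_online_cpu() for a hotplug online callback, and "ready" for whatever per-CPU setup could not be done while the CPU was offline. The callback is a no-op on CPUs without an ETM, and a CPU brought online after probe (as with maxcpus=1) still gets its setup done.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* Hypothetical per-CPU driver data; NULL means no ETM on that CPU. */
struct etm_model {
	bool ready;	/* per-CPU setup completed */
};

static struct etm_model *etm_per_cpu[MODEL_NR_CPUS];
static bool callbacks_registered;

/* Runs for every CPU that comes online, with or without an ETM. */
static int model_online_cpu(unsigned int cpu)
{
	struct etm_model *drvdata = etm_per_cpu[cpu];

	if (!drvdata)		/* no ETM on this CPU: harmless no-op */
		return 0;
	drvdata->ready = true;	/* finish setup that probe could not do */
	return 0;
}

/* Callbacks are registered once with the driver, not on first probe. */
static void model_driver_init(void) { callbacks_registered = true; }
static void model_driver_exit(void) { callbacks_registered = false; }

/* Probe only records its data; it no longer touches the callbacks. */
static void model_probe(unsigned int cpu, struct etm_model *drvdata)
{
	etm_per_cpu[cpu] = drvdata;
}

/* Simulates the hotplug core bringing a CPU online. */
static void model_bring_cpu_online(unsigned int cpu)
{
	if (callbacks_registered)
		model_online_cpu(cpu);
}
```

Because probe no longer decides when to register, there is no shared counter to race on, and the callbacks stay in place for CPUs that come online at any later time.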
Yes, I had never noticed, but ETM hotplug does seem broken. As for the race, I can now reliably trigger it on other platforms with the latest kernel by using asynchronous probing, i.e. setting probe_type to PROBE_PREFER_ASYNCHRONOUS; the "arm,coresight-loses-context-with-cpu" property is also required.
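For reference, requesting asynchronous probing is a single field in the driver's struct device_driver. The fragment below is a sketch, not the change actually used to reproduce the race: it assumes the etm4x driver registers as an amba_driver whose probe function and ID table are named etm4_probe and etm4_ids, and it omits the other fields. The driver name matches the "coresight-etm4x" prefix in the dmesg output above. It is kernel code and does not build on its own.

```c
/* Sketch only: other amba_driver fields omitted. */
static struct amba_driver etm4x_driver = {
	.drv = {
		.name		= "coresight-etm4x",
		/* Let the driver core probe the ETM devices concurrently. */
		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
	},
	.probe		= etm4_probe,
	.id_table	= etm4_ids,
};
```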
Thanks, Sai