On 07/27/2020 03:55 PM, Mike Leach wrote:
Hi Suzuki,
On Mon, 27 Jul 2020 at 06:59, Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> wrote:
On 2020-07-24 21:28, Suzuki K Poulose wrote:
On 07/24/2020 04:38 PM, Mike Leach wrote:
Hi Sai,
On Fri, 24 Jul 2020 at 12:44, Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> wrote:
Hi Mike,
On 2020-07-24 16:35, Mike Leach wrote:
Hi Sai,
On Fri, 24 Jul 2020 at 08:48, Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> wrote:
>
> Hi Mike,
>
> Since commit 9b6a3f3633a5cc9 ("coresight: etmv4: Fix CPU power management
> setup in probe() function"), ETM probe fails consistently like below:
>
> localhost ~ # dmesg | grep -i etm
> [    6.460602] coresight-etm4x: probe of 7040000.etm failed with error -16
> [    6.524756] coresight etm1: CPU1: ETM v4.2 initialized
> [    6.531152] coresight etm2: CPU2: ETM v4.2 initialized
> [    6.538495] coresight etm3: CPU3: ETM v4.2 initialized
> [    6.545124] coresight etm4: CPU4: ETM v4.2 initialized
> [    6.552904] coresight etm5: CPU5: ETM v4.2 initialized
> [    6.559714] coresight etm6: CPU6: ETM v4.2 initialized
> [    6.569596] coresight etm7: CPU7: ETM v4.2 initialized
> localhost ~ #
>
> Most of the time it's ETM0, but I occasionally see ETM1 and other ETMs
> failing to probe; some ETM probe failure is always there. I'm using an
> SC7180-based platform on a 5.4 kernel which has all the coresight patches
> backported.
>
> If I revert that commit, I don't see the issue at all. In case you can
> identify something which might be causing this, please let me know. I'm
> planning to look into this as well in the meantime.
>
I'm not seeing any issues using a DB410 with a 5.8 kernel.
The patch is a clean-up that also fixes an issue where the goto skipped an unlock on the error path; it is not intended as a functional change.
Yes, I tested on another platform based on SDM845, and the issue is not seen there.
The only differences I can see are:
1) the cpu_pm_register_notifier() call is earlier in the sequence. This shouldn't make a difference, as cpu_pm and hotplug are separate subsystems.
2) the error from cpuhp_setup_state_nocalls_cpuslocked() is no longer ignored.
I would suggest that 2) may be the issue on your system: are you now seeing an error that was not being processed before?
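The general shape of the change discussed here (return the registration error instead of ignoring it, and make sure the error path still drops the lock) can be sketched in user-space C. This is a hypothetical illustration, not the actual diff of the commit: fake_register(), probe_step() and the pthread mutex are stand-ins for the driver's registration call and its locking.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins; not the coresight-etm4x code. */
static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_held;   /* lets a caller check the lock was released */

/* Stand-in for a registration call that can fail with -EBUSY (-16). */
static int fake_register(int fail)
{
	return fail ? -16 : 0;
}

static int probe_step(int fail_register)
{
	int ret;

	pthread_mutex_lock(&probe_lock);
	lock_held = 1;

	ret = fake_register(fail_register);
	if (ret)
		goto unlock;   /* error is propagated, not ignored */

	/* ... further setup would go here ... */

unlock:
	/* Single exit path: success and error both drop the lock. */
	lock_held = 0;
	pthread_mutex_unlock(&probe_lock);
	return ret;
}
```

If the error path skipped the unlock, the second call below would block forever on probe_lock instead of returning.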
Yes, the error is from cpuhp_setup_state_nocalls_cpuslocked().
Looking at the code, this error (-16 / -EBUSY) seems to come from the internal cpuhp_store_callbacks() function in cpu.c, which prevents multiple registrations of callbacks for a given state. It could be that on your system there is a race on the etm4_count variable, allowing two calls to cpuhp_setup_state_nocalls_cpuslocked(), with one hitting the error. I would consider protecting it either with a mutex or by turning it into an atomic, to see if that fixes your problem.
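The suspected race can be modelled in user space. A minimal sketch, assuming a pthread mutex as a stand-in for a kernel mutex; etm_probe_model(), etm_remove_model() and fake_cpuhp_setup() are hypothetical names, with fake_cpuhp_setup() mimicking the -EBUSY that cpuhp_store_callbacks() returns when a state already has callbacks. The point is that the count check and the registration sit under the same lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space model; not the driver code. */
#define MODEL_EBUSY 16

static int cpuhp_slot_used;   /* models the single cpuhp state slot */
static int etm_count;         /* models etm4_count */
static pthread_mutex_t etm_count_lock = PTHREAD_MUTEX_INITIALIZER;

/* Like a second registration for the same state: fails with -EBUSY. */
static int fake_cpuhp_setup(void)
{
	if (cpuhp_slot_used)
		return -MODEL_EBUSY;
	cpuhp_slot_used = 1;
	return 0;
}

static void fake_cpuhp_remove(void)
{
	cpuhp_slot_used = 0;
}

/*
 * First probe registers. Because the test of etm_count and the
 * registration are done under one lock, two concurrent probes can no
 * longer both see etm_count == 0 and both try to register.
 */
static int etm_probe_model(void)
{
	int ret = 0;

	pthread_mutex_lock(&etm_count_lock);
	if (etm_count == 0)
		ret = fake_cpuhp_setup();
	if (ret == 0)
		etm_count++;
	pthread_mutex_unlock(&etm_count_lock);
	return ret;
}

/* Last remove unregisters. */
static void etm_remove_model(void)
{
	pthread_mutex_lock(&etm_count_lock);
	assert(etm_count > 0);
	if (--etm_count == 0)
		fake_cpuhp_remove();
	pthread_mutex_unlock(&etm_count_lock);
}
```

An atomic counter alone would close only part of the window: a second probe could see a non-zero count and carry on before the first probe's registration has finished (or failed), which is one reason to prefer the mutex here.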
That looks quite possible. We rely on etm4_count to detect whether we have registered/unregistered the notifiers. We could instead register the notifiers once at driver registration time, leave them registered, and only remove them when the driver is removed. It is fine to execute the callback in the absence of an ETM on the CPU; it is not a fast path anyway.
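The register-once alternative, with a callback that simply returns on CPUs that have no ETM, might look roughly like this as a user-space model. All names are hypothetical: registered_starting_cb stands in for the cpuhp state slot, and hotplug_cpu_up_model() for the hotplug core invoking the starting callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; not the driver code. */
#define MODEL_NR_CPUS 8

struct etm_state {
	int enabled;   /* stand-in for the per-ETM work the callback does */
};

typedef int (*cpu_cb_t)(unsigned int cpu);

static struct etm_state *etm_for_cpu[MODEL_NR_CPUS];
static cpu_cb_t registered_starting_cb;

/* Safe to run on any CPU: returns early when no ETM was probed there. */
static int etm_starting_cpu_model(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	if (!etm_for_cpu[cpu])
		return 0;
	etm_for_cpu[cpu]->enabled = 1;
	return 0;
}

/* Driver init: register unconditionally, exactly once. No count needed. */
static int etm_driver_init_model(void)
{
	registered_starting_cb = etm_starting_cpu_model;
	return 0;
}

/* Driver exit: the only place the callback is removed. */
static void etm_driver_exit_model(void)
{
	registered_starting_cb = NULL;
}

/* Models the hotplug core running the callback as a CPU comes up. */
static void hotplug_cpu_up_model(unsigned int cpu)
{
	if (registered_starting_cb)
		registered_starting_cb(cpu);
}
```

Since registration no longer depends on how many ETMs have probed, there is no counter left to race on.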
Does this include the cpu_pm callback as well?
This will also help us solve the CPU hotplug issue, i.e., if a CPU is not online during the etm4 driver probe, we can never enable the ETM on that CPU afterwards. You can trigger this by booting a system with maxcpus=1 and later bringing the CPUs online manually.
I've looked at this - as far as I can tell moving the callback registration will make no difference to this issue. The CPUHP / cpu_pm registrations are not cpu bound and will succeed on the first cpu that runs a probe function for the ETM - irrespective of whether this is the cpu bound to the ETM being probed.
The problem is that the probe() function requires a register read by the cpu bound to the ETM - if this cpu is not up then this fails the probe and this is never retried. Using your maxcpus=1 test on DB410 I can see the CPUHP states correctly registered with a single cpu active. However, only a single ETM has initialised and the other three have errors "ETM arch init failed". As expected (as I am currently using the module patchset), I can then start another CPU, remove and re-install the coresight-etm4x module and 2 of the ETMs are now initialised.
Yes, I understand this, and I am planning to solve this part. My point is that having the hotplug notifier registered doesn't hurt. CPU hotplug is indeed a slow path, and having this simplifies the code a lot.
It seems to me that the cpu related initialisation, and possibly registration with the CS core & coresight bus, should be deferred until the cpu comes online. I guess this could either be completed in the CPUHP starting callback, having returned a success code from probe() and set a pending flag in the driver, or the probe() could be set to a
That is precisely what I have in mind.
pended state to be retried by the AMBA core, but the latter seems rather inefficient, as the retries would have to continue until the CPU comes online.
You're right. This option is complicated.
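The first option (probe() returns success and marks the ETM as pending, and the CPUHP starting callback finishes the CPU-bound initialisation once the CPU comes online) can be sketched as a user-space model. All names are hypothetical: etm_dev, deferred_probe_model() and etm_starting_cpu_deferred() are illustrative, and etm_init_on_own_cpu() stands in for the register reads that must run on the ETM's own CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; not the driver code. */
#define MODEL_NR_CPUS 4

struct etm_dev {
	bool init_pending;   /* probe() succeeded, CPU-bound init not done */
	bool initialised;    /* CPU-bound init (register reads) has run */
};

static bool cpu_is_online[MODEL_NR_CPUS];
static struct etm_dev *etm_on_cpu[MODEL_NR_CPUS];

/* Stands in for the init that must execute on the ETM's own CPU. */
static void etm_init_on_own_cpu(struct etm_dev *etm)
{
	etm->initialised = true;
	etm->init_pending = false;
}

/* probe(): does not fail because the CPU is offline; it defers instead. */
static int deferred_probe_model(struct etm_dev *etm, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	etm->initialised = false;
	etm->init_pending = true;
	etm_on_cpu[cpu] = etm;
	if (cpu_is_online[cpu])
		etm_init_on_own_cpu(etm);
	return 0;
}

/* CPUHP starting callback: completes whatever probe() left pending. */
static int etm_starting_cpu_deferred(unsigned int cpu)
{
	struct etm_dev *etm;

	assert(cpu < MODEL_NR_CPUS);
	etm = etm_on_cpu[cpu];
	if (etm && etm->init_pending)
		etm_init_on_own_cpu(etm);
	return 0;
}

/* Models bringing a CPU online later, e.g. after booting with maxcpus=1. */
static void cpu_up_model(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	cpu_is_online[cpu] = true;
	etm_starting_cpu_deferred(cpu);
}
```

With only CPU0 online, the CPU1 ETM stays pending after probe and is completed when cpu_up_model(1) runs, without any retry loop in the AMBA core.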
Cheers
Suzuki