Hi Mike,
Since commit 9b6a3f3633a5cc9 ("coresight: etmv4: Fix CPU power management setup in probe() function"), ETM probe fails consistently, as shown below:
localhost ~ # dmesg | grep -i etm
[ 6.460602] coresight-etm4x: probe of 7040000.etm failed with error -16
[ 6.524756] coresight etm1: CPU1: ETM v4.2 initialized
[ 6.531152] coresight etm2: CPU2: ETM v4.2 initialized
[ 6.538495] coresight etm3: CPU3: ETM v4.2 initialized
[ 6.545124] coresight etm4: CPU4: ETM v4.2 initialized
[ 6.552904] coresight etm5: CPU5: ETM v4.2 initialized
[ 6.559714] coresight etm6: CPU6: ETM v4.2 initialized
[ 6.569596] coresight etm7: CPU7: ETM v4.2 initialized
localhost ~ #
Most of the time it's ETM0, but I occasionally see ETM1 and other ETMs failing to probe; some ETM probe failure is always there. I'm using an SC7180-based platform on a 5.4 kernel which has all the coresight patches backported.
If I revert that commit, I don't see the issue at all. If you can identify something which might be causing this, please let me know. I'm planning to look into this as well in the meantime.
Thanks, Sai
Hi Sai,
On Fri, 24 Jul 2020 at 08:48, Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
> If I revert that commit, I don't see the issue at all. In case you can identify something which might be causing this, please let me know. I'm planning to look into this as well in the meantime.
I'm not seeing any issues - using DB410 + 5.8 kernel.
The patch is a clean-up and fixes an issue where the goto skipped an unlock on error; it is not meant to be a functional change.
The only differences I can see are:
1) the cpu_pm_register_notifier() call is earlier in the sequence. This shouldn't make a difference, as cpu_pm and hotplug are different systems.
2) the error from cpuhp_setup_state_nocalls_cpuslocked() is no longer ignored.
I would suggest that 2) may be the issue on your system: are you now seeing an error that was previously being ignored?
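To illustrate difference 2), here is a minimal userspace sketch, not the driver code. The names mock_setup_state(), probe_old() and probe_new() are hypothetical stand-ins: the mock always fails with -EBUSY, the old flow drops that return value, and the new flow propagates it, which is how a failure that was always there could start to fail the probe.

```c
#include <errno.h>

/* Hypothetical stand-in for cpuhp_setup_state_nocalls_cpuslocked();
 * in this sketch it always reports that the state is already taken. */
static int mock_setup_state(void)
{
	return -EBUSY;
}

/* Old flow as described above: the return value is dropped,
 * so probe still reports success. */
static int probe_old(void)
{
	(void)mock_setup_state();
	return 0;
}

/* New flow: the error is propagated, so the same latent failure
 * now makes probe fail. */
static int probe_new(void)
{
	int ret = mock_setup_state();

	if (ret < 0)
		return ret;
	return 0;
}
```

On Linux EBUSY is 16, which matches the "failed with error -16" line in the report.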
Regards
Mike
Thanks, Sai
-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Hi Mike,
On 2020-07-24 16:35, Mike Leach wrote:
> I'm not seeing any issues - using DB410 + 5.8 kernel.
> The patch is a clean-up & fixes an issue that the goto skipped an unlock on error, rather than any functional change.
Yes, I tested on another platform which is based on SDM845 and there the issue is not seen.
> The only differences I can see are:-
> 1) the cpu_pm_register_notifier() call is earlier in the sequence. Shouldn't make a difference as cpu_pm and hotplug are different systems.
> 2) the error from cpuhp_setup_state_nocalls_cpuslocked() is no longer ignored.
> I would suggest that 2) may be the issue on your system - if you are now seeing an error that was not being processed before?
Yes the error is from cpuhp_setup_state_nocalls_cpuslocked(),
[ 6.413532] etm4_pm_setup_cpuslocked: etm4_online_cpu ret=181
[ 6.420401] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7040000.etm
[ 6.446244] etm4_pm_setup_cpuslocked: etm4_online_cpu ret=181
[ 6.461772] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7140000.etm
[ 6.474249] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7040000.etm
[ 6.482801] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7240000.etm
[ 6.524947] etm4_pm_setup_cpuslocked: etm4_online_cpu ret=181
[ 6.535943] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7140000.etm
[ 6.536308] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7040000.etm
[ 6.541993] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7340000.etm
[ 6.542087] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7240000.etm
[ 6.542724] etm4_pm_setup_cpuslocked: etm4_starting_cpu ret=-16 <---
[ 6.643953] etm4_probe: etm4_pm_setup_cpuslocked ret=-16 dev=7440000.etm <---
[ 6.656030] etm4_pm_setup_cpuslocked: etm4_online_cpu ret=181
[ 6.662426] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7540000.etm
[ 6.666221] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7140000.etm
[ 6.682263] etm4_pm_setup_cpuslocked: etm4_starting_cpu ret=-16 <---
[ 6.691385] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7040000.etm
[ 6.699443] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7340000.etm
[ 6.699991] etm4_probe: etm4_pm_setup_cpuslocked ret=-16 dev=7640000.etm <---
[ 6.722837] etm4_pm_setup_cpuslocked: etm4_online_cpu ret=181
[ 6.737515] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7740000.etm
[ 6.774354] etm4_pm_setup_cpuslocked: etm4_online_cpu ret=181
[ 6.781174] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7240000.etm
[ 6.804638] etm4_pm_setup_cpuslocked: etm4_online_cpu ret=181
[ 6.811482] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7540000.etm
[ 6.825762] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7140000.etm
[ 6.839725] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7040000.etm
[ 6.853700] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7340000.etm
[ 6.867612] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7740000.etm
[ 6.882355] etm4_probe: etm4_pm_setup_cpuslocked ret=0 dev=7240000.etm
Thanks, Sai
Hi Sai,
On Fri, 24 Jul 2020 at 12:44, Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
> Yes the error is from cpuhp_setup_state_nocalls_cpuslocked(),
Looking at the code, this (-16 / -EBUSY) seems to come from the internal cpuhp_store_callbacks() function in cpu.c, which prevents multiple registrations of callbacks for a given state. It could be that on your system there is a race on the etm4_count variable, allowing two calls to cpuhp_setup_state_nocalls_cpuslocked(), with one hitting the error. I would consider protecting it with a mutex or turning it into an atomic to see if that fixes your problem.
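The suspected race can be sketched in plain userspace C. This is a simplified model under stated assumptions, not the driver code: mock_setup_state(), racy_interleaving(), probe_count and probe_atomic() are hypothetical names, and the bad interleaving is replayed by hand in a single thread so that the outcome is deterministic. The second half shows the atomic variant of the suggested fix.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hotplug state slot: a second
 * registration of the same state is refused with -EBUSY. */
static bool state_registered;

static int mock_setup_state(void)
{
	if (state_registered)
		return -EBUSY;
	state_registered = true;
	return 0;
}

/* Replays one bad interleaving of a non-atomic "if (!count++)":
 * both probes read 0 before either of them writes back. */
static void racy_interleaving(int results[2])
{
	int count = 0;
	int seen_a, seen_b;

	state_registered = false;
	seen_a = count;		/* probe A reads 0 */
	seen_b = count;		/* probe B reads 0 before A stores */
	count = seen_a + 1;	/* A stores 1 */
	count = seen_b + 1;	/* B stores 1 too, one increment is lost */
	(void)count;

	/* Both probes believe they are first, so both try to register. */
	results[0] = (seen_a == 0) ? mock_setup_state() : 0;
	results[1] = (seen_b == 0) ? mock_setup_state() : 0;
}

/* One possible fix: an atomic read-modify-write hands the value 0
 * to exactly one caller, so only that caller registers. */
static atomic_int probe_count;

static int probe_atomic(void)
{
	if (atomic_fetch_add(&probe_count, 1) == 0)
		return mock_setup_state();
	return 0;
}
```

In the replayed interleaving the first registration succeeds and the second gets -EBUSY. An atomic counter only guarantees a single registration attempt; a mutex around both the check and the registration would also make later probes wait until that registration has finished.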
Regards
Mike
On 07/24/2020 04:38 PM, Mike Leach wrote:
> Looking at the code, (-16 / -EBUSY) this seems to come from the internal cpuhp_store_callbacks() function in cpu.c. This prevents multiple registrations of callbacks for a given state. It could be that on your system there is an issue with a race on the etm4_count variable, allowing two calls to the function cpuhp_setup_state_nocalls_cpuslocked() function, with one hitting the error. I would consider protecting this either by mutex or turning it into an atomic to see if that fixes your problem
That looks quite possible. We rely on etm4x_count to detect whether we have registered/unregistered the notifiers. We could simply register the notifiers at driver registration time, leave them registered, and only remove them when we remove the driver. It is fine to execute the callback in the absence of the ETM on the CPU. It is not a fast path anyway.
This will also help us solve the CPU hotplug issue: if a CPU is not brought online during the etm4 driver probe, we can never enable the ETM on that CPU afterwards. You can trigger this by booting a system with maxcpus=1 and later bringing the CPUs online manually.
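A rough userspace sketch of the "always registered" idea, assuming a per-CPU table of driver data in which an empty slot means that no ETM has been probed for that CPU. The struct, the table and the CPU count here are hypothetical and only model the behaviour; the point is that a callback registered once at driver registration can simply do nothing for such a CPU.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 8	/* assumed CPU count for this sketch */

struct etm_drvdata {
	bool hw_enabled;
};

/* Per-CPU slot; NULL means no ETM has been probed for that CPU. */
static struct etm_drvdata *etmdrvdata[NR_CPUS];

/* Hotplug-style callback that stays registered for the lifetime of
 * the driver. With no ETM bound to the CPU it just returns success. */
static int etm_starting_cpu(unsigned int cpu)
{
	if (cpu >= NR_CPUS || !etmdrvdata[cpu])
		return 0;

	etmdrvdata[cpu]->hw_enabled = true;
	return 0;
}
```

With this shape no counter is needed to decide when to register or unregister, so there is nothing for concurrent probes to race on.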
Cheers Suzuki
On 2020-07-24 21:28, Suzuki K Poulose wrote:
> This will also help us to solve the CPU hotplug issues. i.e, if a CPU is not brought online during the etm4 driver probe, we can never enable ETM on the CPU anymore. You can trigger this by booting a system with maxcpus=1 and later bringing the CPUs online manually.
Yes, I never noticed, but ETM hotplug seems broken. As for this race, I can now trigger it reliably on other platforms with the latest kernel if I use async probe, i.e., with probe_type set to PROBE_PREFER_ASYNCHRONOUS; the "arm,coresight-loses-context-with-cpu" property is needed as well.
Thanks, Sai
Hi Suzuki,
On Mon, 27 Jul 2020 at 06:59, Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
> On 2020-07-24 21:28, Suzuki K Poulose wrote:
>> That looks quite possible. We rely on etm4x_count to detect whether we have registered/unregistered the notifiers. We could simply register the notifiers and leave them registered at driver registration time. And only remove them when we remove the driver. It is fine to execute the callback in the absence of the ETM on the CPU. It is not a fast path anyway.
Does this include the cpu_pm callback as well?
>> This will also help us to solve the CPU hotplug issues. i.e, if a CPU is not brought online during the etm4 driver probe, we can never enable ETM on the CPU anymore. You can trigger this by booting a system with maxcpus=1 and later bringing the CPUs online manually.
I've looked at this. As far as I can tell, moving the callback registration will make no difference to this issue. The CPUHP / cpu_pm registrations are not CPU-bound and will succeed on the first CPU that runs a probe function for the ETM, irrespective of whether this is the CPU bound to the ETM being probed.
The problem is that the probe() function requires a register read by the CPU bound to the ETM. If this CPU is not up, the probe fails and is never retried. Using your maxcpus=1 test on DB410, I can see the CPUHP states correctly registered with a single CPU active. However, only a single ETM has initialised, and the other three have "ETM arch init failed" errors. As expected (as I am currently using the module patchset), I can then start another CPU, remove and re-install the coresight-etm4x module, and 2 of the ETMs are now initialised.
It seems to me that the CPU-related initialisation, and possibly registration with the CS core & coresight bus, should be deferred until the CPU comes online. I guess this could either be completed on the CPUHP starting call, with probe() having returned a success code and set a pending flag in the driver, or the probe() could be put into a pending state to be retried by the AMBA core. The latter seems rather inefficient, as the retries would have to continue until the CPU becomes live.
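The first option (probe() succeeds, sets a pending flag, and the CPUHP starting call finishes the job) could look roughly like the following userspace model. All names here (struct etm_dev, cpu_is_online, etm_for_cpu, etm_arch_init(), etm_probe(), etm_starting_cpu()) are hypothetical and only illustrate the control flow; the real driver would have to run the register reads on the bound CPU.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4	/* assumed CPU count for this sketch */

struct etm_dev {
	unsigned int cpu;
	bool pending;		/* CPU-bound init still outstanding */
	bool initialised;
};

static bool cpu_is_online[NR_CPUS];
static struct etm_dev *etm_for_cpu[NR_CPUS];

/* Stands in for the register reads that must run on the bound CPU. */
static void etm_arch_init(struct etm_dev *dev)
{
	dev->initialised = true;
	dev->pending = false;
}

/* probe() succeeds either way; if the CPU is down, only a flag is set. */
static int etm_probe(struct etm_dev *dev)
{
	if (dev->cpu >= NR_CPUS)
		return -EINVAL;

	etm_for_cpu[dev->cpu] = dev;
	if (cpu_is_online[dev->cpu])
		etm_arch_init(dev);
	else
		dev->pending = true;
	return 0;
}

/* Hotplug "starting" callback: finishes any init that was left pending. */
static int etm_starting_cpu(unsigned int cpu)
{
	struct etm_dev *dev;

	if (cpu >= NR_CPUS)
		return 0;

	cpu_is_online[cpu] = true;
	dev = etm_for_cpu[cpu];
	if (dev && dev->pending)
		etm_arch_init(dev);
	return 0;
}
```

In this model an ETM whose CPU is offline at probe time stays pending and becomes usable the first time that CPU comes up, without any retry loop.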
Regards
Mike
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK
On 07/27/2020 03:55 PM, Mike Leach wrote:
> I've looked at this - as far as I can tell moving the callback registration will make no difference to this issue. The CPUHP / cpu_pm registrations are not cpu bound and will succeed on the first cpu that runs a probe function for the ETM - irrespective of whether this is the cpu bound to the ETM being probed.
> The problem is that the probe() function requires a register read by the cpu bound to the ETM - if this cpu is not up then this fails the probe and this is never retried. Using your maxcpus=1 test on DB410 I can see the CPUHP states correctly registered with a single cpu active. However, only a single ETM has initialised and the other three have errors "ETM arch init failed". As expected (as I am currently using the module patchset), I can then start another CPU, remove and re-install the coresight-etm4x module and 2 of the ETMs are now initialised.
Yes, I understand this and I am planning to solve this part. My point is having the hotplug notifier registered doesn't hurt. CPU hotplug is indeed a slow path. And having this simplifies the code a lot.
> It seems to me that the cpu related initialisation and possibly registration with the CS core & coresight bus should be pended till the cpu comes online. I guess this could either be completed on the CPUHP starting call - having returned a success code from probe() and set a pending flag in the driver, or the probe() could be set to a
That is precisely what I have in mind.
> pended state to be retried by the AMBA core - but this latter seems rather inefficient as the retries will have to happen until the CPU becomes live.
You're right. This option is complicated.
Cheers
Suzuki
Hi Suzuki
Looks like I missed your response as I was trying a couple of things out to confirm my understanding of the situation.
On Mon, 27 Jul 2020 at 17:38, Suzuki K Poulose suzuki.poulose@arm.com wrote:
On 07/27/2020 03:55 PM, Mike Leach wrote:
Hi Suzuki,
On Mon, 27 Jul 2020 at 06:59, Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
On 2020-07-24 21:28, Suzuki K Poulose wrote:
On 07/24/2020 04:38 PM, Mike Leach wrote:
Hi Sai,
On Fri, 24 Jul 2020 at 12:44, Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
Hi Mike,
On 2020-07-24 16:35, Mike Leach wrote:
> Hi Sai,
>
> On Fri, 24 Jul 2020 at 08:48, Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
>>
>> Hi Mike,
>>
>> Since commit 9b6a3f3633a5cc9("coresight: etmv4: Fix CPU power management setup in probe() function"),
>> ETM probe fails consistently like below:
>>
>> localhost ~ # dmesg | grep -i etm
>> [ 6.460602] coresight-etm4x: probe of 7040000.etm failed with error -16
>> [ 6.524756] coresight etm1: CPU1: ETM v4.2 initialized
>> [ 6.531152] coresight etm2: CPU2: ETM v4.2 initialized
>> [ 6.538495] coresight etm3: CPU3: ETM v4.2 initialized
>> [ 6.545124] coresight etm4: CPU4: ETM v4.2 initialized
>> [ 6.552904] coresight etm5: CPU5: ETM v4.2 initialized
>> [ 6.559714] coresight etm6: CPU6: ETM v4.2 initialized
>> [ 6.569596] coresight etm7: CPU7: ETM v4.2 initialized
>> localhost ~ #
>>
>> Most of the time its for ETM0 but I occasionally see ETM1 and other ETMs failing to probe,
>> but the some ETM probe failure is always there. I'm using SC7180 based platform on 5.4 kernel
>> which has all the coresight patches backported.
>>
>> If I revert that commit, I don't see the issue at all. In case you can identify something which
>> might be causing this, please let me know. I'm planning to look into this as well in the meantime.
>>
>
> I'm not seeing any issues - using DB410 + 5.8 kernel.
>
> The patch is a clean-up & fixes an issue that the goto skipped an unlock on error, rather than any functional change.
>
Yes, I tested on another platform which is based on SDM845 and there the issue is not seen.
> The only differences I can see are:-
> 1) the cpu_pm_register_notifier() call is earlier in the sequence.
> Shouldn't make a difference as cpu_pm and hotplug are different systems.
> 2) the error from cpuhp_setup_state_nocalls_cpuslocked() is no longer ignored.
>
> I would suggest that 2) may be the issue on your system - if you are now seeing an error that was not being processed before?
>
Yes the error is from cpuhp_setup_state_nocalls_cpuslocked(),
Looking at the code, this (-16 / -EBUSY) seems to come from the internal cpuhp_store_callbacks() function in cpu.c, which prevents multiple registrations of callbacks for a given state. It could be that on your system there is a race on the etm4_count variable, allowing two calls to cpuhp_setup_state_nocalls_cpuslocked(), with one hitting the error. I would consider protecting the counter with a mutex or turning it into an atomic to see if that fixes your problem.
That looks quite possible. We rely on etm4x_count to detect whether we have registered/unregistered the notifiers. We could simply register the notifiers and leave them registered at driver registration time. And only remove them when we remove the driver. It is fine to execute the callback in the absence of the ETM on the CPU. It is not a fast path anyway.
Does this include the cpu_pm callback as well?
This will also help us to solve the CPU hotplug issues. i.e, if a CPU is not brought online during the etm4 driver probe, we can never enable ETM on the CPU anymore. You can trigger this by booting a system with maxcpus=1 and later bringing the CPUs online manually.
I've looked at this - as far as I can tell moving the callback registration will make no difference to this issue. The CPUHP / cpu_pm registrations are not cpu bound and will succeed on the first cpu that runs a probe function for the ETM - irrespective of whether this is the cpu bound to the ETM being probed.
The problem is that the probe() function requires a register read by the cpu bound to the ETM - if this cpu is not up then this fails the probe and this is never retried. Using your maxcpus=1 test on DB410 I can see the CPUHP states correctly registered with a single cpu active. However, only a single ETM has initialised and the other three have errors "ETM arch init failed". As expected (as I am currently using the module patchset), I can then start another CPU, remove and re-install the coresight-etm4x module and 2 of the ETMs are now initialised.
Yes, I understand this and I am planning to solve this part. My point is having the hotplug notifier registered doesn't hurt. CPU hotplug is indeed a slow path. And having this simplifies the code a lot.
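To illustrate the suggestion above, here is a minimal userspace sketch of registering the hotplug callback once at driver registration and leaving it in place until driver removal. It is only a model under stated assumptions: every name in it (model_driver_init(), model_probe(), model_cpu_starting(), NR_CPUS and so on) is hypothetical and none of it is the actual coresight-etm4x code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4			/* hypothetical CPU count for the model */

static bool hp_registered;		/* callback registered with the "core"? */
static bool etm_bound[NR_CPUS];		/* has an ETM been probed for this CPU? */
static int starts[NR_CPUS];		/* times the callback did real work */

/* Driver registration: the only place the callback is registered. */
int model_driver_init(void)
{
	if (hp_registered)
		return -EBUSY;
	hp_registered = true;
	return 0;
}

/* Driver removal: the only place the callback is unregistered. */
void model_driver_exit(void)
{
	hp_registered = false;
}

/* probe() only records the device: no counter and no registration. */
int model_probe(int cpu)
{
	if (cpu < 0 || cpu >= NR_CPUS)
		return -EINVAL;
	etm_bound[cpu] = true;
	return 0;
}

/* Hotplug callback: a harmless no-op when the CPU has no ETM bound. */
int model_cpu_starting(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!hp_registered || !etm_bound[cpu])
		return 0;
	starts[cpu]++;
	return 0;
}

/* Accessor used to observe how often the callback did work for a CPU. */
int model_starts(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return starts[cpu];
}
```

In this model the probes can run in any order, or concurrently, without touching the registration, so there is no counter left to race on. The cost is that the callback runs on CPUs without an ETM, which is acceptable on a slow path.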
OK - I'd read this as the HP registration change was needed to solve the ETM probe on CPU enable issue.
It seems to me that the cpu related initialisation and possibly registration with the CS core & coresight bus should be pended till the cpu comes online. I guess this could either be completed on the CPUHP starting call - having returned a success code from probe() and set a pending flag in the driver, or the probe() could be set to a
That is precisely what I have in mind.
Agreed - but we need to address CPU bound CTI devices as well - the problem is mirrored for this driver too.
Regards
Mike
pended state to be retried by the AMBA core - but this latter seems rather inefficient as the retries will have to happen until the CPU becomes live.
You're right. This option is complicated.
Cheers
Suzuki
Hi,
On Mon, 27 Jul 2020 at 15:55, Mike Leach mike.leach@linaro.org wrote:
Hi Suzuki,
On Mon, 27 Jul 2020 at 06:59, Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
On 2020-07-24 21:28, Suzuki K Poulose wrote:
On 07/24/2020 04:38 PM, Mike Leach wrote:
Hi Sai,
On Fri, 24 Jul 2020 at 12:44, Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
Hi Mike,
On 2020-07-24 16:35, Mike Leach wrote:
Hi Sai,
On Fri, 24 Jul 2020 at 08:48, Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
>
> Hi Mike,
>
> Since commit 9b6a3f3633a5cc9("coresight: etmv4: Fix CPU power management setup in probe() function"),
> ETM probe fails consistently like below:
>
> localhost ~ # dmesg | grep -i etm
> [ 6.460602] coresight-etm4x: probe of 7040000.etm failed with error -16
> [ 6.524756] coresight etm1: CPU1: ETM v4.2 initialized
> [ 6.531152] coresight etm2: CPU2: ETM v4.2 initialized
> [ 6.538495] coresight etm3: CPU3: ETM v4.2 initialized
> [ 6.545124] coresight etm4: CPU4: ETM v4.2 initialized
> [ 6.552904] coresight etm5: CPU5: ETM v4.2 initialized
> [ 6.559714] coresight etm6: CPU6: ETM v4.2 initialized
> [ 6.569596] coresight etm7: CPU7: ETM v4.2 initialized
> localhost ~ #
>
> Most of the time its for ETM0 but I occasionally see ETM1 and other ETMs failing to probe,
> but the some ETM probe failure is always there. I'm using SC7180 based platform on 5.4 kernel
> which has all the coresight patches backported.
>
> If I revert that commit, I don't see the issue at all. In case you can identify something which
> might be causing this, please let me know. I'm planning to look into this as well in the meantime.
>
I'm not seeing any issues - using DB410 + 5.8 kernel.
The patch is a clean-up & fixes an issue that the goto skipped an unlock on error, rather than any functional change.
Yes, I tested on another platform which is based on SDM845 and there the issue is not seen.
The only differences I can see are:-
1) the cpu_pm_register_notifier() call is earlier in the sequence. Shouldn't make a difference as cpu_pm and hotplug are different systems.
2) the error from cpuhp_setup_state_nocalls_cpuslocked() is no longer ignored.
I would suggest that 2) may be the issue on your system - if you are now seeing an error that was not being processed before?
Yes the error is from cpuhp_setup_state_nocalls_cpuslocked(),
Looking at the code, this (-16 / -EBUSY) seems to come from the internal cpuhp_store_callbacks() function in cpu.c, which prevents multiple registrations of callbacks for a given state. It could be that on your system there is a race on the etm4_count variable, allowing two calls to cpuhp_setup_state_nocalls_cpuslocked(), with one hitting the error. I would consider protecting the counter with a mutex or turning it into an atomic to see if that fixes your problem.
That looks quite possible. We rely on etm4x_count to detect whether we have registered/unregistered the notifiers. We could simply register the notifiers and leave them registered at driver registration time. And only remove them when we remove the driver. It is fine to execute the callback in the absence of the ETM on the CPU. It is not a fast path anyway.
Does this include the cpu_pm callback as well?
This will also help us to solve the CPU hotplug issues. i.e, if a CPU is not brought online during the etm4 driver probe, we can never enable ETM on the CPU anymore. You can trigger this by booting a system with maxcpus=1 and later bringing the CPUs online manually.
I've looked at this - as far as I can tell moving the callback registration will make no difference to this issue. The CPUHP / cpu_pm registrations are not cpu bound and will succeed on the first cpu that runs a probe function for the ETM - irrespective of whether this is the cpu bound to the ETM being probed.
The problem is that the probe() function requires a register read by the cpu bound to the ETM - if this cpu is not up then this fails the probe and this is never retried. Using your maxcpus=1 test on DB410 I can see the CPUHP states correctly registered with a single cpu active. However, only a single ETM has initialised and the other three have errors "ETM arch init failed". As expected (as I am currently using the module patchset), I can then start another CPU, remove and re-install the coresight-etm4x module and 2 of the ETMs are now initialised.
It seems to me that the cpu related initialisation and possibly registration with the CS core & coresight bus should be pended till the cpu comes online. I guess this could either be completed on the CPUHP starting call - having returned a success code from probe() and set a pending flag in the driver, or the probe() could be set to a pended state to be retried by the AMBA core - but this latter seems rather inefficient as the retries will have to happen until the CPU becomes live.
And with a bit of code rearrangement (on Tingwei's v4 set with module support), the following is possible without changing the position of the CPU callback registration:
root@linaro-developer:/home/linaro/cs-mods# insmod coresight-etm4x.ko
[ 232.615419] coresight etm0: CPU0: ETM v4.0 initialized
[ 232.615848] coresight-etm4x 85d000.etm: CPU1 offline: ETM initialize pending
[ 232.619721] coresight-etm4x 85e000.etm: CPU2 offline: ETM initialize pending
[ 232.626937] coresight-etm4x 85f000.etm: CPU3 offline: ETM initialize pending

root@linaro-developer:/home/linaro/cs-mods# echo 1 > /sys/devices/system/cpu/cpu1/online
[ 297.924286] Detected VIPT I-cache on CPU1
[ 297.924791] coresight etm1: CPU1: ETM v4.0 initialized
[ 297.924806] CPU1: Booted secondary processor 0x0000000001 [0x410fd030]

root@linaro-developer:/home/linaro/cs-mods# ls /sys/bus/coresight/devices/
etm0 etm1

root@linaro-developer:/home/linaro/cs-mods# echo 1 > /sys/devices/system/cpu/cpu3/online
[ 324.365883] Detected VIPT I-cache on CPU3
[ 324.366394] coresight etm3: CPU3: ETM v4.0 initialized
[ 324.366412] CPU3: Booted secondary processor 0x0000000003 [0x410fd030]

root@linaro-developer:/home/linaro/cs-mods# echo 1 > /sys/devices/system/cpu/cpu2/online
[ 335.209703] Detected VIPT I-cache on CPU2
[ 335.210238] coresight etm2: CPU2: ETM v4.0 initialized
[ 335.210256] CPU2: Booted secondary processor 0x0000000002 [0x410fd030]

root@linaro-developer:/home/linaro/cs-mods# ls /sys/bus/coresight/devices/
etm0 etm1 etm2 etm3
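The behaviour in the log above (probe succeeds with the initialisation left pending, then completes when the CPU comes online) can be modelled roughly as follows. This is a userspace sketch under assumptions, not the rearranged driver code: the state names, model_probe(), model_cpu_online() and model_arch_init() are all hypothetical, and model_arch_init() merely stands in for the register reads that have to run on the CPU bound to the ETM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count, matching the 4 ETMs above */

enum etm_state { ETM_ABSENT, ETM_PENDING, ETM_READY };

static bool cpu_up[NR_CPUS] = { true };	/* only CPU0 is up, as with maxcpus=1 */
static enum etm_state etm[NR_CPUS];

/* Stand-in for the init that needs the bound CPU to be running. */
static int model_arch_init(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpu_up[cpu] ? 0 : -ENODEV;
}

/* probe(): return success and mark the ETM pending if its CPU is offline. */
int model_probe(int cpu)
{
	int ret;

	if (cpu < 0 || cpu >= NR_CPUS)
		return -EINVAL;
	if (!cpu_up[cpu]) {
		etm[cpu] = ETM_PENDING;
		return 0;
	}
	ret = model_arch_init(cpu);
	if (ret)
		return ret;
	etm[cpu] = ETM_READY;
	return 0;
}

/* Hotplug callback: finish any initialisation left pending by probe(). */
int model_cpu_online(int cpu)
{
	int ret;

	if (cpu < 0 || cpu >= NR_CPUS)
		return -EINVAL;
	cpu_up[cpu] = true;
	if (etm[cpu] != ETM_PENDING)
		return 0;
	ret = model_arch_init(cpu);
	if (ret)
		return ret;
	etm[cpu] = ETM_READY;
	return 0;
}

/* Number of ETMs that have completed initialisation. */
int model_ready_count(void)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		n += (etm[cpu] == ETM_READY);
	return n;
}
```

Probing all four ETMs with only CPU0 up leaves one ready and three pending; bringing CPUs 1, 3 and 2 online in turn completes the rest, mirroring the sequence in the log.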
Of course, the CTIs bound to a CPU have similar issues. The CPUHP registration is also guarded by a counter. There is no actual register access in the CTI probe function, but there is some power state to take into account. Any solution, irrespective of whether the registration of the CPUHP and cpu_pm callback notifiers is moved, needs to cover both device types to be comprehensive.
Regards
Mike
Yes, I never noticed, but ETM hotplug seems broken. As for the race, I can now reliably trigger it on other platforms with the latest kernel if I use async probe, i.e. with probe_type set to PROBE_PREFER_ASYNCHRONOUS; "arm,coresight-loses-context-with-cpu" is needed as well.
Thanks, Sai
-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK
Hi Mike,
On 2020-07-24 21:08, Mike Leach wrote:
Hi Sai,
On Fri, 24 Jul 2020 at 12:44, Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
Hi Mike,
On 2020-07-24 16:35, Mike Leach wrote:
Hi Sai,
On Fri, 24 Jul 2020 at 08:48, Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
Hi Mike,
Since commit 9b6a3f3633a5cc9 ("coresight: etmv4: Fix CPU power management setup in probe() function"), ETM probe fails consistently like below:
localhost ~ # dmesg | grep -i etm
[ 6.460602] coresight-etm4x: probe of 7040000.etm failed with error -16
[ 6.524756] coresight etm1: CPU1: ETM v4.2 initialized
[ 6.531152] coresight etm2: CPU2: ETM v4.2 initialized
[ 6.538495] coresight etm3: CPU3: ETM v4.2 initialized
[ 6.545124] coresight etm4: CPU4: ETM v4.2 initialized
[ 6.552904] coresight etm5: CPU5: ETM v4.2 initialized
[ 6.559714] coresight etm6: CPU6: ETM v4.2 initialized
[ 6.569596] coresight etm7: CPU7: ETM v4.2 initialized
localhost ~ #
Most of the time it's for ETM0, but I occasionally see ETM1 and other ETMs failing to probe; some ETM probe failure is always there. I'm using an SC7180-based platform on a 5.4 kernel which has all the coresight patches backported.
If I revert that commit, I don't see the issue at all. In case you can identify something which might be causing this, please let me know. I'm planning to look into this as well in the meantime.
I'm not seeing any issues - using DB410 + 5.8 kernel.
The patch is a clean-up & fixes an issue that the goto skipped an unlock on error, rather than any functional change.
Yes, I tested on another platform which is based on SDM845 and there the issue is not seen.
The only differences I can see are:-
1) the cpu_pm_register_notifier() call is earlier in the sequence. Shouldn't make a difference as cpu_pm and hotplug are different systems.
2) the error from cpuhp_setup_state_nocalls_cpuslocked() is no longer ignored.
I would suggest that 2) may be the issue on your system - if you are now seeing an error that was not being processed before?
Yes the error is from cpuhp_setup_state_nocalls_cpuslocked(),
Looking at the code, this (-16 / -EBUSY) seems to come from the internal cpuhp_store_callbacks() function in cpu.c, which prevents multiple registrations of callbacks for a given state. It could be that on your system there is a race on the etm4_count variable, allowing two calls to cpuhp_setup_state_nocalls_cpuslocked(), with one hitting the error. I would consider protecting the counter with a mutex or turning it into an atomic to see if that fixes your problem.
Yes, converting etm4_count to an atomic variable works and I can't trigger the issue anymore.
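For illustration, here is a minimal userspace sketch of why a plain counter can let two probes both attempt the registration, and how an atomic fetch-and-add avoids it. It is a model under assumptions, not the driver patch: fake_cpuhp_setup() is a hypothetical stand-in that refuses a second registration with -EBUSY, as described earlier in the thread, and the racy path is split into two steps so that the bad interleaving can be replayed deterministically from one thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of the single hotplug state slot. */
static int slot_registrations;

/* Stand-in for cpuhp_setup_state_nocalls_cpuslocked(): a second
 * registration of the same state is refused with -EBUSY. */
int fake_cpuhp_setup(void)
{
	if (slot_registrations > 0)
		return -EBUSY;
	slot_registrations++;
	return 0;
}

/* Racy pattern: read the counter, then later act on the stale value. */
static int plain_count;

int racy_sample(void)
{
	return plain_count;
}

int racy_commit(int seen)
{
	plain_count = seen + 1;
	return seen == 0 ? fake_cpuhp_setup() : 0;
}

/* Atomic variant: fetch-and-add hands the old value 0 to exactly one
 * caller, so only that caller attempts the registration. */
static atomic_int atomic_count;

int probe_atomic(void)
{
	int old = atomic_fetch_add(&atomic_count, 1);

	assert(old >= 0);
	return old == 0 ? fake_cpuhp_setup() : 0;
}

/* Number of successful registrations so far. */
int registrations(void)
{
	return slot_registrations;
}

/* Reset the model between the racy and the atomic scenario. */
void reset_model(void)
{
	slot_registrations = 0;
	plain_count = 0;
	atomic_store(&atomic_count, 0);
}
```

If two probes both call racy_sample() before either calls racy_commit(), both see 0 and the second registration fails with -EBUSY (-16), matching the failure in the log. With probe_atomic() only one caller ever sees the old value 0. The sketch models the counter only; it says nothing about ordering later probes against the completion of the first registration.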
Thanks, Sai