Hi Greg
Please find the pull request for coresight subsystem for v5.18.
Suzuki
The following changes since commit dfd42facf1e4ada021b939b4e19c935dcdd55566:
Linux 5.17-rc3 (2022-02-06 12:20:50 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git tags/coresight-next-v5.18
for you to fetch changes up to b54f53bc11a584713f79a704c70c488489f524b8:
coresight: Drop unused 'none' enum value for each component (2022-02-28 09:51:40 -0700)
----------------------------------------------------------------
coresight: changes for v5.18
The coresight update for v5.18 includes
- TRBE erratum workarounds for Arm Cortex-A510
- Fixes for leaking root namespace PIDs into non-root namespace
trace sessions
- Miscellaneous fixes and cleanups
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
----------------------------------------------------------------
Anshuman Khandual (4):
coresight: trbe: Work around the ignored system register writes
coresight: trbe: Work around the invalid prohibited states
coresight: trbe: Work around the trace data corruption
coresight: Drop unused 'none' enum value for each component
James Clark (2):
coresight: Fix TRCCONFIGR.QE sysfs interface
coresight: no-op refactor to make INSTP0 check more idiomatic
Leo Yan (4):
coresight: etm4x: Add lock for reading virtual context ID comparator
coresight: etm4x: Don't use virtual contextID for non-root PID namespace
coresight: etm4x: Don't trace PID for non-root PID namespace
coresight: etm3x: Don't trace PID for non-root PID namespace
Miaoqian Lin (1):
coresight: syscfg: Fix memleak on registration failure in cscfg_create_device
Rafael J. Wysocki (1):
hwtracing: coresight: Replace acpi_bus_get_device()
Sudeep Holla (1):
coresight: trbe: Move check for kernel page table isolation from EL0 to probe
arch/arm64/Kconfig | 6 +-
drivers/hwtracing/coresight/coresight-core.c | 3 -
drivers/hwtracing/coresight/coresight-etm3x-core.c | 4 +
drivers/hwtracing/coresight/coresight-etm4x-core.c | 12 +-
.../hwtracing/coresight/coresight-etm4x-sysfs.c | 38 ++++++-
drivers/hwtracing/coresight/coresight-platform.c | 8 +-
drivers/hwtracing/coresight/coresight-syscfg.c | 2 +-
drivers/hwtracing/coresight/coresight-trbe.c | 125 +++++++++++++++------
drivers/hwtracing/coresight/coresight-trbe.h | 8 --
include/linux/coresight.h | 5 -
10 files changed, 149 insertions(+), 62 deletions(-)
On Thu, 10 Mar 2022 at 14:44, Greg KH <gregkh(a)linuxfoundation.org> wrote:
>
> On Thu, Mar 03, 2022 at 11:03:01PM +0000, Suzuki K Poulose wrote:
> > Hi Greg
> >
> > Please find the pull request for coresight subsystem for v5.18.
> >
> > Suzuki
> >
> > The following changes since commit dfd42facf1e4ada021b939b4e19c935dcdd55566:
> >
> > Linux 5.17-rc3 (2022-02-06 12:20:50 -0800)
> >
> > are available in the Git repository at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git tags/coresight-next-v5.18
> >
> > for you to fetch changes up to b54f53bc11a584713f79a704c70c488489f524b8:
> >
> > coresight: Drop unused 'none' enum value for each component (2022-02-28 09:51:40 -0700)
>
> I have the following errors when pulling this tree and having the
> scripts check the commits:
>
> Commit 5340bf5df9d2 ("coresight: syscfg: Fix memleak on registration failure in cscfg_create_device")
> committer Signed-off-by missing
> author email: linmq006(a)gmail.com
> committer email: suzuki.poulose(a)arm.com
> Signed-off-by: Miaoqian Lin <linmq006(a)gmail.com>
> Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>
> Commit 91a2f2941df2 ("coresight: Fix TRCCONFIGR.QE sysfs interface")
> committer Signed-off-by missing
> author email: james.clark(a)arm.com
> committer email: suzuki.poulose(a)arm.com
> Signed-off-by: James Clark <james.clark(a)arm.com>
> Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>
> Commit 7f4cd3375906 ("coresight: trbe: Work around the trace data corruption")
> committer Signed-off-by missing
> author email: anshuman.khandual(a)arm.com
> committer email: suzuki.poulose(a)arm.com
> Signed-off-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
> Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>
> Commit 0ecf2c747437 ("coresight: trbe: Work around the invalid prohibited states")
> committer Signed-off-by missing
> author email: anshuman.khandual(a)arm.com
> committer email: suzuki.poulose(a)arm.com
> Signed-off-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
> Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>
> Commit 8b6927d0adad ("coresight: trbe: Work around the ignored system register writes")
> committer Signed-off-by missing
> author email: anshuman.khandual(a)arm.com
> committer email: suzuki.poulose(a)arm.com
> Signed-off-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
> Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>
>
> What went wrong???
>
(Sigh)
I know what happened. The tree was rebased from -rc1 to -rc3 to pick
up a dependency with the aarch64 tree that was needed for a patch. In
doing so the original committer information was overwritten, which is
what the script is complaining about.
We will send you another pull request.
> linux-next didn't complain about this already?
>
It did not.
Thanks for the patience,
Mathieu
Hi Sudeep,
On Wed, 9 Mar 2022 at 11:50, Sudeep Holla <sudeep.holla(a)arm.com> wrote:
>
> On Wed, Mar 09, 2022 at 11:31:16AM +0000, Mike Leach wrote:
> > This is a resend of a patch from some time ago (04/2020)[1] which seems to
> > have fallen through the cracks - most likely as last time I mistakenly
> > tagged it as dt-bindings: rather than dts:
> >
>
> Quite likely, but I vaguely remember this and I assume the bindings had
> on-going discussions at that time.
>
> > I am planning a release of additional CTI configuration examples, which
> > include some for Juno - so this is now needed upstream to support that work.
> >
> > Patch unchanged, other than a correction to the subject.
> >
>
> That may not work. I haven't tried applying but it would be good to
> post it rebasing on -next at this moment or after v5.18-rc1 is released.
> I have already sent v5.18 material, so I need to queue this for v5.19.
> So preferable post the rebase version at v5.18-rc1 in 2+ weeks time.
> We have had some restructuring including the new scmi version of DTB
> in the mainline or queued in -next at the moment.
>
I did check it on coresight/next (5.17-rc3) and it was fine.
However re-doing for 5.18 is no problem as anything that depends on it
will be on there or later anyway.
Will rebase and repost when 5.18-rc1 becomes available.
Thanks
Mike
> Sorry for missing this last time.
>
> > [1] https://lore.kernel.org/linux-arm-kernel/20200415201330.15894-1-mike.leach@…
> >
> >
> > Mike Leach (1):
> > arm64: dts: arm: Juno - add CTI entries to device tree
> >
>
> No need of the cover letter for one patch, just post the patch next time.
>
> --
> Regards,
> Sudeep
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
This is a resend of a patch from some time ago (04/2020)[1] which seems to have fallen through
the cracks - most likely as last time I mistakenly tagged it as dt-bindings: rather than dts:
I am planning a release of additional CTI configuration examples, which include some for
Juno - so this is now needed upstream to support that work.
Patch unchanged, other than a correction to the subject.
[1] https://lore.kernel.org/linux-arm-kernel/20200415201330.15894-1-mike.leach@…
Mike Leach (1):
arm64: dts: arm: Juno - add CTI entries to device tree
arch/arm64/boot/dts/arm/juno-base.dtsi | 162 +++++++++++++++++++++-
arch/arm64/boot/dts/arm/juno-cs-r1r2.dtsi | 37 ++++-
arch/arm64/boot/dts/arm/juno-r1.dts | 25 ++++
arch/arm64/boot/dts/arm/juno-r2.dts | 25 ++++
arch/arm64/boot/dts/arm/juno.dts | 25 ++++
5 files changed, 269 insertions(+), 5 deletions(-)
--
2.17.1
On 03/03/2022 11:36, Jiri Olsa wrote:
> On Wed, Mar 02, 2022 at 04:20:00PM +0000, James Clark wrote:
>>
>>
>> On 27/02/2022 22:50, Jiri Olsa wrote:
>>> On Thu, Feb 24, 2022 at 05:19:55PM +0000, James Clark wrote:
>>>> MMAP records that occur after the build-id header is parsed do not have
>>>> their build-id set even if the filename matches an entry from the
>>>> header. Set the build-id on these dsos as long as the MMAP record
>>>> doesn't have its own build-id set.
>>>>
>>>> This fixes an issue with off target analysis where the local version of
>>>> a dso is loaded rather than one from ~/.debug via a build-id.
>>>
>>> nice catch :)
>>>
>>>>
>>>> Reported-by: Denis Nikitin <denik(a)chromium.org>
>>>> Signed-off-by: James Clark <james.clark(a)arm.com>
>>>> ---
>>>> tools/perf/util/dso.h | 1 +
>>>> tools/perf/util/header.c | 1 +
>>>> tools/perf/util/map.c | 16 ++++++++++++++--
>>>> 3 files changed, 16 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
>>>> index 011da3924fc1..3a9fd4d389b5 100644
>>>> --- a/tools/perf/util/dso.h
>>>> +++ b/tools/perf/util/dso.h
>>>> @@ -167,6 +167,7 @@ struct dso {
>>>> enum dso_load_errno load_errno;
>>>> u8 adjust_symbols:1;
>>>> u8 has_build_id:1;
>>>> + u8 header_build_id:1;
>>>> u8 has_srcline:1;
>>>> u8 hit:1;
>>>> u8 annotate_warned:1;
>>>> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
>>>> index 6da12e522edc..571d73d4f976 100644
>>>> --- a/tools/perf/util/header.c
>>>> +++ b/tools/perf/util/header.c
>>>> @@ -2200,6 +2200,7 @@ static int __event_process_build_id(struct perf_record_header_build_id *bev,
>>>>
>>>> build_id__init(&bid, bev->data, size);
>>>> dso__set_build_id(dso, &bid);
>>>> + dso->header_build_id = 1;
>>>>
>>>> if (dso_space != DSO_SPACE__USER) {
>>>> struct kmod_path m = { .name = NULL, };
>>>> diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
>>>> index 1803d3887afe..4ae91e491e23 100644
>>>> --- a/tools/perf/util/map.c
>>>> +++ b/tools/perf/util/map.c
>>>> @@ -127,7 +127,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
>>>>
>>>> if (map != NULL) {
>>>> char newfilename[PATH_MAX];
>>>> - struct dso *dso;
>>>> + struct dso *dso, *header_bid_dso;
>>>> int anon, no_dso, vdso, android;
>>>>
>>>> android = is_android_lib(filename);
>>>> @@ -185,7 +185,19 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
>>>>
>>>> if (build_id__is_defined(bid))
>>>> dso__set_build_id(dso, bid);
>>>> -
>>>> + else {
>>>
>>> nit please add { } to the if clause as well
>>>
>>>> + /*
>>>> + * If the mmap event had no build ID, search for an existing dso from the
>>>> + * build ID header by name. Otherwise only the dso loaded at the time of
>>>> + * reading the header will have the build ID set and all future mmaps will
>>>> + * have it missing.
>>>> + */
>>>> + header_bid_dso = __dsos__find(&machine->dsos, filename, false);
>>>
>>> is this 'perf top' safe? I think dso should be added in the
>>> same thread, but please check and add comment why we don't
>>> need locking in here
>>
>> Seems like there are multiple synthesize_threads_workers using the same machine->dsos object so
>> I think locking is needed.
>>
>> At first I thought of doing this:
>>
>> diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
>> index 4ae91e491e23..b87b81e3d41c 100644
>> --- a/tools/perf/util/map.c
>> +++ b/tools/perf/util/map.c
>> @@ -192,7 +192,9 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
>> * reading the header will have the build ID set and all future mmaps will
>> * have it missing.
>> */
>> + down_read(&machine->dsos.lock);
>> header_bid_dso = __dsos__find(&machine->dsos, filename, false);
>> + up_read(&machine->dsos.lock);
>> if (header_bid_dso && header_bid_dso->header_build_id) {
>> dso__set_build_id(dso, &header_bid_dso->bid);
>> dso->header_build_id = 1;
>>
>> But then I was wondering why it doesn't need a write lock all the way from machine__findnew_dso_id() to
>> dso__put()? At the moment there are writes to the dso like dso__set_loaded(), dso->nsinfo = nsi and
>> dso__set_build_id(), so another thread could find the dso in a partially constructed state.
>
> I think that's fine, machine__findnew_dso_id takes the dso ref so
> the real 'release' is when the machine object goes down at dsos__purge
>
>>
>> Not sure if this is an issue currently without my patch, but at least with it they would have to be found
>> with header_build_id already set to 1 otherwise it will mess things up.
>
> as for the partial changes I think it's also fine, because it happens
> at separate places.. we'd need to investigate specific example to see
> if there's a problem
>
>>
>> Extending the write lock outside of machine__findnew_dso_id() is difficult because it already
>> releases it before it returns. Does it need to be changed so that machine__findnew_dso_id() takes all the
>> arguments needed to construct it inside the lock?
>
> the change above should do it.. as for top threads I don't think
> it actually matters because there's events processing and hists
> processing, and I did not find them clashing in this.. but there's
> some setup session code that touches this, so it's better to be safe
Ok yes I think I agree. The header flag isn't important for perf top so having it all
constructed in place isn't necessary.
I've submitted v2 with this change.
Thanks for the review.
>
> thanks,
> jirka
Changes since v1:
* Add read lock around dso find
* Bracket style fix
Hi,
We are seeing an issue with doing Coresight decode off target where
initially the correct dso from ~/.debug is used, but after a new thread
in the perf.data file is passed with its mmap record, the local version
of the dso is picked up instead. This happens if the binary exists in the
same path on both devices, for example /bin/ls.
Initially when parsing the build-ids in the header, the dso for /bin/ls
will be created, and the file will correctly point to
~/.debug/bin/ls/2f15ad836be3339dec0e2e6a3c637e08e48aacbd/elf, but for any
new threads or mmaps that are also for /bin/ls, they will not have a
build-id set so they point to /bin/ls on the local machine rather than the
debug folder.
To fix this I made it possible to look up which existing dsos have
build id's set that originate from the header and then copy that build-id
onto the new dso if the name matches. Another way to do it would
be to stop comparing the mmap id so it matches on filename only, but I
think we do want to differentiate between different mmaps, even if they
have the same name, which is how it works in this version.
Applies to perf/core 56dce8681
James Clark (1):
perf: Set build-id using build-id header on new mmap records
tools/perf/util/dso.h | 1 +
tools/perf/util/header.c | 1 +
tools/perf/util/map.c | 20 +++++++++++++++++---
3 files changed, 19 insertions(+), 3 deletions(-)
--
2.28.0
On 27/02/2022 22:50, Jiri Olsa wrote:
> On Thu, Feb 24, 2022 at 05:19:55PM +0000, James Clark wrote:
>> MMAP records that occur after the build-id header is parsed do not have
>> their build-id set even if the filename matches an entry from the
>> header. Set the build-id on these dsos as long as the MMAP record
>> doesn't have its own build-id set.
>>
>> This fixes an issue with off target analysis where the local version of
>> a dso is loaded rather than one from ~/.debug via a build-id.
>
> nice catch :)
>
>>
>> Reported-by: Denis Nikitin <denik(a)chromium.org>
>> Signed-off-by: James Clark <james.clark(a)arm.com>
>> ---
>> tools/perf/util/dso.h | 1 +
>> tools/perf/util/header.c | 1 +
>> tools/perf/util/map.c | 16 ++++++++++++++--
>> 3 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
>> index 011da3924fc1..3a9fd4d389b5 100644
>> --- a/tools/perf/util/dso.h
>> +++ b/tools/perf/util/dso.h
>> @@ -167,6 +167,7 @@ struct dso {
>> enum dso_load_errno load_errno;
>> u8 adjust_symbols:1;
>> u8 has_build_id:1;
>> + u8 header_build_id:1;
>> u8 has_srcline:1;
>> u8 hit:1;
>> u8 annotate_warned:1;
>> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
>> index 6da12e522edc..571d73d4f976 100644
>> --- a/tools/perf/util/header.c
>> +++ b/tools/perf/util/header.c
>> @@ -2200,6 +2200,7 @@ static int __event_process_build_id(struct perf_record_header_build_id *bev,
>>
>> build_id__init(&bid, bev->data, size);
>> dso__set_build_id(dso, &bid);
>> + dso->header_build_id = 1;
>>
>> if (dso_space != DSO_SPACE__USER) {
>> struct kmod_path m = { .name = NULL, };
>> diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
>> index 1803d3887afe..4ae91e491e23 100644
>> --- a/tools/perf/util/map.c
>> +++ b/tools/perf/util/map.c
>> @@ -127,7 +127,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
>>
>> if (map != NULL) {
>> char newfilename[PATH_MAX];
>> - struct dso *dso;
>> + struct dso *dso, *header_bid_dso;
>> int anon, no_dso, vdso, android;
>>
>> android = is_android_lib(filename);
>> @@ -185,7 +185,19 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
>>
>> if (build_id__is_defined(bid))
>> dso__set_build_id(dso, bid);
>> -
>> + else {
>
> nit please add { } to the if clause as well
>
>> + /*
>> + * If the mmap event had no build ID, search for an existing dso from the
>> + * build ID header by name. Otherwise only the dso loaded at the time of
>> + * reading the header will have the build ID set and all future mmaps will
>> + * have it missing.
>> + */
>> + header_bid_dso = __dsos__find(&machine->dsos, filename, false);
>
> is this 'perf top' safe? I think dso should be added in the
> same thread, but please check and add comment why we don't
> need locking in here
Seems like there are multiple synthesize_threads_workers using the same machine->dsos object so
I think locking is needed.
At first I thought of doing this:
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 4ae91e491e23..b87b81e3d41c 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -192,7 +192,9 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
* reading the header will have the build ID set and all future mmaps will
* have it missing.
*/
+ down_read(&machine->dsos.lock);
header_bid_dso = __dsos__find(&machine->dsos, filename, false);
+ up_read(&machine->dsos.lock);
if (header_bid_dso && header_bid_dso->header_build_id) {
dso__set_build_id(dso, &header_bid_dso->bid);
dso->header_build_id = 1;
But then I was wondering why it doesn't need a write lock all the way from machine__findnew_dso_id() to
dso__put()? At the moment there are writes to the dso like dso__set_loaded(), dso->nsinfo = nsi and
dso__set_build_id(), so another thread could find the dso in a partially constructed state.
Not sure if this is an issue currently without my patch, but at least with it they would have to be found
with header_build_id already set to 1 otherwise it will mess things up.
Extending the write lock outside of machine__findnew_dso_id() is difficult because it already
releases it before it returns. Does it need to be changed so that machine__findnew_dso_id() takes all the
arguments needed to construct it inside the lock?
James
>
> thanks,
> jirka
>
>> + if (header_bid_dso && header_bid_dso->header_build_id) {
>> + dso__set_build_id(dso, &header_bid_dso->bid);
>> + dso->header_build_id = 1;
>> + }
>> + }
>> dso__put(dso);
>> }
>> return map;
>> --
>> 2.28.0
>>
On 02/03/2022 08:01, Mao Jinlong wrote:
> It is possibe that probe failure issue happens when the device
> and its child_device's probe happens at the same time.
> In coresight_make_links, has_conns_grp is true for parent, but
> has_conns_grp is false for child device as has_conns_grp is set
> to true in coresight_create_conns_sysfs_group. The probe of parent
> device will fail at this condition. Add has_conns_grp check for
> child device before make the links and make the process from
> device_register to connection_create be atomic to avoid this
> probe failure issue.
>
> Suggested-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
> Suggested-by: Mike Leach <mike.leach(a)linaro.org>
> Signed-off-by: Mao Jinlong <quic_jinlmao(a)quicinc.com>
> ---
> drivers/hwtracing/coresight/coresight-core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
> index 88653d1c06a4..7ce78dddfe31 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -1382,7 +1382,7 @@ static int coresight_fixup_device_conns(struct coresight_device *csdev)
> continue;
> conn->child_dev =
> coresight_find_csdev_by_fwnode(conn->child_fwnode);
> - if (conn->child_dev) {
> + if (conn->child_dev && conn->child_dev->has_conns_grp) {
> ret = coresight_make_links(csdev, conn,
> conn->child_dev);
> if (ret)
> @@ -1619,6 +1619,7 @@ struct coresight_device *coresight_register(struct coresight_desc *desc)
> csdev->dev.fwnode = fwnode_handle_get(dev_fwnode(desc->dev));
> dev_set_name(&csdev->dev, "%s", desc->name);
>
> + mutex_lock(&coresight_mutex);
> ret = device_register(&csdev->dev);
> if (ret) {
> put_device(&csdev->dev);
> @@ -1645,8 +1646,6 @@ struct coresight_device *coresight_register(struct coresight_desc *desc)
> }
> }
>
> - mutex_lock(&coresight_mutex);
> -
> ret = coresight_create_conns_sysfs_group(csdev);
> if (!ret)
> ret = coresight_fixup_device_conns(csdev);
> @@ -1666,6 +1665,7 @@ struct coresight_device *coresight_register(struct coresight_desc *desc)
> err_free_csdev:
> kfree(csdev);
> err_out:
> + mutex_unlock(&coresight_mutex);
This appears to be wrong. We may do an unlock when we didn't lock it in
the first place.
Please could you double check. Also, I think it may be neater to move
the kfree(csdev) to the only user and return straight away from there.
Cheers
Suzuki
> /* Cleanup the connection information */
> coresight_release_platform_data(NULL, desc->pdata);
> return ERR_PTR(ret);
Hi
On 01/03/2022 13:30, Jinlong Mao wrote:
> Hi Mike,
>
> On 3/1/2022 9:15 PM, Mike Leach wrote:
>> Hi,
>>
>> On Tue, 1 Mar 2022 at 11:42, Jinlong Mao <quic_jinlmao(a)quicinc.com>
>> wrote:
>>> On 2/28/2022 10:51 PM, Suzuki K Poulose wrote:
>>>
...
>>>
>>> Hi Suzuki,
>>>
>>> This issue happens when race condition happens.
>>> The condition is that the device and its child_device's probe happens
>>> at the same time.
>>>
>>> For example: device0 and its child device device1.
>>> Both of them are calling coresight_register function. device0 is
>>> calling coresight_fixup_device_conns.
>>> device1 is waiting for device0 to release the coresight_mutex.
>>> Because device1's csdev node is allocated,
>>> coresight_make_links will be called for device0. Then in
>>> coresight_add_sysfs_link, has_conns_grp is true
>>> for device0, but has_conns_grp is false for device1 as has_conns_grp
>>> is set to true in coresight_create_conns_sysfs_group .
>>> The probe of device0 will fail for at this condition.
>>>
>>>
>>> struct coresight_device *coresight_register(struct coresight_desc *desc)
>>> {
>>> .........
>>> mutex_lock(&coresight_mutex);
>>>
>>> ret = coresight_create_conns_sysfs_group(csdev);
>>> if (!ret)
>>> ret = coresight_fixup_device_conns(csdev);
>>> if (!ret)
>>> ret = coresight_fixup_orphan_conns(csdev);
>>> if (!ret && cti_assoc_ops && cti_assoc_ops->add)
>>> cti_assoc_ops->add(csdev);
>>>
>>> mutex_unlock(&coresight_mutex);
>>>
>>> .........
>>>
>>> }
>>>
>>> static int coresight_fixup_device_conns(struct coresight_device *csdev)
>>> {
>>> ..........
>>> conn->child_dev =
>>> coresight_find_csdev_by_fwnode(conn->child_fwnode);
>> The issue appears to be a constraint hidden in the lower layers of the
>> code.
>> Would a better solution not be to alter the code here:
>>
>> if (conn->child_dev && conn->child_dev->has_conns_grp) {
>> ...
>> } else {
>> csdev->orphan = true;
>> }
>>
>> which would mean that the connection attempt would drop through to
>> label the connection as an orphan, to be cleaned up by the child
>> itself when it runs coresight_fixup_orphan_conns()
>>
Tnanks Mike, I think that is a good solution. Alternatively, we
could make sure that device_register() and the fixup following
that are atomic.
i.e.
mutex_lock()
device_register()
fixup_connections()
create_sysfs()
mutex_unlock();
The fix may be a bit invasive than Mike's proposal, but it makes
sure we don't end up with half baked device on the coresight-bus.
Suzuki