On Tue, 12 May 2026 09:56:07 +0800, Jie Gan wrote:
> When coresight_path_assign_trace_id() cannot assign a valid trace ID,
> coresight_enable_sysfs() takes the err_path goto with ret still 0,
> returning success to the caller despite no trace session being started.
>
> Change coresight_path_assign_trace_id() to return int, moving the
> IS_VALID_CS_TRACE_ID() check inside it so it returns -EINVAL on failure
> and 0 on success. Update both callers to propagate this return value
> directly instead of inspecting path->trace_id after the call.
>
> [...]
Applied, thanks!
[1/1] coresight: fix missing error code when trace ID is invalid
https://git.kernel.org/coresight/c/f4526ffee6ff
Best regards,
--
Suzuki K Poulose <suzuki.poulose(a)arm.com>
On Tue, May 12, 2026 at 09:56:07AM +0800, Jie Gan wrote:
> When coresight_path_assign_trace_id() cannot assign a valid trace ID,
> coresight_enable_sysfs() takes the err_path goto with ret still 0,
> returning success to the caller despite no trace session being started.
>
> Change coresight_path_assign_trace_id() to return int, moving the
> IS_VALID_CS_TRACE_ID() check inside it so it returns -EINVAL on failure
> and 0 on success. Update both callers to propagate this return value
> directly instead of inspecting path->trace_id after the call.
>
> Fixes: d87d76d823d1 ("Coresight: Allocate trace ID after building the path")
> Reviewed-by: James Clark <james.clark(a)linaro.org>
> Reviewed-by: Richard Cheng <icheng(a)nvidia.com>
> Signed-off-by: Jie Gan <jie.gan(a)oss.qualcomm.com>
Reviewed-by: Leo Yan <leo.yan(a)arm.com>
On Mon, May 11, 2026 at 05:27:10PM +0800, Jie Gan wrote:
[...]
> > > @@ -755,10 +755,16 @@ void coresight_path_assign_trace_id(struct coresight_path *path,
> > > * Non 0 is either success or fail.
> > > */
> > > if (trace_id != 0) {
> > > - path->trace_id = trace_id;
> > > - return;
> > > + if (IS_VALID_CS_TRACE_ID(trace_id)) {
> > > + path->trace_id = trace_id;
> > > + return 0;
> > > + }
> > > +
> > > + return -EINVAL;
I'd advocate a bit early exit style, like:
/* 0 means the device has no ID assignment, so keep searching */
if (trace_id == 0)
continue;
if (!IS_VALID_CS_TRACE_ID(trace_id))
return -EINVAL;
path->trace_id = trace_id;
return 0;
Early exit can reduce indentation depth, and it handles simple cases
first and then the complex logic. In some cases (maye not this case),
we may benefit a bit from compiler optimization [1].
[1] https://xania.org/202512/18-partial-inlining
[...]
> The return value has been ignored in perf mode. It will introduce noisy by
> adding __must_check. So I think its better without __must_check?
Wouldn't it need to update perf mode as well?
Regarding __must_check, I searched Documentation but didn't find
guidance on when it should be used. I don't want to use this annotation
randomly (some functions use it and some not), this will be hard for
everyone to follow up.
IMO, it's fine not to use __must_check here. I would leave this to
Suzuki and other maintainers if have different opinions.
Thanks,
Leo
On 08/05/2026 6:45 am, Jie Gan wrote:
> When coresight_path_assign_trace_id() fails to allocate a valid trace
> ID, the code jumps to err_path without setting ret to an error value.
> This causes coresight_enable_sysfs() to return 0 (success) to the
> caller even though no trace session was started.
>
> Set ret = -EINVAL before the goto so that callers receive a proper
> error code.
>
> Fixes: d87d76d823d1 ("Coresight: Allocate trace ID after building the path")
> Signed-off-by: Jie Gan <jie.gan(a)oss.qualcomm.com>
> ---
> drivers/hwtracing/coresight/coresight-sysfs.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-sysfs.c b/drivers/hwtracing/coresight/coresight-sysfs.c
> index d2a6ed8bcc74..c9338c783540 100644
> --- a/drivers/hwtracing/coresight/coresight-sysfs.c
> +++ b/drivers/hwtracing/coresight/coresight-sysfs.c
> @@ -195,42 +195,44 @@ int coresight_enable_sysfs(struct coresight_device *csdev)
> */
> if (subtype == CORESIGHT_DEV_SUBTYPE_SOURCE_SOFTWARE)
> csdev->refcnt++;
> goto out;
> }
>
> sink = coresight_find_activated_sysfs_sink(csdev);
> if (!sink) {
> ret = -EINVAL;
> goto out;
> }
>
> path = coresight_build_path(csdev, sink);
> if (IS_ERR(path)) {
> pr_err("building path(s) failed\n");
> ret = PTR_ERR(path);
> goto out;
> }
>
> coresight_path_assign_trace_id(path, CS_MODE_SYSFS);
> - if (!IS_VALID_CS_TRACE_ID(path->trace_id))
> + if (!IS_VALID_CS_TRACE_ID(path->trace_id)) {
> + ret = -EINVAL;
> goto err_path;
> + }
>
> ret = coresight_enable_path(path, CS_MODE_SYSFS);
> if (ret)
> goto err_path;
>
> ret = coresight_enable_source_sysfs(csdev, CS_MODE_SYSFS, path);
> if (ret)
> goto err_source;
>
> switch (subtype) {
> case CORESIGHT_DEV_SUBTYPE_SOURCE_PROC:
> /*
> * When working from sysFS it is important to keep track
> * of the paths that were created so that they can be
> * undone in 'coresight_disable()'. Since there can only
> * be a single session per tracer (when working from sysFS)
> * a per-cpu variable will do just fine.
> */
> cpu = source_ops(csdev)->cpu_id(csdev);
> per_cpu(tracer_path, cpu) = path;
>
> ---
> base-commit: 17c7841d09ee7d33557fd075562d9289b6018c90
> change-id: 20260508-fix-trace-id-error-dbfdd4d8f2d1
>
> Best regards,
Reviewed-by: James Clark <james.clark(a)linaro.org>
On Mon, May 11, 2026 at 05:28:37PM +0800, Jie Gan wrote:
> Hi Leo,
>
> On 5/11/2026 5:24 PM, Leo Yan wrote:
> > On Mon, May 11, 2026 at 05:04:44PM +0800, Jie Gan wrote:
> > > In coresight_enable_sysfs(), for non-CPU sources (SOFTWARE, TPDM,
> > > OTHERS), the source device is enabled via coresight_enable_source_sysfs()
> > > before idr_alloc_u32() maps the path. If idr_alloc_u32() fails, the
> > > original code jumped directly to err_source, which only calls
> > > coresight_disable_path() and coresight_release_path(). The source device
> > > was left enabled with an incremented refcnt but no path tracked for it,
> > > leaving the device in an inconsistent state.
> > >
> > > Disable the source before jumping to err_source so the enable and path
> > > operations are fully unwound.
> > >
> > > Fixes: 1f5149c7751c ("coresight: Move all sysfs code to sysfs file")
> > > Signed-off-by: Jie Gan <quic_jiegan(a)quicinc.com>
> >
> > Actually I have noticed this. Since my PM series will remove IDR things,
> > and I don't think anyone really hit idr alloc error, this is why I
> > didn't send fix for this.
> >
> > Anyway, this is a reasonable fix. I will send out my PM series later
> > in today, I will pick this patch into my series and rebase on it, hope
> > this is easier for all of us.
>
> Well noted. Please feel free to pick it into your series.
Thanks! Just note, I updated the Fixes tag as:
Fixes: 5c0016d7b343 ("coresight: core: Use IDR for non-cpu bound sources' paths.")
Which is the original patch for the issue.
Thanks,
Leo
On Mon, May 11, 2026 at 05:04:44PM +0800, Jie Gan wrote:
> In coresight_enable_sysfs(), for non-CPU sources (SOFTWARE, TPDM,
> OTHERS), the source device is enabled via coresight_enable_source_sysfs()
> before idr_alloc_u32() maps the path. If idr_alloc_u32() fails, the
> original code jumped directly to err_source, which only calls
> coresight_disable_path() and coresight_release_path(). The source device
> was left enabled with an incremented refcnt but no path tracked for it,
> leaving the device in an inconsistent state.
>
> Disable the source before jumping to err_source so the enable and path
> operations are fully unwound.
>
> Fixes: 1f5149c7751c ("coresight: Move all sysfs code to sysfs file")
> Signed-off-by: Jie Gan <quic_jiegan(a)quicinc.com>
Actually I have noticed this. Since my PM series will remove IDR things,
and I don't think anyone really hit idr alloc error, this is why I
didn't send fix for this.
Anyway, this is a reasonable fix. I will send out my PM series later
in today, I will pick this patch into my series and rebase on it, hope
this is easier for all of us.
Thanks,
Leo
On Fri, May 08, 2026 at 01:45:35PM +0800, Jie Gan wrote:
[...]
> coresight_path_assign_trace_id(path, CS_MODE_SYSFS);
> - if (!IS_VALID_CS_TRACE_ID(path->trace_id))
> + if (!IS_VALID_CS_TRACE_ID(path->trace_id)) {
> + ret = -EINVAL;
> goto err_path;
> + }
On the top of this patch, could we do a further improvement?
Move IS_VALID_CS_TRACE_ID() into coresight_path_assign_trace_id() and
return 0 for success and < 0 for failures. As result, callers only
need to check the returned value.
For this patch:
Reviewed-by: Leo Yan <leo.yan(a)arm.com>
On Thu, May 07, 2026 at 03:55:26PM +0100, James Clark wrote:
[...]
> Hi Leo,
>
> Testing on the Orion O6 board was all good, and so was stress testing
> concurrent sysfs mode and hotplug on Juno.
Thanks a lot for test!
> However, I was trying to stress test sysfs mode and rmmod on Juno and
> ran into an issue, although a similar issue is present without your
> patchset.
I don't think CPU PM introduces additional complexity for the above cases.
The reason is that CPU PM notifiers _only_ apply to active sessions, and
once a device is enabled, the module cannot be removed.
If the race conditions between enabling/disabling sessions and module
load/unload are properly handled, CPU PM should be safe. If we have bug
in these race conditions, the high frequency data access in CPU PM may
expose the issues - I don't expect CPU PM is the culprit.
> If you run an rmmod on all the coresight devices at the same time as an
> enable_source / disable loop you always get this:
>
> WARNING: possible circular locking dependency detected
> 7.0.0-rc1+ #713 Tainted: G N
> ------------------------------------------------------
> rmmod/1361 is trying to acquire lock:
> ffff0008042f69a8 (kn->active#144){++++}-{0:0}, at:
> __kernfs_remove+0x1b8/0x2c8
kn->active is not a lock but an active reference of sysfs node, but it
use lockdep annotation to detect lock dependency.
> Possible unsafe locking scenario:
> CPU0 CPU1
> ---- ----
> lock(coresight_mutex);
> lock(cpu_hotplug_lock);
> lock(coresight_mutex);
> lock(kn->active#144);
> *** DEADLOCK ***
The potential deadlock sequence could be:
kernfs_fop_write_iter()
`> kernfs_get_active_of() => acquire kn->active
`> coresight_enable_sysfs() => acquire coresight_mutex
coresight_unregister() => acquire coresight_mutex
`> device_unregister()
`> __kernfs_remove()
`> kernfs_drain() => acquire kn->active
> I think the issue can be fixed by releasing the coresight_mutex before
> device_unregister():
>
> diff --git a/drivers/hwtracing/coresight/coresight-core.c
> b/drivers/hwtracing/coresight/coresight-core.c
> index 015363da12fa..620560880f12 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -1639,8 +1639,8 @@ void coresight_unregister(struct coresight_device
> *csdev)
> coresight_remove_conns(csdev);
> coresight_clear_default_sink(csdev);
> coresight_release_platform_data(csdev->dev.parent, csdev->pdata);
> - device_unregister(&csdev->dev);
> mutex_unlock(&coresight_mutex);
> + device_unregister(&csdev->dev);
> }
> EXPORT_SYMBOL_GPL(coresight_unregister);
If so, we also need to move device_register() out of the mutex scope.
That said, I still think we should dive a bit if can use smaller locking
granluarity (combining with bus management provided by device model).
> Although I didn't think too hard about the implications, but it might be ok
> because once all the connections are removed the device can't be used so
> releasing the coresight_mutex isn't an issue.
>
> But then testing that I ran into some kind of refleak where I couldn't
> unload modules anymore, even though I'd disabled everything. But that could
> be a different issue:
>
> rmmod: ERROR: Module coresight_funnel is in use
> rmmod: ERROR: Module coresight_replicator is in use
> rmmod: ERROR: Module coresight_etm4x is in use
> rmmod: ERROR: Module coresight_tmc is in use
> rmmod: ERROR: Module coresight_cti is in use
> rmmod: ERROR: Module coresight is in use by: coresight_tmc coresight_cti
> coresight_etm4x coresight_replicator coresight_funnel
I suspect this is due to module references are not properly released, or
the entire CS path is not properly disabled.
After the issue occurs, can the ETM sysfs knob still be accessed? I am
curious whether this is caused by the sysfs knob disappearing so no way
to disable the path or the sysfs knob still exists but the driver
internally misses to disable the path.
> Anyway I don't think your patches make this worse, so we can probably ignore
> it, but it would be good to be able to stress test the new modifications
> around the same area.
As no regression in test, I agree that we should not defer this series.
We can fix the race with module load/unload as a separate task:
- sysfs mode + module load/unload
- perf mode + module load/unload
Then we can combine stress test with CPU idle/hotplug.
Thanks,
Leo
On 09/05/2026 1:58 am, Jie Gan wrote:
> When coresight_path_assign_trace_id() cannot assign a valid trace ID,
> coresight_enable_sysfs() takes the err_path goto with ret still 0,
> returning success to the caller despite no trace session being started.
>
> Fix this by changing coresight_path_assign_trace_id() to return int.
> Move the IS_VALID_CS_TRACE_ID() check inside the function so it returns
> -EINVAL on failure and 0 on success. Update coresight_enable_sysfs() to
> check the return value directly instead of inspecting path->trace_id
> after the call.
>
> The other caller in coresight-etm-perf.c discards the return value and
> continues to check path->trace_id via IS_VALID_CS_TRACE_ID() directly.
> This is unaffected: on failure path->trace_id is no longer written, so
> it remains 0, which IS_VALID_CS_TRACE_ID() rejects the same as before.
>
> Fixes: d87d76d823d1 ("Coresight: Allocate trace ID after building the path")
> Signed-off-by: Jie Gan <jie.gan(a)oss.qualcomm.com>
> ---
> Changes in v2:
> - Refactor the coresight_path_assign_trace_id function.
> - Link to v1: https://lore.kernel.org/r/20260508-fix-trace-id-error-v1-1-5f11a5456fdf@oss…
> ---
> drivers/hwtracing/coresight/coresight-core.c | 14 ++++++++++----
> drivers/hwtracing/coresight/coresight-priv.h | 2 +-
> drivers/hwtracing/coresight/coresight-sysfs.c | 4 ++--
> 3 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
> index 46f247f73cf6..cabdc0c72f38 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -739,12 +739,12 @@ static int coresight_get_trace_id(struct coresight_device *csdev,
> * Call this after creating the path and before enabling it. This leaves
> * the trace ID set on the path, or it remains 0 if it couldn't be assigned.
> */
> -void coresight_path_assign_trace_id(struct coresight_path *path,
> +int coresight_path_assign_trace_id(struct coresight_path *path,
> enum cs_mode mode)
> {
> struct coresight_device *sink = coresight_get_sink(path);
> struct coresight_node *nd;
> - int trace_id;
> + int trace_id, ret = -EINVAL;
>
> list_for_each_entry(nd, &path->path_list, link) {
> /* Assign a trace ID to the path for the first device that wants to do it */
> @@ -755,10 +755,16 @@ void coresight_path_assign_trace_id(struct coresight_path *path,
> * Non 0 is either success or fail.
> */
> if (trace_id != 0) {
> - path->trace_id = trace_id;
> - return;
> + if (IS_VALID_CS_TRACE_ID(trace_id)) {
> + path->trace_id = trace_id;
> + ret = 0;
Reviewed-by: James Clark <james.clark(a)linaro.org>
Minor nit, but just "return 0" here is a bit simpler.
> + }
> +
> + return ret;
And then "return -EINVAL" for these ones.
> }
> }
> +
> + return ret; > }
>
> /**
> diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h
> index 1ea882dffd70..34c7e792adbd 100644
> --- a/drivers/hwtracing/coresight/coresight-priv.h
> +++ b/drivers/hwtracing/coresight/coresight-priv.h
> @@ -153,7 +153,7 @@ int coresight_make_links(struct coresight_device *orig,
> void coresight_remove_links(struct coresight_device *orig,
> struct coresight_connection *conn);
> u32 coresight_get_sink_id(struct coresight_device *csdev);
> -void coresight_path_assign_trace_id(struct coresight_path *path,
> +int coresight_path_assign_trace_id(struct coresight_path *path,
> enum cs_mode mode);
>
> #if IS_ENABLED(CONFIG_CORESIGHT_SOURCE_ETM3X)
> diff --git a/drivers/hwtracing/coresight/coresight-sysfs.c b/drivers/hwtracing/coresight/coresight-sysfs.c
> index d2a6ed8bcc74..b6a870399e83 100644
> --- a/drivers/hwtracing/coresight/coresight-sysfs.c
> +++ b/drivers/hwtracing/coresight/coresight-sysfs.c
> @@ -211,8 +211,8 @@ int coresight_enable_sysfs(struct coresight_device *csdev)
> goto out;
> }
>
> - coresight_path_assign_trace_id(path, CS_MODE_SYSFS);
> - if (!IS_VALID_CS_TRACE_ID(path->trace_id))
> + ret = coresight_path_assign_trace_id(path, CS_MODE_SYSFS);
> + if (ret)
> goto err_path;
>
> ret = coresight_enable_path(path, CS_MODE_SYSFS);
>
> ---
> base-commit: 17c7841d09ee7d33557fd075562d9289b6018c90
> change-id: 20260508-fix-trace-id-error-dbfdd4d8f2d1
>
> Best regards,
On Wed, Apr 22, 2026 at 02:21:54PM +0100, Yeoreum Yun wrote:
[...]
> @@ -573,11 +573,9 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata)
> etm4x_relaxed_write32(csa, config->res_ctrl[i], TRCRSCTLRn(i));
>
> for (i = 0; i < caps->nr_ss_cmp; i++) {
> - /* always clear status bit on restart if using single-shot */
> - if (config->ss_ctrl[i] || config->ss_pe_cmp[i])
> - config->ss_status[i] &= ~TRCSSCSRn_STATUS;
> etm4x_relaxed_write32(csa, config->ss_ctrl[i], TRCSSCCRn(i));
> - etm4x_relaxed_write32(csa, config->ss_status[i], TRCSSCSRn(i));
> + /* always clear status and pending bits on restart if using single-shot */
> + etm4x_relaxed_write32(csa, 0x0, TRCSSCSRn(i));
After confirmed with hardware team, we should preserve status bits
(including STATUS and PENDING bits) during a session. So here we should
set drvdata->ss_status to TRCSSCSRn.
> @@ -1503,8 +1501,9 @@ static void etm4_init_arch_data(void *info)
> */
> caps->nr_ss_cmp = FIELD_GET(TRCIDR4_NUMSSCC_MASK, etmidr4);
> for (i = 0; i < caps->nr_ss_cmp; i++) {
> - drvdata->config.ss_status[i] =
> - etm4x_relaxed_read32(csa, TRCSSCSRn(i));
> + drvdata->ss_status[i] = etm4x_relaxed_read32(csa, TRCSSCSRn(i));
> + drvdata->ss_status[i] &= (TRCSSCSRn_PC | TRCSSCSRn_DV |
> + TRCSSCSRn_DA | TRCSSCSRn_INST);
It is fine for read these capacity bits when probe, but we need to clear
status when a session is starting to avoid the stale value left from
previous session:
drvdata->ss_status[idx] &= ~(TRCSSCSRn_STATUS | TRCSSCSRn_PENDING);
We can do this in etm4_parse_event_config() for perf mode, and might
create a new function (say etm4_parse_sysfs_config()) for preparing
config for sysfs mode?
The rest looks good to me.
Thanks,
Leo