Hi,
On Wed, 2 Apr 2025 at 01:50, Jie Gan <quic_jiegan(a)quicinc.com> wrote:
>
>
>
> On 4/1/2025 5:56 PM, Mike Leach wrote:
> > Hi,
> >
> > On Tue, 1 Apr 2025 at 07:11, Anshuman Khandual
> > <anshuman.khandual(a)arm.com> wrote:
> >>
> >> On 4/1/25 07:12, Jie Gan wrote:
> >>> The trace_id in coresight_path may contain an error number which means a
> >>> negative integer, but the current type of the trace_id is u8. Change the
> >>> type to int to fix it.
> >>>
> >>> Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
> >>> Fixes: 3c03c49b2fa5 ("Coresight: Introduce a new struct coresight_path")
> >>> Signed-off-by: Jie Gan <jie.gan(a)oss.qualcomm.com>
> >>
> >> LGTM
> >>
> >> Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
> >>
> >>> ---
> >>> include/linux/coresight.h | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/linux/coresight.h b/include/linux/coresight.h
> >>> index d79a242b271d..c2bf10c43e7c 100644
> >>> --- a/include/linux/coresight.h
> >>> +++ b/include/linux/coresight.h
> >>> @@ -337,7 +337,7 @@ static struct coresight_dev_list (var) = { \
> >>> */
> >>> struct coresight_path {
> >>> struct list_head path_list;
> >>> - u8 trace_id;
> >>> + int trace_id;
> >>> };
> >>>
> >>> enum cs_mode {
> >
> > There are many places in the Coresight drivers that assign a u8
> > traceid from the path trace ID.
> >
> > e.g.
> > In coresight-etm4x-core.c : etm4_enable_perf()
> >
> > drvdata->trcid = path->trace_id;
> >
> > drvdata->trcid is defined as a u8 - the reason being trace IDs are
> > 128 bits wide with some reserved values.
> >
> > Will this not just trigger the same issue if path->trace_id is changed
> > to an int? Even if not it is inconsistent handling of the trace ID
> > values.
> >
> > Trace ID errors should be handled by returning an invalid trace ID
> > value - were the trace ID value will fail the macro
> > IS_VALID_CS_TRACE_ID(), or separate the return of a trace ID from an
> > error return in a function.
> >
>
> Hi Mike,
>
> The path->trace_id is verified after it has been assigned with the logic
> you mentioned:
>
> if (!IS_VALID_CS_TRACE_ID(path->trace_id))
> goto err_path;
>
> So it should be safe to assign to another u8 parameter, like you mentioned:
>
> In coresight-etm4x-core.c : etm4_enable_perf()
>
> drvdata->trcid = path->trace_id;
>
It is safe but will it not trigger a warning just like the one you are
trying to fix as the types are mismatched?
Mike
> Thanks,
> Jie
>
>
> > Regards
> >
> > Mike
> >
> >
> >
> > --
> > Mike Leach
> > Principal Engineer, ARM Ltd.
> > Manchester Design Centre. UK
>
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
On 28/03/2025 10:38 pm, Yabin Cui wrote:
> When tracing ETM data on multiple CPUs concurrently via the
> perf interface, the CATU device is shared across different CPU
> paths. This can lead to race conditions when multiple CPUs attempt
> to enable or disable the CATU device simultaneously.
>
> To address these race conditions, this patch introduces the
> following changes:
>
> 1. The enable and disable operations for the CATU device are not
> reentrant. Therefore, a spinlock is added to ensure that only
> one CPU can enable or disable a given CATU device at any point
> in time.
>
> 2. A reference counter is used to manage the enable/disable state
> of the CATU device. The device is enabled when the first CPU
> requires it and is only disabled when the last CPU finishes
> using it. This ensures the device remains active as long as at
> least one CPU needs it.
>
> Signed-off-by: Yabin Cui <yabinc(a)google.com>
> ---
> drivers/hwtracing/coresight/coresight-catu.c | 29 ++++++++++++++------
> drivers/hwtracing/coresight/coresight-catu.h | 1 +
> 2 files changed, 22 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-catu.c b/drivers/hwtracing/coresight/coresight-catu.c
> index 275cc0d9f505..834a7ffbbdbc 100644
> --- a/drivers/hwtracing/coresight/coresight-catu.c
> +++ b/drivers/hwtracing/coresight/coresight-catu.c
> @@ -458,12 +458,19 @@ static int catu_enable_hw(struct catu_drvdata *drvdata, enum cs_mode cs_mode,
> static int catu_enable(struct coresight_device *csdev, enum cs_mode mode,
> void *data)
> {
> - int rc;
> + int rc = 0;
> + unsigned long flags;
> struct catu_drvdata *catu_drvdata = csdev_to_catu_drvdata(csdev);
>
> - CS_UNLOCK(catu_drvdata->base);
> - rc = catu_enable_hw(catu_drvdata, mode, data);
> - CS_LOCK(catu_drvdata->base);
> + spin_lock_irqsave(&catu_drvdata->spinlock, flags);
Hi Yabin,
This needs to be a raw_spinlock since [1]. Also you might as well use
the new guard() thing to save someone find-and-replacing it later.
But I'm wondering if this is accurate. The ETR's refcount is dependent
on the pid of the owner of the trace session:
/* Do not proceed if this device is associated with another session */
if (drvdata->pid != -1 && drvdata->pid != pid) {
rc = -EBUSY;
goto unlock_out;
}
/*
* No HW configuration is needed if the sink is already in
* use for this session.
*/
if (drvdata->pid == pid) {
csdev->refcnt++;
goto unlock_out;
}
If the helpers get enabled first, could this mean that CATU gets
associated with a different session than the ETR? Maybe not, but it
would be easier to understand if the core code handled the refcounting
and locking for linked devices.
[1]:
https://lore.kernel.org/all/20250306121110.1647948-3-yeoreum.yun@arm.com/
Thanks
James
> + if (csdev->refcnt == 0) {
> + CS_UNLOCK(catu_drvdata->base);
> + rc = catu_enable_hw(catu_drvdata, mode, data);
> + CS_LOCK(catu_drvdata->base);
> + }
> + if (!rc)
> + csdev->refcnt++;
> + spin_unlock_irqrestore(&catu_drvdata->spinlock, flags);
> return rc;
> }
>
> @@ -486,12 +493,17 @@ static int catu_disable_hw(struct catu_drvdata *drvdata)
>
> static int catu_disable(struct coresight_device *csdev, void *__unused)
> {
> - int rc;
> + int rc = 0;
> + unsigned long flags;
> struct catu_drvdata *catu_drvdata = csdev_to_catu_drvdata(csdev);
>
> - CS_UNLOCK(catu_drvdata->base);
> - rc = catu_disable_hw(catu_drvdata);
> - CS_LOCK(catu_drvdata->base);
> + spin_lock_irqsave(&catu_drvdata->spinlock, flags);
> + if (--csdev->refcnt == 0) {
> + CS_UNLOCK(catu_drvdata->base);
> + rc = catu_disable_hw(catu_drvdata);
> + CS_LOCK(catu_drvdata->base);
> + }
> + spin_unlock_irqrestore(&catu_drvdata->spinlock, flags);
> return rc;
> }
>
> @@ -550,6 +562,7 @@ static int __catu_probe(struct device *dev, struct resource *res)
> dev->platform_data = pdata;
>
> drvdata->base = base;
> + spin_lock_init(&drvdata->spinlock);
> catu_desc.access = CSDEV_ACCESS_IOMEM(base);
> catu_desc.pdata = pdata;
> catu_desc.dev = dev;
> diff --git a/drivers/hwtracing/coresight/coresight-catu.h b/drivers/hwtracing/coresight/coresight-catu.h
> index 141feac1c14b..663282ec6381 100644
> --- a/drivers/hwtracing/coresight/coresight-catu.h
> +++ b/drivers/hwtracing/coresight/coresight-catu.h
> @@ -65,6 +65,7 @@ struct catu_drvdata {
> void __iomem *base;
> struct coresight_device *csdev;
> int irq;
> + spinlock_t spinlock;
> };
>
> #define CATU_REG32(name, offset) \
On 4/1/25 07:12, Jie Gan wrote:
> The trace_id in coresight_path may contain an error number which means a
> negative integer, but the current type of the trace_id is u8. Change the
> type to int to fix it.
>
> Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
> Fixes: 3c03c49b2fa5 ("Coresight: Introduce a new struct coresight_path")
> Signed-off-by: Jie Gan <jie.gan(a)oss.qualcomm.com>
LGTM
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
> ---
> include/linux/coresight.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/coresight.h b/include/linux/coresight.h
> index d79a242b271d..c2bf10c43e7c 100644
> --- a/include/linux/coresight.h
> +++ b/include/linux/coresight.h
> @@ -337,7 +337,7 @@ static struct coresight_dev_list (var) = { \
> */
> struct coresight_path {
> struct list_head path_list;
> - u8 trace_id;
> + int trace_id;
> };
>
> enum cs_mode {
On 31/03/2025 8:14 am, Andy Shevchenko wrote:
> The fwnode.h is not supposed to be used by the drivers as it
> has the definitions for the core parts for different device
> property provider implementations. Drop it.
>
> Since the code wants to use the pointer to the struct fwnode_handle
> the forward declaration is provided.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
> ---
> drivers/hwtracing/coresight/coresight-cti.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-cti.h b/drivers/hwtracing/coresight/coresight-cti.h
> index 16e310e7e9d4..8362a47c939c 100644
> --- a/drivers/hwtracing/coresight/coresight-cti.h
> +++ b/drivers/hwtracing/coresight/coresight-cti.h
> @@ -9,7 +9,6 @@
>
> #include <linux/coresight.h>
> #include <linux/device.h>
> -#include <linux/fwnode.h>
> #include <linux/list.h>
> #include <linux/spinlock.h>
> #include <linux/sysfs.h>
> @@ -17,6 +16,8 @@
>
> #include "coresight-priv.h"
>
> +struct fwnode_handle;
> +
> /*
> * Device registers
> * 0x000 - 0x144: CTI programming and status
Reviewed-by: James Clark <james.clark(a)linaro.org>
Hi Levi,
On Mon, Mar 24, 2025 at 07:17:40PM +0000, Yeoreum Yun wrote:
> While enable active config via cscfg_csdev_enable_active_config(),
> active config could be deactivated via configfs' sysfs interface.
> This could make UAF issue in below scenario:
>
> CPU0 CPU1
> (sysfs enable) load module
> cscfg_load_config_sets()
> activate config. // sysfs
> (sys_active_cnt == 1)
> ...
> cscfg_csdev_enable_active_config()
> lock(csdev->cscfg_csdev_lock)
> // here load config activate by CPU1
> unlock(csdev->cscfg_csdev_lock)
>
> deactivate config // sysfs
> (sys_activec_cnt == 0)
> cscfg_unload_config_sets()
> unload module
>
> // access to config_desc which freed
> // while unloading module.
> cfs_csdev_enable_config
I am not sure if this flow can happen. CoreSight configfs feature is
integrated into the CoreSight core layer, and the other CoreSight
modules are dependent on it.
For example, if the ETM4x module is not removed, the kernel module
management will natually prevent the CoreSight core module from being
removed.
> To address this, use cscfg_config_desc's active_cnt as a reference count
> which will be holded when
> - activate the config.
> - enable the activated config.
> and put the module reference when config_active_cnt == 0.
>
> Signed-off-by: Yeoreum Yun <yeoreum.yun(a)arm.com>
> ---
> Since v3:
> - Remove enable arguments in cscfg_config_desc_get() (from Mike).
> - https://lore.kernel.org/all/20250109171956.3535294-1-yeoreum.yun@arm.com/
> ---
> .../hwtracing/coresight/coresight-config.h | 2 +-
> .../coresight/coresight-etm4x-core.c | 3 ++
> .../hwtracing/coresight/coresight-syscfg.c | 52 +++++++++++++------
> 3 files changed, 41 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-config.h b/drivers/hwtracing/coresight/coresight-config.h
> index b9ebc9fcfb7f..90fd937d3bd8 100644
> --- a/drivers/hwtracing/coresight/coresight-config.h
> +++ b/drivers/hwtracing/coresight/coresight-config.h
> @@ -228,7 +228,7 @@ struct cscfg_feature_csdev {
> * @feats_csdev:references to the device features to enable.
> */
> struct cscfg_config_csdev {
> - const struct cscfg_config_desc *config_desc;
> + struct cscfg_config_desc *config_desc;
> struct coresight_device *csdev;
> bool enabled;
> struct list_head node;
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index e5972f16abff..ef96028fa56b 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -1020,6 +1020,9 @@ static void etm4_disable_sysfs(struct coresight_device *csdev)
> smp_call_function_single(drvdata->cpu, etm4_disable_hw, drvdata, 1);
>
> raw_spin_unlock(&drvdata->spinlock);
> +
> + cscfg_csdev_disable_active_config(csdev);
> +
In general, we need to split changes into several patches if each
addresses a different issue. From my understanding, the change above is
to fix missing to disable config when disable Sysfs mode.
If so, could we use a seperate patch for this change?
> cpus_read_unlock();
>
> /*
> diff --git a/drivers/hwtracing/coresight/coresight-syscfg.c b/drivers/hwtracing/coresight/coresight-syscfg.c
> index a70c1454b410..6d8c212ad434 100644
> --- a/drivers/hwtracing/coresight/coresight-syscfg.c
> +++ b/drivers/hwtracing/coresight/coresight-syscfg.c
> @@ -391,14 +391,17 @@ static void cscfg_owner_put(struct cscfg_load_owner_info *owner_info)
> static void cscfg_remove_owned_csdev_configs(struct coresight_device *csdev, void *load_owner)
> {
> struct cscfg_config_csdev *config_csdev, *tmp;
> + unsigned long flags;
>
> if (list_empty(&csdev->config_csdev_list))
> return;
>
> + raw_spin_lock_irqsave(&csdev->cscfg_csdev_lock, flags);
I think we should use spinlock to guard the condition checking
list_empty().
Here the race condition is the 'config_csdev_list' list and
configurations on the list. For atomicity, we should use lock to
protect any operations on the list (read, add, delete, etc).
A side topic, as here it adds locks for protecting 'config_csdev_list',
I am wandering why we do not do the same thing for
'feature_csdev_list' (See cscfg_remove_owned_csdev_features() and
cscfg_get_feat_csdev()).
> list_for_each_entry_safe(config_csdev, tmp, &csdev->config_csdev_list, node) {
> if (config_csdev->config_desc->load_owner == load_owner)
> list_del(&config_csdev->node);
> }
> + raw_spin_unlock_irqrestore(&csdev->cscfg_csdev_lock, flags);
> }
>
> static void cscfg_remove_owned_csdev_features(struct coresight_device *csdev, void *load_owner)
> @@ -867,6 +870,25 @@ void cscfg_csdev_reset_feats(struct coresight_device *csdev)
> }
> EXPORT_SYMBOL_GPL(cscfg_csdev_reset_feats);
>
> +static bool cscfg_config_desc_get(struct cscfg_config_desc *config_desc)
> +{
> + if (!atomic_fetch_inc(&config_desc->active_cnt)) {
> + /* must ensure that config cannot be unloaded in use */
> + if (unlikely(cscfg_owner_get(config_desc->load_owner))) {
> + atomic_dec(&config_desc->active_cnt);
> + return false;
> + }
> + }
> +
> + return true;
> +}
> +
> +static void cscfg_config_desc_put(struct cscfg_config_desc *config_desc)
> +{
> + if (!atomic_dec_return(&config_desc->active_cnt))
> + cscfg_owner_put(config_desc->load_owner);
> +}
> +
> /*
> * This activate configuration for either perf or sysfs. Perf can have multiple
> * active configs, selected per event, sysfs is limited to one.
> @@ -890,22 +912,17 @@ static int _cscfg_activate_config(unsigned long cfg_hash)
> if (config_desc->available == false)
> return -EBUSY;
>
> - /* must ensure that config cannot be unloaded in use */
> - err = cscfg_owner_get(config_desc->load_owner);
> - if (err)
> + if (!cscfg_config_desc_get(config_desc)) {
> + err = -EINVAL;
> break;
> + }
> +
> /*
> * increment the global active count - control changes to
> * active configurations
> */
> atomic_inc(&cscfg_mgr->sys_active_cnt);
Seems to me, it is more reasonable to use 'sys_active_cnt' to acquire
the module reference instead of 'config_desc->active_cnt'. The reason
is 'sys_active_cnt' is a global counter.
> - /*
> - * mark the descriptor as active so enable config on a
> - * device instance will use it
> - */
> - atomic_inc(&config_desc->active_cnt);
> -
> err = 0;
> dev_dbg(cscfg_device(), "Activate config %s.\n", config_desc->name);
> break;
> @@ -920,9 +937,8 @@ static void _cscfg_deactivate_config(unsigned long cfg_hash)
>
> list_for_each_entry(config_desc, &cscfg_mgr->config_desc_list, item) {
> if ((unsigned long)config_desc->event_ea->var == cfg_hash) {
> - atomic_dec(&config_desc->active_cnt);
> atomic_dec(&cscfg_mgr->sys_active_cnt);
> - cscfg_owner_put(config_desc->load_owner);
> + cscfg_config_desc_put(config_desc);
> dev_dbg(cscfg_device(), "Deactivate config %s.\n", config_desc->name);
> break;
> }
> @@ -1047,7 +1063,7 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> unsigned long cfg_hash, int preset)
> {
> struct cscfg_config_csdev *config_csdev_active = NULL, *config_csdev_item;
> - const struct cscfg_config_desc *config_desc;
> + struct cscfg_config_desc *config_desc;
> unsigned long flags;
> int err = 0;
>
> @@ -1062,8 +1078,8 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> raw_spin_lock_irqsave(&csdev->cscfg_csdev_lock, flags);
> list_for_each_entry(config_csdev_item, &csdev->config_csdev_list, node) {
> config_desc = config_csdev_item->config_desc;
> - if ((atomic_read(&config_desc->active_cnt)) &&
> - ((unsigned long)config_desc->event_ea->var == cfg_hash)) {
> + if (((unsigned long)config_desc->event_ea->var == cfg_hash) &&
> + cscfg_config_desc_get(config_desc)) {
This seems to me not right. Why a config descriptor is get in multiple
places? One time getting a config descriptor is in
_cscfg_activate_config(), another is at here.
To be honest, I am not clear what is the difference between 'activate'
config and 'enable active' config. Literally, I think we only need to
acquire the config at its creating phase (maybe match to activate
config?).
> config_csdev_active = config_csdev_item;
> csdev->active_cscfg_ctxt = (void *)config_csdev_active;
> break;
> @@ -1097,7 +1113,11 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> err = -EBUSY;
> raw_spin_unlock_irqrestore(&csdev->cscfg_csdev_lock, flags);
> }
> +
> + if (err)
> + cscfg_config_desc_put(config_desc);
> }
> +
> return err;
> }
> EXPORT_SYMBOL_GPL(cscfg_csdev_enable_active_config);
> @@ -1136,8 +1156,10 @@ void cscfg_csdev_disable_active_config(struct coresight_device *csdev)
> raw_spin_unlock_irqrestore(&csdev->cscfg_csdev_lock, flags);
>
> /* true if there was an enabled active config */
> - if (config_csdev)
> + if (config_csdev) {
> cscfg_csdev_disable_config(config_csdev);
> + cscfg_config_desc_put(config_csdev->config_desc);
> + }
> }
> EXPORT_SYMBOL_GPL(cscfg_csdev_disable_active_config);
>
> --
> LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
>
I've gotten stuck a few times with unusable Coresight after a warm boot
due to lingering claim tags, especially when testing the Coresight
panic patchsets.
This change does some tidy ups, adds some debug messages and clears the
self hosted claim tag on probe. The last two commits are unrelated
tidyups but they touch some of the same functions so to avoid extra
conflicts I'm including them here.
This gets as far as fixing the claim tag issue, but there is some other
state not being cleared on probe that results in the following error.
This can be fixed up as a later change:
coresight tmc_etf0: timeout while waiting for TMC to be Ready
coresight tmc_etf0: Failed to enable : TMC is not ready
Changes in v3:
- Collapse rename and locked/unlocked addition commits of
coresight_clear_self_claim_tag() so we don't change the name twice.
- Make coresight_clear_self_claim_tag() a bit more generic by only
doing UNLOCK for MMIO devices (although there is no use of this right
now)
- Link to v2: https://lore.kernel.org/r/20250318-james-coresight-claim-tags-v2-0-e9c8a9cd…
Changes in v2:
* Revert most of the interface changes, just call
coresight_clear_self_claim_tag() directly. This is possible because
we're not doing the read first, so it has fewer knock on effects.
* Split out the change to add struct cs_access to etm3x
* Add another warning for racing with external debugger
--
2.34.1
---
James Clark (7):
coresight: Convert tag clear function to take a struct cs_access
coresight: Only check bottom two claim bits
coresight: Add claim tag warnings and debug messages
coresight: etm3x: Convert raw base pointer to struct coresight access
coresight: Clear self hosted claim tag on probe
coresight: Remove inlines from static function definitions
coresight: Remove extern from function declarations
drivers/hwtracing/coresight/coresight-catu.c | 12 +--
drivers/hwtracing/coresight/coresight-core.c | 87 ++++++++++++++--------
drivers/hwtracing/coresight/coresight-cti-core.c | 2 +
drivers/hwtracing/coresight/coresight-etb10.c | 4 +-
drivers/hwtracing/coresight/coresight-etm.h | 6 +-
drivers/hwtracing/coresight/coresight-etm3x-core.c | 28 +++----
.../hwtracing/coresight/coresight-etm3x-sysfs.c | 8 +-
drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++-
.../hwtracing/coresight/coresight-etm4x-sysfs.c | 4 +-
drivers/hwtracing/coresight/coresight-funnel.c | 1 +
drivers/hwtracing/coresight/coresight-platform.c | 26 +++----
drivers/hwtracing/coresight/coresight-priv.h | 20 ++---
drivers/hwtracing/coresight/coresight-replicator.c | 3 +-
drivers/hwtracing/coresight/coresight-stm.c | 6 +-
.../coresight/coresight-syscfg-configfs.c | 2 +-
drivers/hwtracing/coresight/coresight-tmc-core.c | 9 ++-
drivers/hwtracing/coresight/coresight-tmc-etr.c | 16 ++--
drivers/hwtracing/coresight/coresight-trbe.c | 18 ++---
include/linux/coresight.h | 40 +++++-----
19 files changed, 168 insertions(+), 134 deletions(-)
---
base-commit: 5442d22da7dbff3ba8c6720fc6f23ea4934d402d
change-id: 20250317-james-coresight-claim-tags-ae1461f1f5e0
Best regards,
--
James Clark <james.clark(a)linaro.org>
I've gotten stuck a few times with unusable Coresight after a warm boot
due to lingering claim tags, especially when testing the Coresight
panic patchsets.
This change does some tidy ups, adds some debug messages and clears the
self hosted claim tag on probe. The last two commits are unrelated
tidyups but they touch some of the same functions so to avoid extra
conflicts I'm including them here.
This gets as far as fixing the claim tag issue, but there is some other
state not being cleared on probe that results in the following error.
This can be fixed up as a later change:
coresight tmc_etf0: timeout while waiting for TMC to be Ready
coresight tmc_etf0: Failed to enable : TMC is not ready
Changes in v4:
- Add _unlocked() suffix for consistency
- s/cs_access/csdev_access/
- Link to v3: https://lore.kernel.org/r/20250320-james-coresight-claim-tags-v3-0-d3145c15…
Changes in v3:
- Collapse rename and locked/unlocked addition commits of
coresight_clear_self_claim_tag() so we don't change the name twice.
- Make coresight_clear_self_claim_tag() a bit more generic by only
doing UNLOCK for MMIO devices (although there is no use of this right
now)
- Link to v2: https://lore.kernel.org/r/20250318-james-coresight-claim-tags-v2-0-e9c8a9cd…
Changes in v2:
* Revert most of the interface changes, just call
coresight_clear_self_claim_tag() directly. This is possible because
we're not doing the read first, so it has fewer knock on effects.
* Split out the change to add struct cs_access to etm3x
* Add another warning for racing with external debugger
--
2.34.1
---
James Clark (7):
coresight: Convert tag clear function to take a struct csdev_access
coresight: Only check bottom two claim bits
coresight: Add claim tag warnings and debug messages
coresight: etm3x: Convert raw base pointer to struct coresight access
coresight: Clear self hosted claim tag on probe
coresight: Remove inlines from static function definitions
coresight: Remove extern from function declarations
drivers/hwtracing/coresight/coresight-catu.c | 12 +--
drivers/hwtracing/coresight/coresight-core.c | 87 ++++++++++++++--------
drivers/hwtracing/coresight/coresight-cti-core.c | 2 +
drivers/hwtracing/coresight/coresight-etb10.c | 4 +-
drivers/hwtracing/coresight/coresight-etm.h | 6 +-
drivers/hwtracing/coresight/coresight-etm3x-core.c | 28 +++----
.../hwtracing/coresight/coresight-etm3x-sysfs.c | 8 +-
drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++-
.../hwtracing/coresight/coresight-etm4x-sysfs.c | 4 +-
drivers/hwtracing/coresight/coresight-funnel.c | 1 +
drivers/hwtracing/coresight/coresight-platform.c | 26 +++----
drivers/hwtracing/coresight/coresight-priv.h | 20 ++---
drivers/hwtracing/coresight/coresight-replicator.c | 3 +-
drivers/hwtracing/coresight/coresight-stm.c | 6 +-
.../coresight/coresight-syscfg-configfs.c | 2 +-
drivers/hwtracing/coresight/coresight-tmc-core.c | 9 ++-
drivers/hwtracing/coresight/coresight-tmc-etr.c | 16 ++--
drivers/hwtracing/coresight/coresight-trbe.c | 18 ++---
include/linux/coresight.h | 40 +++++-----
19 files changed, 168 insertions(+), 134 deletions(-)
---
base-commit: 5442d22da7dbff3ba8c6720fc6f23ea4934d402d
change-id: 20250317-james-coresight-claim-tags-ae1461f1f5e0
Best regards,
--
James Clark <james.clark(a)linaro.org>
Hi,
On Mon, 24 Mar 2025 at 13:59, Yeoreum Yun <yeoreum.yun(a)arm.com> wrote:
>
> Hi Mike,
>
> Please ignore my foremer mail.. and please see my comments for your
> suggestion.
>
> > Hi
> >
> > On Fri, 14 Mar 2025 at 15:25, Yeo Reum Yun <YeoReum.Yun(a)arm.com> wrote:
> > >
> > > Hi, Mike.
> > >
> > > > > static void cscfg_remove_owned_csdev_features(struct coresight_device *csdev, void *load_owner)
> > > > > @@ -867,6 +870,28 @@ void cscfg_csdev_reset_feats(struct coresight_device *csdev)
> > > > > }
> > > > > EXPORT_SYMBOL_GPL(cscfg_csdev_reset_feats);
> > > > >
> > > > > +static bool cscfg_config_desc_get(struct cscfg_config_desc *config_desc, bool enable)
> > > > > +{
> > > > > + if (enable)
> > > > > + return atomic_inc_not_zero(&config_desc->active_cnt);
> > > > > +
> > > >
> > > > Not sure why we have an "enable" parameter here - it completely
> > > > changes the meaning of the function - with no comment at the start.
> > >
> > > Sorry. But what I intended is to distinguish
> > > - activation of config
> > > - enable of activated config.
> > > Because, current coresight doesn't grab the module reference on enable of activate config,
> > > But It grabs that reference only in activation.
> > > That's why I used to "enable" parameter to distinguish this
> > > while I integrate with module_owner count.
> > >
> > > > > list_for_each_entry(config_desc, &cscfg_mgr->config_desc_list, item) {
> > > > > if ((unsigned long)config_desc->event_ea->var == cfg_hash) {
> > > > > - atomic_dec(&config_desc->active_cnt);
> > > > > atomic_dec(&cscfg_mgr->sys_active_cnt);
> > > > > - cscfg_owner_put(config_desc->load_owner);
> > > > > + cscfg_config_desc_put(config_desc);
> > > > > dev_dbg(cscfg_device(), "Deactivate config %s.\n", config_desc->name);
> > > > > break;
> > > > > }
> > > > > @@ -1047,7 +1066,7 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> > > > > unsigned long cfg_hash, int preset)
> > > > > {
> > > > > struct cscfg_config_csdev *config_csdev_active = NULL, *config_csdev_item;
> > > > > - const struct cscfg_config_desc *config_desc;
> > > > > + struct cscfg_config_desc *config_desc;
> > > > > unsigned long flags;
> > > > > int err = 0;
> > > > >
> > > > > @@ -1062,8 +1081,8 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> > > > > raw_spin_lock_irqsave(&csdev->cscfg_csdev_lock, flags);
> > > > > list_for_each_entry(config_csdev_item, &csdev->config_csdev_list, node) {
> > > > > config_desc = config_csdev_item->config_desc;
> > > > > - if ((atomic_read(&config_desc->active_cnt)) &&
> > > > > - ((unsigned long)config_desc->event_ea->var == cfg_hash)) {
> > > > > + if (((unsigned long)config_desc->event_ea->var == cfg_hash) &&
> > > > > + cscfg_config_desc_get(config_desc, true)) {
> > > > >
> > > > This obfuscates the logic of the comparisons without good reason. With
> > > > the true parameter, the function does no "get" operation but just
> > > > replicates the logic being replaced - checking the active_cnt is
> > > > non-zero.
> > > >
> > > > Restore this to the original logic to make it readable again
> > >
> > > It's not a replicates of comparsion logic, but if true,
> >
> > sorry - missed that point .
> >
> > > It get the reference of active_cnt but not get module reference.
> > > The fundemental fault in the UAF becase of just "atomic_read()"
> > > so, it should hold reference in here.
> > >
> > > So, If you think the cscfg_config_desc_get()'s parameter makes obfuscation,
> > > I think there're two way to modfiy.
> > >
> > > 1. cscfg_config_desc_get()/put() always grab/drop the module count.
> > > 2. remove cscfg_config_desc_get()/put() but just use atomic_XXX(&active_cnt) only
> > > with cscfg_owner_get()/put()
> > >
> > > Any thougt?
> > >
> > > Thanks!
> > >
> > >
> >
> > The get and put functions are asymmetrical w.r.t. owner.
> >
> > The put will put owner if active count decrements to 0,
> > The get if not on enable path will put owner unconditionally.
> >
> > This means that the caller has to work out the correct input conditions.
> >
> > Might be better if:-
> >
> > get_desc()
> > {
> > if (! desc->refcnt) {
> > if (!get_owner())
> > return false;
> > }
> > desc->refcnt++;
> > return true;
> > }
>
> I think This makes another problem when
> it races with _cscfg_deactivate_config().
>
> CPU0 CPU1
> (sysfs enable) load module
> cscfg_load_config_sets()
> activate config. // sysfs
> (sys_active_cnt == 1)
>
> // sysfs
> _cscfg_deactivate_config()
> (sys_active_cnt == 0)
> (config->active_cnt = 0)
> ...
> cscfg_csdev_enable_active_config()
> lock(csdev->cscfg_csdev_lock)
> // here get module reference??
> // even sys_active_cnt == 0 and
> // config->active_cnt == 1.
> get_desc()
> unlock(csdev->cscfg_csdev_lock)
>
>
> // access to config_desc which freed
> // while unloading module.
> cfs_csdev_enable_config
>
>
> Because, the desc->refcnt meaning of zero is different from the context.
> - while activate . it should get module reference if zero.
> - while enable active configuration, if zero, it should be failed.
>
> that means to prevent this race, the core key point is:
> when config->active_cnt == 0, it should be failed in cscfg_csdev_enable_active_config()
>
This is not a failure case, it simple means that this config should
not be activated for this device.
It is possible for a configuration to be active on the system, without
it being active for a particular coresight device.
Having a get/put interface for the config descriptor - which prevents
the config from being unloaded is fine, the key logic here is that we
are searching a list of possible enabled configurations for this
device and taking the necessary action to enable it if we find one -
and there can only ever be a single configuration enabled for a trace
session.
Therefore when the list of loaded configs for a device is > 1, then
all but one is allowed to be active - so the search code will validly
find instances where config->active_cnt == 0.
My objection to the original interface was not the get/put operations
to protect the module from unload, but the fact that the logic
deciding if a config needed to be enabled on the device was hidden
inside the get() operation.
My suggestion is to restore the logic that decides if there is a
config to enable on the device be clear in the enable function itself,
then use get/put as appropriate to prevent module unload.
Regards
Mike
> Because, according to context the handling the zero reference value is
> different, It seems,to integrate the get_desc() interface to handle
> above case together without extra arguments (in case of here is
> "enable").
>
> If this interface is really ugly and unhappy to you,
> I think It should remove get_desc()/put_desc().
> although we can add new interface for cscfg_config_desc_get() for enable
> path. but it makes people more confused.
>
> So my suggestion is:
> - sustain this patch's contents
> - or remove get_desc()/put_desc() interface but use
> atomic_inc_zero(&config_desc->active_cnt) directly in
> cscfg_csdev_enable_active_config()
>
> Any thougt?
>
> Thanks.
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Hi
On Fri, 14 Mar 2025 at 15:25, Yeo Reum Yun <YeoReum.Yun(a)arm.com> wrote:
>
> Hi, Mike.
>
> > > static void cscfg_remove_owned_csdev_features(struct coresight_device *csdev, void *load_owner)
> > > @@ -867,6 +870,28 @@ void cscfg_csdev_reset_feats(struct coresight_device *csdev)
> > > }
> > > EXPORT_SYMBOL_GPL(cscfg_csdev_reset_feats);
> > >
> > > +static bool cscfg_config_desc_get(struct cscfg_config_desc *config_desc, bool enable)
> > > +{
> > > + if (enable)
> > > + return atomic_inc_not_zero(&config_desc->active_cnt);
> > > +
> >
> > Not sure why we have an "enable" parameter here - it completely
> > changes the meaning of the function - with no comment at the start.
>
> Sorry. But what I intended is to distinguish
> - activation of config
> - enable of activated config.
> Because, current coresight doesn't grab the module reference on enable of activate config,
> But It grabs that reference only in activation.
> That's why I used to "enable" parameter to distinguish this
> while I integrate with module_owner count.
>
> > > list_for_each_entry(config_desc, &cscfg_mgr->config_desc_list, item) {
> > > if ((unsigned long)config_desc->event_ea->var == cfg_hash) {
> > > - atomic_dec(&config_desc->active_cnt);
> > > atomic_dec(&cscfg_mgr->sys_active_cnt);
> > > - cscfg_owner_put(config_desc->load_owner);
> > > + cscfg_config_desc_put(config_desc);
> > > dev_dbg(cscfg_device(), "Deactivate config %s.\n", config_desc->name);
> > > break;
> > > }
> > > @@ -1047,7 +1066,7 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> > > unsigned long cfg_hash, int preset)
> > > {
> > > struct cscfg_config_csdev *config_csdev_active = NULL, *config_csdev_item;
> > > - const struct cscfg_config_desc *config_desc;
> > > + struct cscfg_config_desc *config_desc;
> > > unsigned long flags;
> > > int err = 0;
> > >
> > > @@ -1062,8 +1081,8 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> > > raw_spin_lock_irqsave(&csdev->cscfg_csdev_lock, flags);
> > > list_for_each_entry(config_csdev_item, &csdev->config_csdev_list, node) {
> > > config_desc = config_csdev_item->config_desc;
> > > - if ((atomic_read(&config_desc->active_cnt)) &&
> > > - ((unsigned long)config_desc->event_ea->var == cfg_hash)) {
> > > + if (((unsigned long)config_desc->event_ea->var == cfg_hash) &&
> > > + cscfg_config_desc_get(config_desc, true)) {
> > >
> > This obfuscates the logic of the comparisons without good reason. With
> > the true parameter, the function does no "get" operation but just
> > replicates the logic being replaced - checking the active_cnt is
> > non-zero.
> >
> > Restore this to the original logic to make it readable again
>
> It's not a replicates of comparsion logic, but if true,
sorry - missed that point .
> It get the reference of active_cnt but not get module reference.
> The fundemental fault in the UAF becase of just "atomic_read()"
> so, it should hold reference in here.
>
> So, If you think the cscfg_config_desc_get()'s parameter makes obfuscation,
> I think there're two way to modfiy.
>
> 1. cscfg_config_desc_get()/put() always grab/drop the module count.
> 2. remove cscfg_config_desc_get()/put() but just use atomic_XXX(&active_cnt) only
> with cscfg_owner_get()/put()
>
> Any thougt?
>
> Thanks!
>
>
The get and put functions are asymmetrical w.r.t. owner.
The put will put owner if active count decrements to 0,
The get if not on enable path will put owner unconditionally.
This means that the caller has to work out the correct input conditions.
Might be better if:-
get_desc()
{
if (! desc->refcnt) {
if (!get_owner())
return false;
}
desc->refcnt++;
return true;
}
put_desc()
{
desc->refcnt--;
if (! desc->refcnt)
put_owner()
}
and then change the enable_active_cfg matching logic to
if ( (desc->refcnt) && (desc->hash == hash) && get_desc()) {
< set active cfg>
}
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK