On Tue, Dec 02, 2025 at 02:42:21PM +0800, Yingchao Deng wrote:
The QCOM extended CTI is a heavily parameterized version of ARM’s CSCTI. It allows a debugger to send to trigger events to a processor or to send a trigger event to one or more processors when a trigger event occurs on another processor on the same SoC, or even between SoCs. Qualcomm CTI implementation differs from the standard CTI in the following aspects:
- The number of supported triggers is extended to 128.
- Several register offsets differ from the CoreSight specification.
I apologize for my late review of this series. For easier maintenance later, I have several comments for register access.
[...]
+static const u32 cti_normal_offset[] = {
- [INDEX_CTIINTACK] = CTIINTACK,
- [INDEX_CTIAPPSET] = CTIAPPSET,
- [INDEX_CTIAPPCLEAR] = CTIAPPCLEAR,
- [INDEX_CTIAPPPULSE] = CTIAPPPULSE,
- [INDEX_CTIINEN] = CTIINEN(0),
- [INDEX_CTIOUTEN] = CTIOUTEN(0),
I prefer to update the these two macros to CTIINENn and CTIOUTENn, as later we will not use CTIINEN(n) and CTIOUTEN(n) anymore.
- [INDEX_CTITRIGINSTATUS] = CTITRIGINSTATUS,
- [INDEX_CTITRIGOUTSTATUS] = CTITRIGOUTSTATUS,
- [INDEX_CTICHINSTATUS] = CTICHINSTATUS,
- [INDEX_CTICHOUTSTATUS] = CTICHOUTSTATUS,
- [INDEX_CTIGATE] = CTIGATE,
- [INDEX_ASICCTL] = ASICCTL,
- [INDEX_ITCHINACK] = ITCHINACK,
- [INDEX_ITTRIGINACK] = ITTRIGINACK,
- [INDEX_ITCHOUT] = ITCHOUT,
- [INDEX_ITTRIGOUT] = ITTRIGOUT,
- [INDEX_ITCHOUTACK] = ITCHOUTACK,
- [INDEX_ITTRIGOUTACK] = ITTRIGOUTACK,
- [INDEX_ITCHIN] = ITCHIN,
- [INDEX_ITTRIGIN] = ITTRIGIN,
- [INDEX_ITCTRL] = CORESIGHT_ITCTRL,
+};
+static const u32 cti_extended_offset[] = {
- [INDEX_CTIINTACK] = QCOM_CTIINTACK,
- [INDEX_CTIAPPSET] = QCOM_CTIAPPSET,
- [INDEX_CTIAPPCLEAR] = QCOM_CTIAPPCLEAR,
- [INDEX_CTIAPPPULSE] = QCOM_CTIAPPPULSE,
- [INDEX_CTIINEN] = QCOM_CTIINEN,
- [INDEX_CTIOUTEN] = QCOM_CTIOUTEN,
- [INDEX_CTITRIGINSTATUS] = QCOM_CTITRIGINSTATUS,
- [INDEX_CTITRIGOUTSTATUS] = QCOM_CTITRIGOUTSTATUS,
- [INDEX_CTICHINSTATUS] = QCOM_CTICHINSTATUS,
- [INDEX_CTICHOUTSTATUS] = QCOM_CTICHOUTSTATUS,
- [INDEX_CTIGATE] = QCOM_CTIGATE,
- [INDEX_ASICCTL] = QCOM_ASICCTL,
- [INDEX_ITCHINACK] = QCOM_ITCHINACK,
- [INDEX_ITTRIGINACK] = QCOM_ITTRIGINACK,
- [INDEX_ITCHOUT] = QCOM_ITCHOUT,
- [INDEX_ITTRIGOUT] = QCOM_ITTRIGOUT,
- [INDEX_ITCHOUTACK] = QCOM_ITCHOUTACK,
- [INDEX_ITTRIGOUTACK] = QCOM_ITTRIGOUTACK,
- [INDEX_ITCHIN] = QCOM_ITCHIN,
- [INDEX_ITTRIGIN] = QCOM_ITTRIGIN,
- [INDEX_ITCTRL] = CORESIGHT_ITCTRL,
+};
I saw CTI registers are within 4KiB (0x1000), we can don't convert standard regiserts and only convert to QCOM register based on the standard ones. So you can drop the cti_normal_offset strucuture and only have a cti_reg_qcom_offset[] struct:
static const u32 cti_extended_offset[] = { [CTIINTACK] = QCOM_CTIINTACK, [CTIAPPSET] = QCOM_CTIAPPSET, [CTIAPPCLEAR] = QCOM_CTIAPPCLEAR, [CTIAPPPULSE] = QCOM_CTIAPPPULSE, [CTIINEN] = QCOM_CTIINEN, ... };
Then you could create two helpers for register address:
static void __iomem *cti_reg_addr_with_nr(struct cti_drvdata *drvdata, u32 reg, u32 nr) { /* convert to qcom specific offset */ if (unlikely(drvdata->is_qcom_cti)) reg = cti_extended_offset[reg];
return drvdata->base + reg + sizeof(u32) * nr; }
static void __iomem *cti_reg_addr(struct cti_drvdata *drvdata, u32 reg) { return cti_reg_addr_with_nr(drvdata, reg, 0); }
/*
- CTI devices can be associated with a PE, or be connected to CoreSight
@@ -70,15 +119,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata) /* write the CTI trigger registers */ for (i = 0; i < config->nr_trig_max; i++) {
writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
writel_relaxed(config->ctiinen[i],drvdata->base + cti_offset(drvdata, INDEX_CTIINEN, i));
writel_relaxed(config->ctiinen[i], cti_reg_addr_with_nr(drvdata, CTIINENn, i));
And apply for the same cases below.
/* other regs */
- writel_relaxed(config->ctigate, drvdata->base + CTIGATE);
- writel_relaxed(config->asicctl, drvdata->base + ASICCTL);
- writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
- writel_relaxed(config->ctigate, drvdata->base + cti_offset(drvdata, INDEX_CTIGATE, 0));
writel_relaxed(config->ctigate, cti_reg_addr(drvdata, CTIGATE));
And apply for the same cases below.
[...]
@@ -394,8 +447,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op, /* update the local register values */ chan_bitmask = BIT(channel_idx);
- reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :
CTIOUTEN(trigger_idx));
- reg_offset = (direction == CTI_TRIG_IN ? cti_offset(drvdata, INDEX_CTIINEN, trigger_idx) :
cti_offset(drvdata, INDEX_CTIOUTEN, trigger_idx));
For readable, we can improve a bit with code alignment:
reg_offset = (direction == CTI_TRIG_IN) ? cti_reg_addr_with_nr(drvdata, CTIINENn, trigger_idx) : cti_reg_addr_with_nr(drvdata, CTIOUTENn, trigger_idx);
[...]
@@ -981,9 +1035,28 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id) drvdata->csdev_release = drvdata->csdev->dev.release; drvdata->csdev->dev.release = cti_device_release;
- /* check architect value*/
- devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);
- if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {
drvdata->subtype = QCOM_CTI;drvdata->offsets = cti_extended_offset;
As a result, we can only set the is_qcom_cti flag:
drvdata->is_qcom_cti = true;
/** QCOM CTI does not implement Claimtag functionality as* per CoreSight specification, but its CLAIMSET register* is incorrectly initialized to 0xF. This can mislead* tools or drivers into thinking the component is claimed.** Reset CLAIMSET to 0 to reflect that no claims are active.*/writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);
I am confused for this. If QCOM CTI does not implement claim tag, then what is the designed register at the offset CORESIGHT_CLAIMSET?
Should you bypass all claim tag related operations for QCOM CTI case? (I don't see you touch anything for claim and declaim tags).
- } else {
drvdata->subtype = ARM_STD_CTI;drvdata->offsets = cti_normal_offset;- }
- /* all done - dec pm refcount */ pm_runtime_put(&adev->dev);
- dev_info(&drvdata->csdev->dev, "CTI initialized\n");
- dev_info(&drvdata->csdev->dev, "CTI initialized; subtype=%d\n", drvdata->subtype);
dev_info(&drvdata->csdev->dev, "%s CTI initialized\n", drvdata->is_qcom_cti ? "QCOM" : "");
return 0; pm_release: diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c index a9df77215141..12a495382999 100644 --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c @@ -172,9 +172,8 @@ static struct attribute *coresight_cti_attrs[] = { /* register based attributes */ -/* Read registers with power check only (no enable check). */ -static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+static ssize_t coresight_cti_mgmt_reg_show(struct device *dev,
struct device_attribute *attr, char *buf){ struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent); struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr); @@ -189,6 +188,40 @@ static ssize_t coresight_cti_reg_show(struct device *dev, return sysfs_emit(buf, "0x%x\n", val); } +/* Read registers with power check only (no enable check). */ +static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+{
- struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
- struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
- u32 idx, val = 0;
- pm_runtime_get_sync(dev->parent);
- raw_spin_lock(&drvdata->spinlock);
- idx = drvdata->config.ext_reg_sel;
- if (drvdata->config.hw_powered) {
switch (cti_attr->off) {case INDEX_CTITRIGINSTATUS:case INDEX_CTITRIGOUTSTATUS:case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:case INDEX_ITTRIGOUTACK:case INDEX_ITTRIGIN:val = readl_relaxed(drvdata->base +cti_offset(drvdata, cti_attr->off, idx));break;default:val = readl_relaxed(drvdata->base + cti_offset(drvdata, cti_attr->off, 0));break;}- }
- raw_spin_unlock(&drvdata->spinlock);
- pm_runtime_put_sync(dev->parent);
- return sysfs_emit(buf, "0x%x\n", val);
+}
/* Write registers with power check only (no enable check). */ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, struct device_attribute *attr, @@ -197,19 +230,39 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent); struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr); unsigned long val = 0;
- u32 idx;
if (kstrtoul(buf, 0, &val)) return -EINVAL; pm_runtime_get_sync(dev->parent); raw_spin_lock(&drvdata->spinlock);
- if (drvdata->config.hw_powered)
cti_write_single_reg(drvdata, cti_attr->off, val);
- idx = drvdata->config.ext_reg_sel;
- if (drvdata->config.hw_powered) {
switch (cti_attr->off) {case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, idx), val);break;default:cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, 0), val);break;}- }
For both coresight_cti_reg_show() and coresight_cti_reg_store(), can we always use "cti_attr->off" as the offset for regitser access? I mean we don't need the extra config.ext_reg_sel, eventually any register we can calculate a offset for it.
raw_spin_unlock(&drvdata->spinlock); pm_runtime_put_sync(dev->parent); return size; } +#define coresight_cti_mgmt_reg(name, offset) \
- (&((struct cs_off_attribute[]) { \
{ \__ATTR(name, 0444, coresight_cti_mgmt_reg_show, NULL), \offset \} \- })[0].attr.attr)
#define coresight_cti_reg(name, offset) \ (&((struct cs_off_attribute[]) { \ { \ @@ -237,17 +290,17 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, /* coresight management registers */ static struct attribute *coresight_cti_mgmt_attrs[] = {
- coresight_cti_reg(devaff0, CTIDEVAFF0),
- coresight_cti_reg(devaff1, CTIDEVAFF1),
- coresight_cti_reg(authstatus, CORESIGHT_AUTHSTATUS),
- coresight_cti_reg(devarch, CORESIGHT_DEVARCH),
- coresight_cti_reg(devid, CORESIGHT_DEVID),
- coresight_cti_reg(devtype, CORESIGHT_DEVTYPE),
- coresight_cti_reg(pidr0, CORESIGHT_PERIPHIDR0),
- coresight_cti_reg(pidr1, CORESIGHT_PERIPHIDR1),
- coresight_cti_reg(pidr2, CORESIGHT_PERIPHIDR2),
- coresight_cti_reg(pidr3, CORESIGHT_PERIPHIDR3),
- coresight_cti_reg(pidr4, CORESIGHT_PERIPHIDR4),
- coresight_cti_mgmt_reg(devaff0, CTIDEVAFF0),
- coresight_cti_mgmt_reg(devaff1, CTIDEVAFF1),
- coresight_cti_mgmt_reg(authstatus, CORESIGHT_AUTHSTATUS),
- coresight_cti_mgmt_reg(devarch, CORESIGHT_DEVARCH),
- coresight_cti_mgmt_reg(devid, CORESIGHT_DEVID),
- coresight_cti_mgmt_reg(devtype, CORESIGHT_DEVTYPE),
- coresight_cti_mgmt_reg(pidr0, CORESIGHT_PERIPHIDR0),
- coresight_cti_mgmt_reg(pidr1, CORESIGHT_PERIPHIDR1),
- coresight_cti_mgmt_reg(pidr2, CORESIGHT_PERIPHIDR2),
- coresight_cti_mgmt_reg(pidr3, CORESIGHT_PERIPHIDR3),
- coresight_cti_mgmt_reg(pidr4, CORESIGHT_PERIPHIDR4),
I don't see any benefit for updating from coresight_cti_reg() to coresight_cti_mgmt_reg(). If really want to do this, should remove the macro coresight_cti_reg()?
NULL, }; @@ -258,13 +311,15 @@ static struct attribute *coresight_cti_mgmt_attrs[] = {
- If inaccessible & pcached_val not NULL then show cached value.
*/ static ssize_t cti_reg32_show(struct device *dev, char *buf,
u32 *pcached_val, int reg_offset)
u32 *pcached_val, int index)
We don't need to change anything for this. The passed "reg_offset" should be always a final offset, no matter for standard CTI or QCOM case, the driver directly uses the offset for register access.
[...]
+/*
- QCOM CTI supports up to 128 triggers, there are 6 registers need to be
- expanded to up to 4 instances, and ext_reg_sel can be used to indicate
- which one is in use.
- CTITRIGINSTATUS, CTITRIGOUTSTATUS,
- ITTRIGIN, ITTRIGOUT,
- ITTRIGINACK, ITTRIGOUTACK.
- */
+static ssize_t ext_reg_sel_show(struct device *dev,
struct device_attribute *attr,char *buf)+{
- u32 val;
- struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
- raw_spin_lock(&drvdata->spinlock);
- val = drvdata->config.ext_reg_sel;
- raw_spin_unlock(&drvdata->spinlock);
- return sprintf(buf, "%d\n", val);
+}
+static ssize_t ext_reg_sel_store(struct device *dev,
struct device_attribute *attr,const char *buf, size_t size)+{
- unsigned long val;
- struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
- if (kstrtoul(buf, 0, &val))
return -EINVAL;- if (val > ((drvdata->config.nr_trig_max + 31) / 32 - 1))
return -EINVAL;- raw_spin_lock(&drvdata->spinlock);
- drvdata->config.ext_reg_sel = val;
- raw_spin_unlock(&drvdata->spinlock);
- return size;
+}
As said, I don't think the trigger register is any different from other register access. So the existed APIs would be sufficient.
As a result, we don't need to add two above functions.
Thanks, Leo
On Wed, Dec 03, 2025 at 06:29:44PM +0000, Coresight ML wrote:
[...]
+/* Read registers with power check only (no enable check). */ +static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+{
- struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
- struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
- u32 idx, val = 0;
- pm_runtime_get_sync(dev->parent);
- raw_spin_lock(&drvdata->spinlock);
- idx = drvdata->config.ext_reg_sel;
- if (drvdata->config.hw_powered) {
switch (cti_attr->off) {case INDEX_CTITRIGINSTATUS:case INDEX_CTITRIGOUTSTATUS:case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:case INDEX_ITTRIGOUTACK:case INDEX_ITTRIGIN:
I read again and now I understand why you need "config.ext_reg_sel" as an index for these expending registers.
I think you should extend attrs for the new adding registers:
static struct attribute *coresight_cti_regs_attrs[] = { ... coresight_cti_reg(triginstatus, CTITRIGINSTATUS), /* Qcom CTI only for triginstatus1/2/3 */ coresight_cti_reg(triginstatus1, CTITRIGINSTATUS + 0x4), coresight_cti_reg(triginstatus2, CTITRIGINSTATUS + 0x8), coresight_cti_reg(triginstatus3, CTITRIGINSTATUS + 0xc), ... }
Then, you can add a is_visible() in coresight_cti_regs_group:
static umode_t coresight_cti_regs_is_visible(struct kobject *kobj, struct attribute *attr, int n) { struct device *dev = container_of(kobj, struct device, kobj); struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
/* Mute QCOM CTI registers for standard CTI module */ if (!drvdata->is_qcom_cti) { if (attr == &triginstatus1.attr || attr == &triginstatus2.attr || attr == &triginstatus3.attr) return 0; }
return attr->mode; }
static const struct attribute_group coresight_cti_regs_group = { .attrs = coresight_cti_regs_attrs, .name = "regs", .is_visible = coresight_cti_regs_is_visible, };
Thanks, Leo
Hi,
On Thu, 4 Dec 2025 at 08:38, Leo Yan leo.yan@arm.com wrote:
On Wed, Dec 03, 2025 at 06:29:44PM +0000, Coresight ML wrote:
[...]
+/* Read registers with power check only (no enable check). */ +static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+{
- struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
- struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
- u32 idx, val = 0;
- pm_runtime_get_sync(dev->parent);
- raw_spin_lock(&drvdata->spinlock);
- idx = drvdata->config.ext_reg_sel;
- if (drvdata->config.hw_powered) {
switch (cti_attr->off) {case INDEX_CTITRIGINSTATUS:case INDEX_CTITRIGOUTSTATUS:case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:case INDEX_ITTRIGOUTACK:case INDEX_ITTRIGIN:I read again and now I understand why you need "config.ext_reg_sel" as an index for these expending registers.
Having this index for these extended registers matches what we do for the INEN and OUTEN registers. This gives the user a consistent approach. We do not want the unnecessary attributes as it will increase the memory footprint for all cti instances, not just the qcom ones.
The first patch in this series works to reduce the memory footprint by only allocating resource based on the actual configuration. For example for an ARM designed CTI with 8 trigger registers, we no longer declare static 128 x 32 bit arrays for each of INEN and OUTEN which were required by the original design.
Given that there can be 10s or 100s of CTIs in a large multicore system, reducing the footprint to match the actual configuration, and offering a level of compression by using an index + single file to access a set of registers improves the efficiency of the driver.
Regards
Mike
I think you should extend attrs for the new adding registers:
static struct attribute *coresight_cti_regs_attrs[] = { ... coresight_cti_reg(triginstatus, CTITRIGINSTATUS), /* Qcom CTI only for triginstatus1/2/3 */ coresight_cti_reg(triginstatus1, CTITRIGINSTATUS + 0x4), coresight_cti_reg(triginstatus2, CTITRIGINSTATUS + 0x8), coresight_cti_reg(triginstatus3, CTITRIGINSTATUS + 0xc), ... }
Then, you can add a is_visible() in coresight_cti_regs_group:
static umode_t coresight_cti_regs_is_visible(struct kobject *kobj, struct attribute *attr, int n) { struct device *dev = container_of(kobj, struct device, kobj); struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
/* Mute QCOM CTI registers for standard CTI module */ if (!drvdata->is_qcom_cti) { if (attr == &triginstatus1.attr || attr == &triginstatus2.attr || attr == &triginstatus3.attr) return 0; } return attr->mode;}
static const struct attribute_group coresight_cti_regs_group = { .attrs = coresight_cti_regs_attrs, .name = "regs", .is_visible = coresight_cti_regs_is_visible, };
Thanks, Leo
Hi,
On Wed, 3 Dec 2025 at 18:29, Leo Yan leo.yan@arm.com wrote:
On Tue, Dec 02, 2025 at 02:42:21PM +0800, Yingchao Deng wrote:
The QCOM extended CTI is a heavily parameterized version of ARM’s CSCTI. It allows a debugger to send to trigger events to a processor or to send a trigger event to one or more processors when a trigger event occurs on another processor on the same SoC, or even between SoCs. Qualcomm CTI implementation differs from the standard CTI in the following aspects:
- The number of supported triggers is extended to 128.
- Several register offsets differ from the CoreSight specification.
I apologize for my late review of this series. For easier maintenance later, I have several comments for register access.
[...]
+static const u32 cti_normal_offset[] = {
[INDEX_CTIINTACK] = CTIINTACK,[INDEX_CTIAPPSET] = CTIAPPSET,[INDEX_CTIAPPCLEAR] = CTIAPPCLEAR,[INDEX_CTIAPPPULSE] = CTIAPPPULSE,[INDEX_CTIINEN] = CTIINEN(0),[INDEX_CTIOUTEN] = CTIOUTEN(0),I prefer to update the these two macros to CTIINENn and CTIOUTENn, as later we will not use CTIINEN(n) and CTIOUTEN(n) anymore.
[INDEX_CTITRIGINSTATUS] = CTITRIGINSTATUS,[INDEX_CTITRIGOUTSTATUS] = CTITRIGOUTSTATUS,[INDEX_CTICHINSTATUS] = CTICHINSTATUS,[INDEX_CTICHOUTSTATUS] = CTICHOUTSTATUS,[INDEX_CTIGATE] = CTIGATE,[INDEX_ASICCTL] = ASICCTL,[INDEX_ITCHINACK] = ITCHINACK,[INDEX_ITTRIGINACK] = ITTRIGINACK,[INDEX_ITCHOUT] = ITCHOUT,[INDEX_ITTRIGOUT] = ITTRIGOUT,[INDEX_ITCHOUTACK] = ITCHOUTACK,[INDEX_ITTRIGOUTACK] = ITTRIGOUTACK,[INDEX_ITCHIN] = ITCHIN,[INDEX_ITTRIGIN] = ITTRIGIN,[INDEX_ITCTRL] = CORESIGHT_ITCTRL,+};
+static const u32 cti_extended_offset[] = {
[INDEX_CTIINTACK] = QCOM_CTIINTACK,[INDEX_CTIAPPSET] = QCOM_CTIAPPSET,[INDEX_CTIAPPCLEAR] = QCOM_CTIAPPCLEAR,[INDEX_CTIAPPPULSE] = QCOM_CTIAPPPULSE,[INDEX_CTIINEN] = QCOM_CTIINEN,[INDEX_CTIOUTEN] = QCOM_CTIOUTEN,[INDEX_CTITRIGINSTATUS] = QCOM_CTITRIGINSTATUS,[INDEX_CTITRIGOUTSTATUS] = QCOM_CTITRIGOUTSTATUS,[INDEX_CTICHINSTATUS] = QCOM_CTICHINSTATUS,[INDEX_CTICHOUTSTATUS] = QCOM_CTICHOUTSTATUS,[INDEX_CTIGATE] = QCOM_CTIGATE,[INDEX_ASICCTL] = QCOM_ASICCTL,[INDEX_ITCHINACK] = QCOM_ITCHINACK,[INDEX_ITTRIGINACK] = QCOM_ITTRIGINACK,[INDEX_ITCHOUT] = QCOM_ITCHOUT,[INDEX_ITTRIGOUT] = QCOM_ITTRIGOUT,[INDEX_ITCHOUTACK] = QCOM_ITCHOUTACK,[INDEX_ITTRIGOUTACK] = QCOM_ITTRIGOUTACK,[INDEX_ITCHIN] = QCOM_ITCHIN,[INDEX_ITTRIGIN] = QCOM_ITTRIGIN,[INDEX_ITCTRL] = CORESIGHT_ITCTRL,+};
I saw CTI registers are within 4KiB (0x1000), we can don't convert standard regiserts and only convert to QCOM register based on the standard ones. So you can drop the cti_normal_offset strucuture and only have a cti_reg_qcom_offset[] struct:
static const u32 cti_extended_offset[] = { [CTIINTACK] = QCOM_CTIINTACK, [CTIAPPSET] = QCOM_CTIAPPSET, [CTIAPPCLEAR] = QCOM_CTIAPPCLEAR, [CTIAPPPULSE] = QCOM_CTIAPPPULSE, [CTIINEN] = QCOM_CTIINEN, ... };
I suggested the dual offset approach a couple of patchset revisions ago as it actually simplifies the code & makes it more efficient. The offset array in use is set during probe and the remaining code is then common to both without lots of "if qcom else " occurences.
Regards
Mike
Then you could create two helpers for register address:
static void __iomem *cti_reg_addr_with_nr(struct cti_drvdata *drvdata, u32 reg, u32 nr) { /* convert to qcom specific offset */ if (unlikely(drvdata->is_qcom_cti)) reg = cti_extended_offset[reg]; return drvdata->base + reg + sizeof(u32) * nr; } static void __iomem *cti_reg_addr(struct cti_drvdata *drvdata, u32 reg) { return cti_reg_addr_with_nr(drvdata, reg, 0); }/*
- CTI devices can be associated with a PE, or be connected to CoreSight
@@ -70,15 +119,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata)
/* write the CTI trigger registers */ for (i = 0; i < config->nr_trig_max; i++) {
writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
writel_relaxed(config->ctiinen[i],drvdata->base + cti_offset(drvdata, INDEX_CTIINEN, i));writel_relaxed(config->ctiinen[i], cti_reg_addr_with_nr(drvdata, CTIINENn, i));
And apply for the same cases below.
/* other regs */
writel_relaxed(config->ctigate, drvdata->base + CTIGATE);writel_relaxed(config->asicctl, drvdata->base + ASICCTL);writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
writel_relaxed(config->ctigate, drvdata->base + cti_offset(drvdata, INDEX_CTIGATE, 0));writel_relaxed(config->ctigate, cti_reg_addr(drvdata, CTIGATE));
And apply for the same cases below.
[...]
@@ -394,8 +447,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
/* update the local register values */ chan_bitmask = BIT(channel_idx);
reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :CTIOUTEN(trigger_idx));
reg_offset = (direction == CTI_TRIG_IN ? cti_offset(drvdata, INDEX_CTIINEN, trigger_idx) :cti_offset(drvdata, INDEX_CTIOUTEN, trigger_idx));For readable, we can improve a bit with code alignment:
reg_offset = (direction == CTI_TRIG_IN) ? cti_reg_addr_with_nr(drvdata, CTIINENn, trigger_idx) : cti_reg_addr_with_nr(drvdata, CTIOUTENn, trigger_idx);
[...]
@@ -981,9 +1035,28 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id) drvdata->csdev_release = drvdata->csdev->dev.release; drvdata->csdev->dev.release = cti_device_release;
/* check architect value*/devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {drvdata->subtype = QCOM_CTI;drvdata->offsets = cti_extended_offset;As a result, we can only set the is_qcom_cti flag:
drvdata->is_qcom_cti = true;
/** QCOM CTI does not implement Claimtag functionality as* per CoreSight specification, but its CLAIMSET register* is incorrectly initialized to 0xF. This can mislead* tools or drivers into thinking the component is claimed.** Reset CLAIMSET to 0 to reflect that no claims are active.*/writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);I am confused for this. If QCOM CTI does not implement claim tag, then what is the designed register at the offset CORESIGHT_CLAIMSET?
Should you bypass all claim tag related operations for QCOM CTI case? (I don't see you touch anything for claim and declaim tags).
} else {drvdata->subtype = ARM_STD_CTI;drvdata->offsets = cti_normal_offset;}/* all done - dec pm refcount */ pm_runtime_put(&adev->dev);
dev_info(&drvdata->csdev->dev, "CTI initialized\n");
dev_info(&drvdata->csdev->dev, "CTI initialized; subtype=%d\n", drvdata->subtype);dev_info(&drvdata->csdev->dev, "%s CTI initialized\n", drvdata->is_qcom_cti ? "QCOM" : "");
return 0;pm_release: diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c index a9df77215141..12a495382999 100644 --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c @@ -172,9 +172,8 @@ static struct attribute *coresight_cti_attrs[] = {
/* register based attributes */
-/* Read registers with power check only (no enable check). */ -static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+static ssize_t coresight_cti_mgmt_reg_show(struct device *dev,
struct device_attribute *attr, char *buf){ struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent); struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr); @@ -189,6 +188,40 @@ static ssize_t coresight_cti_reg_show(struct device *dev, return sysfs_emit(buf, "0x%x\n", val); }
+/* Read registers with power check only (no enable check). */ +static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+{
struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);u32 idx, val = 0;pm_runtime_get_sync(dev->parent);raw_spin_lock(&drvdata->spinlock);idx = drvdata->config.ext_reg_sel;if (drvdata->config.hw_powered) {switch (cti_attr->off) {case INDEX_CTITRIGINSTATUS:case INDEX_CTITRIGOUTSTATUS:case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:case INDEX_ITTRIGOUTACK:case INDEX_ITTRIGIN:val = readl_relaxed(drvdata->base +cti_offset(drvdata, cti_attr->off, idx));break;default:val = readl_relaxed(drvdata->base + cti_offset(drvdata, cti_attr->off, 0));break;}}raw_spin_unlock(&drvdata->spinlock);pm_runtime_put_sync(dev->parent);return sysfs_emit(buf, "0x%x\n", val);+}
/* Write registers with power check only (no enable check). */ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, struct device_attribute *attr, @@ -197,19 +230,39 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent); struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr); unsigned long val = 0;
u32 idx; if (kstrtoul(buf, 0, &val)) return -EINVAL; pm_runtime_get_sync(dev->parent); raw_spin_lock(&drvdata->spinlock);
if (drvdata->config.hw_powered)cti_write_single_reg(drvdata, cti_attr->off, val);
idx = drvdata->config.ext_reg_sel;if (drvdata->config.hw_powered) {switch (cti_attr->off) {case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, idx), val);break;default:cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, 0), val);break;}}For both coresight_cti_reg_show() and coresight_cti_reg_store(), can we always use "cti_attr->off" as the offset for regitser access? I mean we don't need the extra config.ext_reg_sel, eventually any register we can calculate a offset for it.
raw_spin_unlock(&drvdata->spinlock); pm_runtime_put_sync(dev->parent); return size;}
+#define coresight_cti_mgmt_reg(name, offset) \
(&((struct cs_off_attribute[]) { \{ \__ATTR(name, 0444, coresight_cti_mgmt_reg_show, NULL), \offset \} \})[0].attr.attr)#define coresight_cti_reg(name, offset) \ (&((struct cs_off_attribute[]) { \ { \ @@ -237,17 +290,17 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
/* coresight management registers */ static struct attribute *coresight_cti_mgmt_attrs[] = {
coresight_cti_reg(devaff0, CTIDEVAFF0),coresight_cti_reg(devaff1, CTIDEVAFF1),coresight_cti_reg(authstatus, CORESIGHT_AUTHSTATUS),coresight_cti_reg(devarch, CORESIGHT_DEVARCH),coresight_cti_reg(devid, CORESIGHT_DEVID),coresight_cti_reg(devtype, CORESIGHT_DEVTYPE),coresight_cti_reg(pidr0, CORESIGHT_PERIPHIDR0),coresight_cti_reg(pidr1, CORESIGHT_PERIPHIDR1),coresight_cti_reg(pidr2, CORESIGHT_PERIPHIDR2),coresight_cti_reg(pidr3, CORESIGHT_PERIPHIDR3),coresight_cti_reg(pidr4, CORESIGHT_PERIPHIDR4),
coresight_cti_mgmt_reg(devaff0, CTIDEVAFF0),coresight_cti_mgmt_reg(devaff1, CTIDEVAFF1),coresight_cti_mgmt_reg(authstatus, CORESIGHT_AUTHSTATUS),coresight_cti_mgmt_reg(devarch, CORESIGHT_DEVARCH),coresight_cti_mgmt_reg(devid, CORESIGHT_DEVID),coresight_cti_mgmt_reg(devtype, CORESIGHT_DEVTYPE),coresight_cti_mgmt_reg(pidr0, CORESIGHT_PERIPHIDR0),coresight_cti_mgmt_reg(pidr1, CORESIGHT_PERIPHIDR1),coresight_cti_mgmt_reg(pidr2, CORESIGHT_PERIPHIDR2),coresight_cti_mgmt_reg(pidr3, CORESIGHT_PERIPHIDR3),coresight_cti_mgmt_reg(pidr4, CORESIGHT_PERIPHIDR4),I don't see any benefit for updating from coresight_cti_reg() to coresight_cti_mgmt_reg(). If really want to do this, should remove the macro coresight_cti_reg()?
NULL,};
@@ -258,13 +311,15 @@ static struct attribute *coresight_cti_mgmt_attrs[] = {
- If inaccessible & pcached_val not NULL then show cached value.
*/ static ssize_t cti_reg32_show(struct device *dev, char *buf,
u32 *pcached_val, int reg_offset)
u32 *pcached_val, int index)We don't need to change anything for this. The passed "reg_offset" should be always a final offset, no matter for standard CTI or QCOM case, the driver directly uses the offset for register access.
[...]
+/*
- QCOM CTI supports up to 128 triggers, there are 6 registers need to be
- expanded to up to 4 instances, and ext_reg_sel can be used to indicate
- which one is in use.
- CTITRIGINSTATUS, CTITRIGOUTSTATUS,
- ITTRIGIN, ITTRIGOUT,
- ITTRIGINACK, ITTRIGOUTACK.
- */
+static ssize_t ext_reg_sel_show(struct device *dev,
struct device_attribute *attr,char *buf)+{
u32 val;struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);raw_spin_lock(&drvdata->spinlock);val = drvdata->config.ext_reg_sel;raw_spin_unlock(&drvdata->spinlock);return sprintf(buf, "%d\n", val);+}
+static ssize_t ext_reg_sel_store(struct device *dev,
struct device_attribute *attr,const char *buf, size_t size)+{
unsigned long val;struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);if (kstrtoul(buf, 0, &val))return -EINVAL;if (val > ((drvdata->config.nr_trig_max + 31) / 32 - 1))return -EINVAL;raw_spin_lock(&drvdata->spinlock);drvdata->config.ext_reg_sel = val;raw_spin_unlock(&drvdata->spinlock);return size;+}
As said, I don't think the trigger register is any different from other register access. So the existed APIs would be sufficient.
As a result, we don't need to add two above functions.
Thanks, Leo
Hi
On Wed, 3 Dec 2025 at 18:29, Leo Yan leo.yan@arm.com wrote:
On Tue, Dec 02, 2025 at 02:42:21PM +0800, Yingchao Deng wrote:
The QCOM extended CTI is a heavily parameterized version of ARM’s CSCTI. It allows a debugger to send to trigger events to a processor or to send a trigger event to one or more processors when a trigger event occurs on another processor on the same SoC, or even between SoCs. Qualcomm CTI implementation differs from the standard CTI in the following aspects:
- The number of supported triggers is extended to 128.
- Several register offsets differ from the CoreSight specification.
I apologize for my late review of this series. For easier maintenance later, I have several comments for register access.
[...]
+static const u32 cti_normal_offset[] = {
[INDEX_CTIINTACK] = CTIINTACK,[INDEX_CTIAPPSET] = CTIAPPSET,[INDEX_CTIAPPCLEAR] = CTIAPPCLEAR,[INDEX_CTIAPPPULSE] = CTIAPPPULSE,[INDEX_CTIINEN] = CTIINEN(0),[INDEX_CTIOUTEN] = CTIOUTEN(0),I prefer to update the these two macros to CTIINENn and CTIOUTENn, as later we will not use CTIINEN(n) and CTIOUTEN(n) anymore.
[INDEX_CTITRIGINSTATUS] = CTITRIGINSTATUS,[INDEX_CTITRIGOUTSTATUS] = CTITRIGOUTSTATUS,[INDEX_CTICHINSTATUS] = CTICHINSTATUS,[INDEX_CTICHOUTSTATUS] = CTICHOUTSTATUS,[INDEX_CTIGATE] = CTIGATE,[INDEX_ASICCTL] = ASICCTL,[INDEX_ITCHINACK] = ITCHINACK,[INDEX_ITTRIGINACK] = ITTRIGINACK,[INDEX_ITCHOUT] = ITCHOUT,[INDEX_ITTRIGOUT] = ITTRIGOUT,[INDEX_ITCHOUTACK] = ITCHOUTACK,[INDEX_ITTRIGOUTACK] = ITTRIGOUTACK,[INDEX_ITCHIN] = ITCHIN,[INDEX_ITTRIGIN] = ITTRIGIN,[INDEX_ITCTRL] = CORESIGHT_ITCTRL,+};
+static const u32 cti_extended_offset[] = {
[INDEX_CTIINTACK] = QCOM_CTIINTACK,[INDEX_CTIAPPSET] = QCOM_CTIAPPSET,[INDEX_CTIAPPCLEAR] = QCOM_CTIAPPCLEAR,[INDEX_CTIAPPPULSE] = QCOM_CTIAPPPULSE,[INDEX_CTIINEN] = QCOM_CTIINEN,[INDEX_CTIOUTEN] = QCOM_CTIOUTEN,[INDEX_CTITRIGINSTATUS] = QCOM_CTITRIGINSTATUS,[INDEX_CTITRIGOUTSTATUS] = QCOM_CTITRIGOUTSTATUS,[INDEX_CTICHINSTATUS] = QCOM_CTICHINSTATUS,[INDEX_CTICHOUTSTATUS] = QCOM_CTICHOUTSTATUS,[INDEX_CTIGATE] = QCOM_CTIGATE,[INDEX_ASICCTL] = QCOM_ASICCTL,[INDEX_ITCHINACK] = QCOM_ITCHINACK,[INDEX_ITTRIGINACK] = QCOM_ITTRIGINACK,[INDEX_ITCHOUT] = QCOM_ITCHOUT,[INDEX_ITTRIGOUT] = QCOM_ITTRIGOUT,[INDEX_ITCHOUTACK] = QCOM_ITCHOUTACK,[INDEX_ITTRIGOUTACK] = QCOM_ITTRIGOUTACK,[INDEX_ITCHIN] = QCOM_ITCHIN,[INDEX_ITTRIGIN] = QCOM_ITTRIGIN,[INDEX_ITCTRL] = CORESIGHT_ITCTRL,+};
I saw CTI registers are within 4KiB (0x1000), we can don't convert standard regiserts and only convert to QCOM register based on the standard ones. So you can drop the cti_normal_offset strucuture and only have a cti_reg_qcom_offset[] struct:
static const u32 cti_extended_offset[] = { [CTIINTACK] = QCOM_CTIINTACK, [CTIAPPSET] = QCOM_CTIAPPSET, [CTIAPPCLEAR] = QCOM_CTIAPPCLEAR, [CTIAPPPULSE] = QCOM_CTIAPPPULSE, [CTIINEN] = QCOM_CTIINEN, ... };
Then you could create two helpers for register address:
static void __iomem *cti_reg_addr_with_nr(struct cti_drvdata *drvdata, u32 reg, u32 nr) { /* convert to qcom specific offset */ if (unlikely(drvdata->is_qcom_cti)) reg = cti_extended_offset[reg]; return drvdata->base + reg + sizeof(u32) * nr; } static void __iomem *cti_reg_addr(struct cti_drvdata *drvdata, u32 reg) { return cti_reg_addr_with_nr(drvdata, reg, 0); }/*
- CTI devices can be associated with a PE, or be connected to CoreSight
@@ -70,15 +119,16 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata)
/* write the CTI trigger registers */ for (i = 0; i < config->nr_trig_max; i++) {
writel_relaxed(config->ctiinen[i], drvdata->base + CTIINEN(i));
writel_relaxed(config->ctiinen[i],drvdata->base + cti_offset(drvdata, INDEX_CTIINEN, i));writel_relaxed(config->ctiinen[i], cti_reg_addr_with_nr(drvdata, CTIINENn, i));
And apply for the same cases below.
/* other regs */
writel_relaxed(config->ctigate, drvdata->base + CTIGATE);writel_relaxed(config->asicctl, drvdata->base + ASICCTL);writel_relaxed(config->ctiappset, drvdata->base + CTIAPPSET);
writel_relaxed(config->ctigate, drvdata->base + cti_offset(drvdata, INDEX_CTIGATE, 0));writel_relaxed(config->ctigate, cti_reg_addr(drvdata, CTIGATE));
And apply for the same cases below.
[...]
@@ -394,8 +447,8 @@ int cti_channel_trig_op(struct device *dev, enum cti_chan_op op,
/* update the local register values */ chan_bitmask = BIT(channel_idx);
reg_offset = (direction == CTI_TRIG_IN ? CTIINEN(trigger_idx) :CTIOUTEN(trigger_idx));
reg_offset = (direction == CTI_TRIG_IN ? cti_offset(drvdata, INDEX_CTIINEN, trigger_idx) :cti_offset(drvdata, INDEX_CTIOUTEN, trigger_idx));For readable, we can improve a bit with code alignment:
reg_offset = (direction == CTI_TRIG_IN) ? cti_reg_addr_with_nr(drvdata, CTIINENn, trigger_idx) : cti_reg_addr_with_nr(drvdata, CTIOUTENn, trigger_idx);
[...]
@@ -981,9 +1035,28 @@ static int cti_probe(struct amba_device *adev, const struct amba_id *id) drvdata->csdev_release = drvdata->csdev->dev.release; drvdata->csdev->dev.release = cti_device_release;
/* check architect value*/devarch = readl_relaxed(drvdata->base + CORESIGHT_DEVARCH);if (CTI_DEVARCH_ARCHITECT(devarch) == ARCHITECT_QCOM) {drvdata->subtype = QCOM_CTI;drvdata->offsets = cti_extended_offset;As a result, we can only set the is_qcom_cti flag:
drvdata->is_qcom_cti = true;
/** QCOM CTI does not implement Claimtag functionality as* per CoreSight specification, but its CLAIMSET register* is incorrectly initialized to 0xF. This can mislead* tools or drivers into thinking the component is claimed.** Reset CLAIMSET to 0 to reflect that no claims are active.*/writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);I am confused for this. If QCOM CTI does not implement claim tag, then what is the designed register at the offset CORESIGHT_CLAIMSET?
Should you bypass all claim tag related operations for QCOM CTI case? (I don't see you touch anything for claim and declaim tags).
The patch I have created to handle systems without correct claim tag operation is a dependency for this patch set. Thus no need for override here as the core code will handle this correctly.
The only issue is ensuring the non-CTI spec implementation will result in the correct detection of no claim tags present.
Regards
Miike
} else {drvdata->subtype = ARM_STD_CTI;drvdata->offsets = cti_normal_offset;}/* all done - dec pm refcount */ pm_runtime_put(&adev->dev);
dev_info(&drvdata->csdev->dev, "CTI initialized\n");
dev_info(&drvdata->csdev->dev, "CTI initialized; subtype=%d\n", drvdata->subtype);dev_info(&drvdata->csdev->dev, "%s CTI initialized\n", drvdata->is_qcom_cti ? "QCOM" : "");
return 0;pm_release: diff --git a/drivers/hwtracing/coresight/coresight-cti-sysfs.c b/drivers/hwtracing/coresight/coresight-cti-sysfs.c index a9df77215141..12a495382999 100644 --- a/drivers/hwtracing/coresight/coresight-cti-sysfs.c +++ b/drivers/hwtracing/coresight/coresight-cti-sysfs.c @@ -172,9 +172,8 @@ static struct attribute *coresight_cti_attrs[] = {
/* register based attributes */
-/* Read registers with power check only (no enable check). */ -static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+static ssize_t coresight_cti_mgmt_reg_show(struct device *dev,
struct device_attribute *attr, char *buf){ struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent); struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr); @@ -189,6 +188,40 @@ static ssize_t coresight_cti_reg_show(struct device *dev, return sysfs_emit(buf, "0x%x\n", val); }
+/* Read registers with power check only (no enable check). */ +static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+{
struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);u32 idx, val = 0;pm_runtime_get_sync(dev->parent);raw_spin_lock(&drvdata->spinlock);idx = drvdata->config.ext_reg_sel;if (drvdata->config.hw_powered) {switch (cti_attr->off) {case INDEX_CTITRIGINSTATUS:case INDEX_CTITRIGOUTSTATUS:case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:case INDEX_ITTRIGOUTACK:case INDEX_ITTRIGIN:val = readl_relaxed(drvdata->base +cti_offset(drvdata, cti_attr->off, idx));break;default:val = readl_relaxed(drvdata->base + cti_offset(drvdata, cti_attr->off, 0));break;}}raw_spin_unlock(&drvdata->spinlock);pm_runtime_put_sync(dev->parent);return sysfs_emit(buf, "0x%x\n", val);+}
/* Write registers with power check only (no enable check). */ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, struct device_attribute *attr, @@ -197,19 +230,39 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev, struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent); struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr); unsigned long val = 0;
u32 idx; if (kstrtoul(buf, 0, &val)) return -EINVAL; pm_runtime_get_sync(dev->parent); raw_spin_lock(&drvdata->spinlock);
if (drvdata->config.hw_powered)cti_write_single_reg(drvdata, cti_attr->off, val);
idx = drvdata->config.ext_reg_sel;if (drvdata->config.hw_powered) {switch (cti_attr->off) {case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, idx), val);break;default:cti_write_single_reg(drvdata, cti_offset(drvdata, cti_attr->off, 0), val);break;}}For both coresight_cti_reg_show() and coresight_cti_reg_store(), can we always use "cti_attr->off" as the offset for regitser access? I mean we don't need the extra config.ext_reg_sel, eventually any register we can calculate a offset for it.
raw_spin_unlock(&drvdata->spinlock); pm_runtime_put_sync(dev->parent); return size;}
+#define coresight_cti_mgmt_reg(name, offset) \
(&((struct cs_off_attribute[]) { \{ \__ATTR(name, 0444, coresight_cti_mgmt_reg_show, NULL), \offset \} \})[0].attr.attr)#define coresight_cti_reg(name, offset) \ (&((struct cs_off_attribute[]) { \ { \ @@ -237,17 +290,17 @@ static __maybe_unused ssize_t coresight_cti_reg_store(struct device *dev,
/* coresight management registers */ static struct attribute *coresight_cti_mgmt_attrs[] = {
coresight_cti_reg(devaff0, CTIDEVAFF0),coresight_cti_reg(devaff1, CTIDEVAFF1),coresight_cti_reg(authstatus, CORESIGHT_AUTHSTATUS),coresight_cti_reg(devarch, CORESIGHT_DEVARCH),coresight_cti_reg(devid, CORESIGHT_DEVID),coresight_cti_reg(devtype, CORESIGHT_DEVTYPE),coresight_cti_reg(pidr0, CORESIGHT_PERIPHIDR0),coresight_cti_reg(pidr1, CORESIGHT_PERIPHIDR1),coresight_cti_reg(pidr2, CORESIGHT_PERIPHIDR2),coresight_cti_reg(pidr3, CORESIGHT_PERIPHIDR3),coresight_cti_reg(pidr4, CORESIGHT_PERIPHIDR4),
coresight_cti_mgmt_reg(devaff0, CTIDEVAFF0),coresight_cti_mgmt_reg(devaff1, CTIDEVAFF1),coresight_cti_mgmt_reg(authstatus, CORESIGHT_AUTHSTATUS),coresight_cti_mgmt_reg(devarch, CORESIGHT_DEVARCH),coresight_cti_mgmt_reg(devid, CORESIGHT_DEVID),coresight_cti_mgmt_reg(devtype, CORESIGHT_DEVTYPE),coresight_cti_mgmt_reg(pidr0, CORESIGHT_PERIPHIDR0),coresight_cti_mgmt_reg(pidr1, CORESIGHT_PERIPHIDR1),coresight_cti_mgmt_reg(pidr2, CORESIGHT_PERIPHIDR2),coresight_cti_mgmt_reg(pidr3, CORESIGHT_PERIPHIDR3),coresight_cti_mgmt_reg(pidr4, CORESIGHT_PERIPHIDR4),I don't see any benefit for updating from coresight_cti_reg() to coresight_cti_mgmt_reg(). If really want to do this, should remove the macro coresight_cti_reg()?
NULL,};
@@ -258,13 +311,15 @@ static struct attribute *coresight_cti_mgmt_attrs[] = {
- If inaccessible & pcached_val not NULL then show cached value.
*/ static ssize_t cti_reg32_show(struct device *dev, char *buf,
u32 *pcached_val, int reg_offset)
u32 *pcached_val, int index)We don't need to change anything for this. The passed "reg_offset" should be always a final offset, no matter for standard CTI or QCOM case, the driver directly uses the offset for register access.
[...]
+/*
- QCOM CTI supports up to 128 triggers, there are 6 registers need to be
- expanded to up to 4 instances, and ext_reg_sel can be used to indicate
- which one is in use.
- CTITRIGINSTATUS, CTITRIGOUTSTATUS,
- ITTRIGIN, ITTRIGOUT,
- ITTRIGINACK, ITTRIGOUTACK.
- */
+static ssize_t ext_reg_sel_show(struct device *dev,
struct device_attribute *attr,char *buf)+{
u32 val;struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);raw_spin_lock(&drvdata->spinlock);val = drvdata->config.ext_reg_sel;raw_spin_unlock(&drvdata->spinlock);return sprintf(buf, "%d\n", val);+}
+static ssize_t ext_reg_sel_store(struct device *dev,
struct device_attribute *attr,const char *buf, size_t size)+{
unsigned long val;struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);if (kstrtoul(buf, 0, &val))return -EINVAL;if (val > ((drvdata->config.nr_trig_max + 31) / 32 - 1))return -EINVAL;raw_spin_lock(&drvdata->spinlock);drvdata->config.ext_reg_sel = val;raw_spin_unlock(&drvdata->spinlock);return size;+}
As said, I don't think the trigger register is any different from other register access. So the existed APIs would be sufficient.
As a result, we don't need to add two above functions.
Thanks, Leo
On Thu, Dec 04, 2025 at 09:04:03AM +0000, Mike Leach wrote:
Hi,
On Thu, 4 Dec 2025 at 08:38, Leo Yan leo.yan@arm.com wrote:
On Wed, Dec 03, 2025 at 06:29:44PM +0000, Coresight ML wrote:
[...]
+/* Read registers with power check only (no enable check). */ +static ssize_t coresight_cti_reg_show(struct device *dev,
struct device_attribute *attr, char *buf)+{
- struct cti_drvdata *drvdata = dev_get_drvdata(dev->parent);
- struct cs_off_attribute *cti_attr = container_of(attr, struct cs_off_attribute, attr);
- u32 idx, val = 0;
- pm_runtime_get_sync(dev->parent);
- raw_spin_lock(&drvdata->spinlock);
- idx = drvdata->config.ext_reg_sel;
- if (drvdata->config.hw_powered) {
switch (cti_attr->off) {case INDEX_CTITRIGINSTATUS:case INDEX_CTITRIGOUTSTATUS:case INDEX_ITTRIGINACK:case INDEX_ITTRIGOUT:case INDEX_ITTRIGOUTACK:case INDEX_ITTRIGIN:I read again and now I understand why you need "config.ext_reg_sel" as an index for these expending registers.
Having this index for these extended registers matches what we do for the INEN and OUTEN registers. This gives the user a consistent approach. We do not want the unnecessary attributes as it will increase the memory footprint for all cti instances, not just the qcom ones.
I agree with using index for CTI triggers, but it is not necessary to add a new index for other registers (status, mode setting, ACK, etc).
It would be directive to present the status and mode setting registers, given these registers are only from 0-3. This will be easy accessed from userspace, and avoid complexity in the driver.
The first patch in this series works to reduce the memory footprint by only allocating resource based on the actual configuration. For example for an ARM designed CTI with 8 trigger registers, we no longer declare static 128 x 32 bit arrays for each of INEN and OUTEN which were required by the original design.
Given that there can be 10s or 100s of CTIs in a large multicore system, reducing the footprint to match the actual configuration, and offering a level of compression by using an index + single file to access a set of registers improves the efficiency of the driver.
It is good for reducing footprint, but I would give priority for a neat implementation and easy use interfaces.
And the sysfs attr code and global structures (e.g. register conversion struct) can shared by all instances, so I don't worry much the scale issue if we extend them.
Thanks, Leo
On Thu, Dec 04, 2025 at 09:07:56AM +0000, Mike Leach wrote:
[...]
I saw CTI registers are within 4KiB (0x1000), we can don't convert standard regiserts and only convert to QCOM register based on the standard ones. So you can drop the cti_normal_offset strucuture and only have a cti_reg_qcom_offset[] struct:
static const u32 cti_extended_offset[] = { [CTIINTACK] = QCOM_CTIINTACK, [CTIAPPSET] = QCOM_CTIAPPSET, [CTIAPPCLEAR] = QCOM_CTIAPPCLEAR, [CTIAPPPULSE] = QCOM_CTIAPPPULSE, [CTIINEN] = QCOM_CTIINEN, ... };
I suggested the dual offset approach a couple of patchset revisions ago as it actually simplifies the code & makes it more efficient. The offset array in use is set during probe and the remaining code is then common to both without lots of "if qcom else " occurences.
AFAICS, we will handle the QCOM CTI particularly in three cases:
1) The register access; 2) The claim tag; 3) Sysfs attr is visible.
Now we are discussing the reigster access. As suggested, the "if qcom / else" is encapsulated (e.g., in cti_reg_addr_with_nr()), it will not spread out.
I'd use standard registers by default and convert to non-standard ones only when needed. A new "neutral" index layer seems redundant, as the existing standard register indexes already serve this purpose.
For the sysfs attrs, it makes sense to use a central place to decide which knobs are only visible for QCOM CTI, otherwise, we also will not spread the condition check.
I will reply separately for claim tag issue.
Thanks, Leo
On Thu, Dec 04, 2025 at 09:15:07AM +0000, Mike Leach wrote:
[...]
/** QCOM CTI does not implement Claimtag functionality as* per CoreSight specification, but its CLAIMSET register* is incorrectly initialized to 0xF. This can mislead* tools or drivers into thinking the component is claimed.** Reset CLAIMSET to 0 to reflect that no claims are active.*/writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);I am confused for this. If QCOM CTI does not implement claim tag, then what is the designed register at the offset CORESIGHT_CLAIMSET?
Should you bypass all claim tag related operations for QCOM CTI case? (I don't see you touch anything for claim and declaim tags).
The patch I have created to handle systems without correct claim tag operation is a dependency for this patch set. Thus no need for override here as the core code will handle this correctly.
The only issue is ensuring the non-CTI spec implementation will result in the correct detection of no claim tags present.
Your patch works only when a module has implemented claim registers.
This leads to two issues: we end up clearing an unknown register in the CTI driver, and then the coresight core layer assumes it is reading a claim register even though it is not.
For QCOM CTI, combined with your patch, I would suggest directly setting csdev->access.claim_tag_impl to false (perhaps using a helper). This would be much clearer than the "hacking" way.
Thanks, Leo
Hi Leo
On Thu, 4 Dec 2025 at 10:47, Leo Yan leo.yan@arm.com wrote:
On Thu, Dec 04, 2025 at 09:15:07AM +0000, Mike Leach wrote:
[...]
/** QCOM CTI does not implement Claimtag functionality as* per CoreSight specification, but its CLAIMSET register* is incorrectly initialized to 0xF. This can mislead* tools or drivers into thinking the component is claimed.** Reset CLAIMSET to 0 to reflect that no claims are active.*/writel_relaxed(0, drvdata->base + CORESIGHT_CLAIMSET);I am confused for this. If QCOM CTI does not implement claim tag, then what is the designed register at the offset CORESIGHT_CLAIMSET?
Should you bypass all claim tag related operations for QCOM CTI case? (I don't see you touch anything for claim and declaim tags).
The patch I have created to handle systems without correct claim tag operation is a dependency for this patch set. Thus no need for override here as the core code will handle this correctly.
The only issue is ensuring the non-CTI spec implementation will result in the correct detection of no claim tags present.
Your patch works only when a module has implemented claim registers.
Per the Coresight spec - unimplemented registers must be RAZ/WI- so this still works for non implemented claim registers.
This leads to two issues: we end up clearing an unknown register in the CTI driver, and then the coresight core layer assumes it is reading a claim register even though it is not.
Again RAZ will simply read 0x0 - which is an indication that there are no claim bits implemented.
For QCOM CTI, combined with your patch, I would suggest directly setting csdev->access.claim_tag_impl to false (perhaps using a helper).
That would be a better solution, though as Qcom appear to have implemented a pair of standard RW registers rather than the claim tag functionality, the write solution works for this particular implementation.
Regards
Mike
This would be much clearer than the "hacking" way.
Thanks, Leo
Hi Leo
On Thu, 4 Dec 2025 at 10:31, Leo Yan leo.yan@arm.com wrote:
On Thu, Dec 04, 2025 at 09:07:56AM +0000, Mike Leach wrote:
[...]
I saw CTI registers are within 4KiB (0x1000), we can don't convert standard regiserts and only convert to QCOM register based on the standard ones. So you can drop the cti_normal_offset strucuture and only have a cti_reg_qcom_offset[] struct:
static const u32 cti_extended_offset[] = { [CTIINTACK] = QCOM_CTIINTACK, [CTIAPPSET] = QCOM_CTIAPPSET, [CTIAPPCLEAR] = QCOM_CTIAPPCLEAR, [CTIAPPPULSE] = QCOM_CTIAPPPULSE, [CTIINEN] = QCOM_CTIINEN, ... };
The tables in the patch are
[reg_type_array_index] = offset_address;
e.g.
[INDEX_CTIINTACK] = QCOM_CTIINTACK
which resolves to
[1] = 0x020
where index is constant for a given register type,
As far as I can tell what you have suggested above is a table that is
[std_addr_offset] = qcom_addr_offset;
e.g.
[CTIINTACK] = QCOM_CTIINTACK,
which resolves to
[0x10] = 0x020
which does not appear to work correctly?
The registers are sparsely spread across the memory map, so a simple mapping does not work, even if we divide the original offset by 4 to create a register number.
The largest standard offset we have is ITTRIGIN = 0xEF8, so assuming the compiler allows us to sparselly populate the array (which I think it does, along with some padding), we end up with an array of at least 0xEF8 elements, rather then the indexed 21?
Regards
Mike
I suggested the dual offset approach a couple of patchset revisions ago as it actually simplifies the code & makes it more efficient. The offset array in use is set during probe and the remaining code is then common to both without lots of "if qcom else " occurences.
AFAICS, we will handle the QCOM CTI particularly in three cases:
- The register access;
- The claim tag;
- Sysfs attr is visible.
Now we are discussing the reigster access. As suggested, the "if qcom / else" is encapsulated (e.g., in cti_reg_addr_with_nr()), it will not spread out.
I'd use standard registers by default and convert to non-standard ones only when needed. A new "neutral" index layer seems redundant, as the existing standard register indexes already serve this purpose.
For the sysfs attrs, it makes sense to use a central place to decide which knobs are only visible for QCOM CTI, otherwise, we also will not spread the condition check.
I will reply separately for claim tag issue.
Thanks, Leo
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK