This allows enabling branch broadcast for Perf hosted sessions (the option is currently only available for the sysfs interface). Hopefully this could be useful for testing the decode in perf, for example does a determinisitic run with branch broadcast enabled look the same as with it disabled? It could also be used for scenarios like OpenJ9's support for JIT code:
java -Xjit:perfTool hello.java
Currently this is not working and you get the usual errors of a missing DSO, but branch broadcast would have to be enabled anyway before working through this next issue:
CS ETM Trace: Debug data not found for address 0xffff7b94b058 in /tmp/perf-29360.map
Address range support in Perf for branch broadcast has also not been added here, but could be added later.
The documentation has been refactored slightly to allow updates to be made that link the Perf format arguments with the existing documentation.
For Suzuki's comment, I will do it as a separate change that converts all the other hard coded values to a more consistent sysreg.h style:
nit: While at this, please could you change the hard coded value to ETM4_CFG_BIT_RETSTK ?
Changes since v1:
* Added Leo's reviewed by on patch 3 * Fix bracket styling * Add documentation
Applies on top of coresight/next efa56eddf5d5c. But this docs fix is also required to get the links to work: https://marc.info/?l=linux-doc&m=164139331923986&w=2
Also available at: https://gitlab.arm.com/linux-arm/linux-jc/-/tree/james-branch-broadcast-v2
James Clark (6): coresight: Add config flag to enable branch broadcast coresight: Fail to open with return stacks if they are unavailable perf cs-etm: Update deduction of TRCCONFIGR register for branch broadcast Documentation: coresight: Turn numbered subsections into real subsections Documentation: coresight: Link config options to existing documentation Documentation: coresight: Expand branch broadcast documentation
.../coresight/coresight-etm4x-reference.rst | 14 ++++- Documentation/trace/coresight/coresight.rst | 56 +++++++++++++++++-- .../hwtracing/coresight/coresight-etm-perf.c | 2 + .../coresight/coresight-etm4x-core.c | 23 ++++++-- include/linux/coresight-pmu.h | 2 + tools/include/linux/coresight-pmu.h | 2 + tools/perf/arch/arm/util/cs-etm.c | 3 + 7 files changed, 92 insertions(+), 10 deletions(-)
When enabled, all taken branch addresses are output, even if the branch was because of a direct branch instruction. This enables reconstruction of the program flow without having access to the memory image of the code being executed.
Use bit 8 for the config option which would be the correct bit for programming ETMv3. Although branch broadcast can't be enabled on ETMv3 because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the correct bit might help prevent future collisions or allow it to be enabled if needed.
Signed-off-by: James Clark james.clark@arm.com --- drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++ include/linux/coresight-pmu.h | 2 ++ 3 files changed, 14 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index c039b6ae206f..43bbd5dc3d3b 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src); * The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config'; * now take them as general formats and apply on all ETMs. */ +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST)); PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC)); /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */ PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID)); @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = { &format_attr_sinkid.attr, &format_attr_preset.attr, &format_attr_configid.attr, + &format_attr_branch_broadcast.attr, NULL, };
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index bf18128cf5de..04669ecc0efa 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev, ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset); }
+ /* branch broadcast - enable if selected and supported */ + if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) { + if (!drvdata->trcbb) { + ret = -EINVAL; + goto out; + } else { + config->cfg |= BIT(ETM4_CFG_BIT_BB); + } + } + out: return ret; } diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@ * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and * directly use below macros as config bits. */ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_VMID 7
On Thu, 13 Jan 2022 at 09:11, James Clark james.clark@arm.com wrote:
When enabled, all taken branch addresses are output, even if the branch was because of a direct branch instruction. This enables reconstruction of the program flow without having access to the memory image of the code being executed.
Use bit 8 for the config option which would be the correct bit for programming ETMv3. Although branch broadcast can't be enabled on ETMv3 because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the correct bit might help prevent future collisions or allow it to be enabled if needed.
Signed-off-by: James Clark james.clark@arm.com
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++ include/linux/coresight-pmu.h | 2 ++ 3 files changed, 14 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index c039b6ae206f..43bbd5dc3d3b 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
- The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
- now take them as general formats and apply on all ETMs.
*/ +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST)); PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC)); /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */ PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID)); @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = { &format_attr_sinkid.attr, &format_attr_preset.attr, &format_attr_configid.attr,
&format_attr_branch_broadcast.attr, NULL,
};
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index bf18128cf5de..04669ecc0efa 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev, ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset); }
/* branch broadcast - enable if selected and supported */
if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
if (!drvdata->trcbb) {
ret = -EINVAL;
goto out;
} else {
config->cfg |= BIT(ETM4_CFG_BIT_BB);
}
}
out: return ret; } diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@
- ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
- directly use below macros as config bits.
*/ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6
#define ETM4_CFG_BIT_VMID 7
2.28.0
Reviewed-by: Mike Leach mike.leach@linaro.org
On 13/01/2022 09:10, James Clark wrote:
When enabled, all taken branch addresses are output, even if the branch was because of a direct branch instruction. This enables reconstruction of the program flow without having access to the memory image of the code being executed.
Use bit 8 for the config option which would be the correct bit for programming ETMv3. Although branch broadcast can't be enabled on ETMv3 because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the correct bit might help prevent future collisions or allow it to be enabled if needed.
Signed-off-by: James Clark james.clark@arm.com
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++ include/linux/coresight-pmu.h | 2 ++ 3 files changed, 14 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index c039b6ae206f..43bbd5dc3d3b 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
- The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
- now take them as general formats and apply on all ETMs.
*/ +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST)); PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC)); /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */ PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID)); @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = { &format_attr_sinkid.attr, &format_attr_preset.attr, &format_attr_configid.attr,
- &format_attr_branch_broadcast.attr,
Does it make sense to hide the option if the bb is not supported ? I guess it will be tricky as we don't track the common feature set. So, that said...
NULL, }; diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index bf18128cf5de..04669ecc0efa 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev, ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset); }
- /* branch broadcast - enable if selected and supported */
- if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
if (!drvdata->trcbb) {
ret = -EINVAL;
Should we fail here ? We could simply ignore this and generate the trace normally. This would work on a big.LITTLE system with one set missing the branch broadcast, while the others support.
Mike,
Does this affect the trace decoding ? As such the OpenCSD should be able to decode the packets as they appear in the stream. Correct ?
Suzuki
goto out;
} else {
config->cfg |= BIT(ETM4_CFG_BIT_BB);
}
- }
- out: return ret; }
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@
- ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
- directly use below macros as config bits.
*/ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29 /* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_VMID 7
Hi James, Suzuki
On Fri, 28 Jan 2022 at 11:19, Suzuki K Poulose suzuki.poulose@arm.com wrote:
On 13/01/2022 09:10, James Clark wrote:
When enabled, all taken branch addresses are output, even if the branch was because of a direct branch instruction. This enables reconstruction of the program flow without having access to the memory image of the code being executed.
Use bit 8 for the config option which would be the correct bit for programming ETMv3. Although branch broadcast can't be enabled on ETMv3 because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the correct bit might help prevent future collisions or allow it to be enabled if needed.
Signed-off-by: James Clark james.clark@arm.com
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++ include/linux/coresight-pmu.h | 2 ++ 3 files changed, 14 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index c039b6ae206f..43bbd5dc3d3b 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
- The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
- now take them as general formats and apply on all ETMs.
*/ +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST)); PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC)); /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */ PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID)); @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = { &format_attr_sinkid.attr, &format_attr_preset.attr, &format_attr_configid.attr,
&format_attr_branch_broadcast.attr,
Does it make sense to hide the option if the bb is not supported ? I guess it will be tricky as we don't track the common feature set. So, that said...
NULL,
};
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index bf18128cf5de..04669ecc0efa 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev, ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset); }
/* branch broadcast - enable if selected and supported */
if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
if (!drvdata->trcbb) {
ret = -EINVAL;
Should we fail here ? We could simply ignore this and generate the trace normally. This would work on a big.LITTLE system with one set missing the branch broadcast, while the others support.
Mike,
Does this affect the trace decoding ? As such the OpenCSD should be able to decode the packets as they appear in the stream. Correct ?
Depends on what you mean by affect the trace decoding! From the simplest perspective - no - there is no issue as the packets will be decode as seen. THE decoder does not know that BB exists - nor if it is enabled.
However, the reason that a user may engage BB is that the code is in some way self modifying - so that following the code static image and calculating addresses will result in a different path taken.
e.g. imagine an opcode:-
B <address0>
Without BB, this will trace as a single E atom, the decoder will calculate address0 from the opcode in the static image and continue from there as the next trace address.
Now look at the case where this is changed on the fly to
B <address1>
With BB, This will trace to E TGT_ADDR<address1>
The decoder will initially extract address0 from the static image, but the immediately following target address packet will alter the next address traced to address1 This is why we have BB.
So if the user has a reason to engage BB - we should really fail if it is not present - as the outcome of the trace can be affected.
Suzuki
goto out;
} else {
config->cfg |= BIT(ETM4_CFG_BIT_BB);
}
}
- out: return ret; }
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@
- ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
- directly use below macros as config bits.
*/ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_VMID 7
On 02/02/2022 20:25, Mike Leach wrote:
Hi James, Suzuki
On Fri, 28 Jan 2022 at 11:19, Suzuki K Poulose suzuki.poulose@arm.com wrote:
On 13/01/2022 09:10, James Clark wrote:
When enabled, all taken branch addresses are output, even if the branch was because of a direct branch instruction. This enables reconstruction of the program flow without having access to the memory image of the code being executed.
Use bit 8 for the config option which would be the correct bit for programming ETMv3. Although branch broadcast can't be enabled on ETMv3 because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the correct bit might help prevent future collisions or allow it to be enabled if needed.
Signed-off-by: James Clark james.clark@arm.com
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++ include/linux/coresight-pmu.h | 2 ++ 3 files changed, 14 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index c039b6ae206f..43bbd5dc3d3b 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
- The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
- now take them as general formats and apply on all ETMs.
*/ +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST)); PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC)); /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */ PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID)); @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = { &format_attr_sinkid.attr, &format_attr_preset.attr, &format_attr_configid.attr,
&format_attr_branch_broadcast.attr,
Does it make sense to hide the option if the bb is not supported ? I guess it will be tricky as we don't track the common feature set. So, that said...
NULL,
};
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index bf18128cf5de..04669ecc0efa 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev, ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset); }
/* branch broadcast - enable if selected and supported */
if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
if (!drvdata->trcbb) {
ret = -EINVAL;
Should we fail here ? We could simply ignore this and generate the trace normally. This would work on a big.LITTLE system with one set missing the branch broadcast, while the others support.
Mike,
Does this affect the trace decoding ? As such the OpenCSD should be able to decode the packets as they appear in the stream. Correct ?
Depends on what you mean by affect the trace decoding! From the simplest perspective - no - there is no issue as the packets will be decode as seen. THE decoder does not know that BB exists - nor if it is enabled.
However, the reason that a user may engage BB is that the code is in some way self modifying - so that following the code static image and calculating addresses will result in a different path taken.
e.g. imagine an opcode:-
B <address0>
Without BB, this will trace as a single E atom, the decoder will calculate address0 from the opcode in the static image and continue from there as the next trace address.
Now look at the case where this is changed on the fly to
B <address1>
With BB, This will trace to E TGT_ADDR<address1>
The decoder will initially extract address0 from the static image, but the immediately following target address packet will alter the next address traced to address1 This is why we have BB.
So if the user has a reason to engage BB - we should really fail if it is not present - as the outcome of the trace can be affected.
Hi Mike,
Now I'm starting to wonder if it's best to have some kind of non-binary image mode for BB where you'd just get a list of addresses output by perf script and it doesn't attempt to do any lookups. Although I think that would require a change in OpenCSD if it's not aware of the mode?
I also need to go back to my JVM profiling test and see what's going on again. But I think I understand your points a bit more now.
Thanks James
Suzuki
goto out;
} else {
config->cfg |= BIT(ETM4_CFG_BIT_BB);
}
}
- out: return ret; }
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@
- ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
- directly use below macros as config bits.
*/ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_VMID 7
Hi James,
On Fri, 11 Mar 2022 at 14:58, James Clark james.clark@arm.com wrote:
On 02/02/2022 20:25, Mike Leach wrote:
Hi James, Suzuki
On Fri, 28 Jan 2022 at 11:19, Suzuki K Poulose suzuki.poulose@arm.com wrote:
On 13/01/2022 09:10, James Clark wrote:
When enabled, all taken branch addresses are output, even if the branch was because of a direct branch instruction. This enables reconstruction of the program flow without having access to the memory image of the code being executed.
Use bit 8 for the config option which would be the correct bit for programming ETMv3. Although branch broadcast can't be enabled on ETMv3 because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the correct bit might help prevent future collisions or allow it to be enabled if needed.
Signed-off-by: James Clark james.clark@arm.com
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++ include/linux/coresight-pmu.h | 2 ++ 3 files changed, 14 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index c039b6ae206f..43bbd5dc3d3b 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
- The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
- now take them as general formats and apply on all ETMs.
*/ +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST)); PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC)); /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */ PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID)); @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = { &format_attr_sinkid.attr, &format_attr_preset.attr, &format_attr_configid.attr,
&format_attr_branch_broadcast.attr,
Does it make sense to hide the option if the bb is not supported ? I guess it will be tricky as we don't track the common feature set. So, that said...
NULL,
};
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index bf18128cf5de..04669ecc0efa 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev, ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset); }
/* branch broadcast - enable if selected and supported */
if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
if (!drvdata->trcbb) {
ret = -EINVAL;
Should we fail here ? We could simply ignore this and generate the trace normally. This would work on a big.LITTLE system with one set missing the branch broadcast, while the others support.
Mike,
Does this affect the trace decoding ? As such the OpenCSD should be able to decode the packets as they appear in the stream. Correct ?
Depends on what you mean by affect the trace decoding! From the simplest perspective - no - there is no issue as the packets will be decode as seen. THE decoder does not know that BB exists - nor if it is enabled.
However, the reason that a user may engage BB is that the code is in some way self modifying - so that following the code static image and calculating addresses will result in a different path taken.
e.g. imagine an opcode:-
B <address0>
Without BB, this will trace as a single E atom, the decoder will calculate address0 from the opcode in the static image and continue from there as the next trace address.
Now look at the case where this is changed on the fly to
B <address1>
With BB, This will trace to E TGT_ADDR<address1>
The decoder will initially extract address0 from the static image, but the immediately following target address packet will alter the next address traced to address1 This is why we have BB.
So if the user has a reason to engage BB - we should really fail if it is not present - as the outcome of the trace can be affected.
Hi Mike,
Now I'm starting to wonder if it's best to have some kind of non-binary image mode for BB where you'd just get a list of addresses output by perf script and it doesn't attempt to do any lookups.
Not at all sure what you mean by this
Mike
Although I think that would require a change in OpenCSD if it's not aware of the mode?
I also need to go back to my JVM profiling test and see what's going on again. But I think I understand your points a bit more now.
Thanks James
Suzuki
goto out;
} else {
config->cfg |= BIT(ETM4_CFG_BIT_BB);
}
}
- out: return ret; }
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@
- ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
- directly use below macros as config bits.
*/ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_VMID 7
On 11/03/2022 15:56, Mike Leach wrote:
Hi James,
On Fri, 11 Mar 2022 at 14:58, James Clark james.clark@arm.com wrote:
On 02/02/2022 20:25, Mike Leach wrote:
Hi James, Suzuki
On Fri, 28 Jan 2022 at 11:19, Suzuki K Poulose suzuki.poulose@arm.com wrote:
On 13/01/2022 09:10, James Clark wrote:
When enabled, all taken branch addresses are output, even if the branch was because of a direct branch instruction. This enables reconstruction of the program flow without having access to the memory image of the code being executed.
Use bit 8 for the config option which would be the correct bit for programming ETMv3. Although branch broadcast can't be enabled on ETMv3 because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the correct bit might help prevent future collisions or allow it to be enabled if needed.
Signed-off-by: James Clark james.clark@arm.com
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++ include/linux/coresight-pmu.h | 2 ++ 3 files changed, 14 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index c039b6ae206f..43bbd5dc3d3b 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
- The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
- now take them as general formats and apply on all ETMs.
*/ +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST)); PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC)); /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */ PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID)); @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = { &format_attr_sinkid.attr, &format_attr_preset.attr, &format_attr_configid.attr,
&format_attr_branch_broadcast.attr,
Does it make sense to hide the option if the bb is not supported ? I guess it will be tricky as we don't track the common feature set. So, that said...>>>>
NULL,
};
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index bf18128cf5de..04669ecc0efa 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev, ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset); }
/* branch broadcast - enable if selected and supported */
if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
if (!drvdata->trcbb) {
ret = -EINVAL;
Should we fail here ? We could simply ignore this and generate the trace normally. This would work on a big.LITTLE system with one set missing the branch broadcast, while the others support.
Hi Suzuki,
I think if the user really needs branch broadcast because they've modified their binaries then they would want feedback that one of the cores doesn't support it. Otherwise their decode would silently go wrong and they could get stuck.
Perf already opens an event per core, so it wouldn't be much effort to manually modify the command line to only select cores where it is supported.
I think for that scenario the current patch already works that way because the feature is checked on the active core when the event is enabled?
James
Mike,
Does this affect the trace decoding ? As such the OpenCSD should be able to decode the packets as they appear in the stream. Correct ?
Depends on what you mean by affect the trace decoding! From the simplest perspective - no - there is no issue as the packets will be decode as seen. THE decoder does not know that BB exists - nor if it is enabled.
However, the reason that a user may engage BB is that the code is in some way self modifying - so that following the code static image and calculating addresses will result in a different path taken.
e.g. imagine an opcode:-
B <address0>
Without BB, this will trace as a single E atom, the decoder will calculate address0 from the opcode in the static image and continue from there as the next trace address.
Now look at the case where this is changed on the fly to
B <address1>
With BB, This will trace to E TGT_ADDR<address1>
The decoder will initially extract address0 from the static image, but the immediately following target address packet will alter the next address traced to address1 This is why we have BB.
So if the user has a reason to engage BB - we should really fail if it is not present - as the outcome of the trace can be affected.
Hi Mike,
Now I'm starting to wonder if it's best to have some kind of non-binary image mode for BB where you'd just get a list of addresses output by perf script and it doesn't attempt to do any lookups.
Not at all sure what you mean by this
Hi Mike,
I'm not sure if I really understood it either! Now I see what you mean by the decoding issues in Perf and how it doesn't make sense if there is no method to supply modified images.
Are you ok with adding this if I add to the new documentation that there is currently no support in Perf and the behavior will be wrong if the binary was modified? But maybe other users of perf_event_open might find it useful?
Thanks James
Mike
Although I think that would require a change in OpenCSD if it's not aware of the mode?
I also need to go back to my JVM profiling test and see what's going on again. But I think I understand your points a bit more now.
Thanks James
Suzuki
goto out;
} else {
config->cfg |= BIT(ETM4_CFG_BIT_BB);
}
}
- out: return ret; }
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@
- ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
- directly use below macros as config bits.
*/ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_VMID 7
On 22/04/2022 11:18, James Clark wrote:
On 11/03/2022 15:56, Mike Leach wrote:
Hi James,
On Fri, 11 Mar 2022 at 14:58, James Clark james.clark@arm.com wrote:
On 02/02/2022 20:25, Mike Leach wrote:
Hi James, Suzuki
On Fri, 28 Jan 2022 at 11:19, Suzuki K Poulose suzuki.poulose@arm.com wrote:
On 13/01/2022 09:10, James Clark wrote:
When enabled, all taken branch addresses are output, even if the branch was because of a direct branch instruction. This enables reconstruction of the program flow without having access to the memory image of the code being executed.
Use bit 8 for the config option which would be the correct bit for programming ETMv3. Although branch broadcast can't be enabled on ETMv3 because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the correct bit might help prevent future collisions or allow it to be enabled if needed.
Signed-off-by: James Clark james.clark@arm.com
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++ include/linux/coresight-pmu.h | 2 ++ 3 files changed, 14 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index c039b6ae206f..43bbd5dc3d3b 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src); * The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config'; * now take them as general formats and apply on all ETMs. */ +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST)); PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC)); /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */ PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID)); @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = { &format_attr_sinkid.attr, &format_attr_preset.attr, &format_attr_configid.attr,
&format_attr_branch_broadcast.attr,
Does it make sense to hide the option if the bb is not supported ? I guess it will be tricky as we don't track the common feature set. So, that said...>>>>
NULL,
};
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index bf18128cf5de..04669ecc0efa 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev, ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset); }
/* branch broadcast - enable if selected and supported */
if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
if (!drvdata->trcbb) {
ret = -EINVAL;
Should we fail here ? We could simply ignore this and generate the trace normally. This would work on a big.LITTLE system with one set missing the branch broadcast, while the others support.
Hi Suzuki,
I think if the user really needs branch broadcast because they've modified their binaries then they would want feedback that one of the cores doesn't support it. Otherwise their decode would silently go wrong and they could get stuck.
Perf already opens an event per core, so it wouldn't be much effort to manually modify the command line to only select cores where it is supported.
I think for that scenario the current patch already works that way because the feature is checked on the active core when the event is enabled?
Fair enough. We could let the user pin the perf to the CPUs where this is supported. This information is available to the user via sysfs.
So, let us proceed with the changes here.
Cheers Suzuki
Maintain consistency with the other options by failing to open when they aren't supported. For example ETM_OPT_TS, ETM_OPT_CTXTID2 and the newly added ETM_OPT_BRANCH_BROADCAST all return with -EINVAL if they are requested but not supported by hardware.
The consequence of not doing this is that the user may not be aware that they are not enabling the feature as it is silently disabled.
Signed-off-by: James Clark james.clark@arm.com --- drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index 04669ecc0efa..a93c1a5fe045 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -674,10 +674,15 @@ static int etm4_parse_event_config(struct coresight_device *csdev, }
/* return stack - enable if selected and supported */ - if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack) - /* bit[12], Return stack enable bit */ - config->cfg |= BIT(12); - + if (attr->config & BIT(ETM_OPT_RETSTK)) { + if (!drvdata->retstack) { + ret = -EINVAL; + goto out; + } else { + /* bit[12], Return stack enable bit */ + config->cfg |= BIT(12); + } + } /* * Set any selected configuration and preset. *
Reviewed-by: Mike Leach mike.leach@linaro.org
On Thu, 13 Jan 2022 at 09:11, James Clark james.clark@arm.com wrote:
Maintain consistency with the other options by failing to open when they aren't supported. For example ETM_OPT_TS, ETM_OPT_CTXTID2 and the newly added ETM_OPT_BRANCH_BROADCAST all return with -EINVAL if they are requested but not supported by hardware.
The consequence of not doing this is that the user may not be aware that they are not enabling the feature as it is silently disabled.
Signed-off-by: James Clark james.clark@arm.com
drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index 04669ecc0efa..a93c1a5fe045 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -674,10 +674,15 @@ static int etm4_parse_event_config(struct coresight_device *csdev, }
/* return stack - enable if selected and supported */
if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
/* bit[12], Return stack enable bit */
config->cfg |= BIT(12);
if (attr->config & BIT(ETM_OPT_RETSTK)) {
if (!drvdata->retstack) {
ret = -EINVAL;
goto out;
} else {
/* bit[12], Return stack enable bit */
config->cfg |= BIT(12);
}
} /* * Set any selected configuration and preset. *
-- 2.28.0
Hi James
On 13/01/2022 09:10, James Clark wrote:
Maintain consistency with the other options by failing to open when they aren't supported. For example ETM_OPT_TS, ETM_OPT_CTXTID2 and the newly added ETM_OPT_BRANCH_BROADCAST all return with -EINVAL if they are requested but not supported by hardware.
Looking at this again (with similar comment to the Branch Broadcast), won't it disable using retstack on all CPUs, even when some of them support it ?
i.e., CPU0 - supports retstack, CPU1 - doesn't
A perf run with retstack will fail, as CPU1 doesn't support it (even though we advertise it, unconditionally).
So, if we ignore the failure, this would still allow CPU0 to use the feature and as long as the OpenCSD is able to decode the trace we should ignore the failure ?
I think we may also need to tune the etm4x_enable_hw() to skip updating the TRCCONFIGR with features not supported by the ETM
Suzuki
The consequence of not doing this is that the user may not be aware that they are not enabling the feature as it is silently disabled.
Signed-off-by: James Clark james.clark@arm.com
drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index 04669ecc0efa..a93c1a5fe045 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -674,10 +674,15 @@ static int etm4_parse_event_config(struct coresight_device *csdev, } /* return stack - enable if selected and supported */
- if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
/* bit[12], Return stack enable bit */
config->cfg |= BIT(12);
- if (attr->config & BIT(ETM_OPT_RETSTK)) {
if (!drvdata->retstack) {
ret = -EINVAL;
goto out;
} else {
/* bit[12], Return stack enable bit */
config->cfg |= BIT(12);
}
- } /*
- Set any selected configuration and preset.
On 28/01/2022 11:24, Suzuki K Poulose wrote:
Hi James
On 13/01/2022 09:10, James Clark wrote:
Maintain consistency with the other options by failing to open when they aren't supported. For example ETM_OPT_TS, ETM_OPT_CTXTID2 and the newly added ETM_OPT_BRANCH_BROADCAST all return with -EINVAL if they are requested but not supported by hardware.
Looking at this again (with similar comment to the Branch Broadcast), won't it disable using retstack on all CPUs, even when some of them support it ?
i.e., CPU0 - supports retstack, CPU1 - doesn't
A perf run with retstack will fail, as CPU1 doesn't support it (even though we advertise it, unconditionally).
So, if we ignore the failure, this would still allow CPU0 to use the feature and as long as the OpenCSD is able to decode the trace we should ignore the failure ?
I think we may also need to tune the etm4x_enable_hw() to skip updating the TRCCONFIGR with features not supported by the ETM
Hi Suzuki,
I'm picking up this branch broadcast change again after the haitus.
For this point, do you think it would be worth distinguishing between "no known CPUs that support the feature" vs "not currently running on a CPU that supports it but there are others that do"?
Also would we want to distinguish between per-CPU or per-process events? For the former it actually is possible to fail to open because all of the information is known.
I'm just thinking of the case where someone asks for a load of flags and thinks that they're getting them but get no feedback that they won't. But I understand having some complicated solution like I'm suggesting might be even more surprising to users.
Maybe the cleanest solution is to ask users to supply a config that can work on anywhere the event could possibly be scheduled. It doesn't really make sense to have retstack on a per-process event on big-little and then getting half of one type of data and half of another. It would make more sense to fail to open in that case and they have the choice of either doing per-CPU events or disabling retstacks altogether.
This seems like a similar problem to the issue causing the Coresight self test failure where a certain sink was picked that couldn't be reached and the test failed.
In that case the change we made doesn't quite match up to my suggestion here:
* Per-cpu but an unreachable sink -> fail * Per-process and potentially reachable sink in the future -> pass
Maybe it would have been better to say that the sink always has to be reachable otherwise is the outcome predicatable?
James
Suzuki
The consequence of not doing this is that the user may not be aware that they are not enabling the feature as it is silently disabled.
Signed-off-by: James Clark james.clark@arm.com
drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++---- Â 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index 04669ecc0efa..a93c1a5fe045 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -674,10 +674,15 @@ static int etm4_parse_event_config(struct coresight_device *csdev, Â Â Â Â Â } Â Â Â Â Â Â /* return stack - enable if selected and supported */ -Â Â Â if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack) -Â Â Â Â Â Â Â /* bit[12], Return stack enable bit */ -Â Â Â Â Â Â Â config->cfg |= BIT(12);
+Â Â Â if (attr->config & BIT(ETM_OPT_RETSTK)) { +Â Â Â Â Â Â Â if (!drvdata->retstack) { +Â Â Â Â Â Â Â Â Â Â Â ret = -EINVAL; +Â Â Â Â Â Â Â Â Â Â Â goto out; +Â Â Â Â Â Â Â } else { +Â Â Â Â Â Â Â Â Â Â Â /* bit[12], Return stack enable bit */ +Â Â Â Â Â Â Â Â Â Â Â config->cfg |= BIT(12); +Â Â Â Â Â Â Â } +Â Â Â } Â Â Â Â Â /* Â Â Â Â Â Â * Set any selected configuration and preset. Â Â Â Â Â Â *
Hi,
On Fri, 11 Mar 2022 at 14:52, James Clark james.clark@arm.com wrote:
On 28/01/2022 11:24, Suzuki K Poulose wrote:
Hi James
On 13/01/2022 09:10, James Clark wrote:
Maintain consistency with the other options by failing to open when they aren't supported. For example ETM_OPT_TS, ETM_OPT_CTXTID2 and the newly added ETM_OPT_BRANCH_BROADCAST all return with -EINVAL if they are requested but not supported by hardware.
Looking at this again (with similar comment to the Branch Broadcast), won't it disable using retstack on all CPUs, even when some of them support it ?
i.e., CPU0 - supports retstack, CPU1 - doesn't
A perf run with retstack will fail, as CPU1 doesn't support it (even though we advertise it, unconditionally).
So, if we ignore the failure, this would still allow CPU0 to use the feature and as long as the OpenCSD is able to decode the trace we should ignore the failure ?
I think we may also need to tune the etm4x_enable_hw() to skip updating the TRCCONFIGR with features not supported by the ETM
Hi Suzuki,
I'm picking up this branch broadcast change again after the haitus.
For this point, do you think it would be worth distinguishing between "no known CPUs that support the feature" vs "not currently running on a CPU that supports it but there are others that do"?
Also would we want to distinguish between per-CPU or per-process events? For the former it actually is possible to fail to open because all of the information is known.
I'm just thinking of the case where someone asks for a load of flags and thinks that they're getting them but get no feedback that they won't. But I understand having some complicated solution like I'm suggesting might be even more surprising to users.
Maybe the cleanest solution is to ask users to supply a config that can work on anywhere the event could possibly be scheduled. It doesn't really make sense to have retstack on a per-process event on big-little and then getting half of one type of data and half of another. It would make more sense to fail to open in that case and they have the choice of either doing per-CPU events or disabling retstacks altogether.
return stack has no effect on the decoder output whatsoever. The only effect is to reduce the amount of traced addresses at the input (leaving more space for other trace), so it is irrelevant if CPU0 supports it but CPU1 doesn't.
sequence:
BL r0 (return stack is used only on link instructions) ... RET
will output trace:- ATOM E (BL r0) ... ADDR_ELEM <ret addr> ATOM E (RET)
for no return stack,
ATOM E (BL r0) ... ATOM E (RET)
fior return stack.
In both cases the decoder will push the address after BL r0 onto its return stack.
In the first case the decoder will use the supplied address, in the second will pop the top of its return stack.
The decode output in both cases will be "branched to r0, ran code, returned via link register"
The outcome is identical for the client. So the case for not tracing on a core that does not have return stack if specified is weak.
Perhaps a warning will be sufficient?
Mike
This seems like a similar problem to the issue causing the Coresight self test failure where a certain sink was picked that couldn't be reached and the test failed.
In that case the change we made doesn't quite match up to my suggestion here:
- Per-cpu but an unreachable sink -> fail
- Per-process and potentially reachable sink in the future -> pass
Maybe it would have been better to say that the sink always has to be reachable otherwise is the outcome predicatable?
James
Suzuki
The consequence of not doing this is that the user may not be aware that they are not enabling the feature as it is silently disabled.
Signed-off-by: James Clark james.clark@arm.com
drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index 04669ecc0efa..a93c1a5fe045 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -674,10 +674,15 @@ static int etm4_parse_event_config(struct coresight_device *csdev, } /* return stack - enable if selected and supported */
- if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
/* bit[12], Return stack enable bit */
config->cfg |= BIT(12);
- if (attr->config & BIT(ETM_OPT_RETSTK)) {
if (!drvdata->retstack) {
ret = -EINVAL;
goto out;
} else {
/* bit[12], Return stack enable bit */
config->cfg |= BIT(12);
}
- } /*
- Set any selected configuration and preset.
On 11/03/2022 15:53, Mike Leach wrote:
Hi,
On Fri, 11 Mar 2022 at 14:52, James Clark james.clark@arm.com wrote:
On 28/01/2022 11:24, Suzuki K Poulose wrote:
Hi James
On 13/01/2022 09:10, James Clark wrote:
Maintain consistency with the other options by failing to open when they aren't supported. For example ETM_OPT_TS, ETM_OPT_CTXTID2 and the newly added ETM_OPT_BRANCH_BROADCAST all return with -EINVAL if they are requested but not supported by hardware.
Looking at this again (with similar comment to the Branch Broadcast), won't it disable using retstack on all CPUs, even when some of them support it ?
i.e., CPU0 - supports retstack, CPU1 - doesn't
A perf run with retstack will fail, as CPU1 doesn't support it (even though we advertise it, unconditionally).
So, if we ignore the failure, this would still allow CPU0 to use the feature and as long as the OpenCSD is able to decode the trace we should ignore the failure ?
I think we may also need to tune the etm4x_enable_hw() to skip updating the TRCCONFIGR with features not supported by the ETM
Hi Suzuki,
I'm picking up this branch broadcast change again after the haitus.
For this point, do you think it would be worth distinguishing between "no known CPUs that support the feature" vs "not currently running on a CPU that supports it but there are others that do"?
Also would we want to distinguish between per-CPU or per-process events? For the former it actually is possible to fail to open because all of the information is known.
I'm just thinking of the case where someone asks for a load of flags and thinks that they're getting them but get no feedback that they won't. But I understand having some complicated solution like I'm suggesting might be even more surprising to users.
Maybe the cleanest solution is to ask users to supply a config that can work on anywhere the event could possibly be scheduled. It doesn't really make sense to have retstack on a per-process event on big-little and then getting half of one type of data and half of another. It would make more sense to fail to open in that case and they have the choice of either doing per-CPU events or disabling retstacks altogether.
return stack has no effect on the decoder output whatsoever. The only effect is to reduce the amount of traced addresses at the input (leaving more space for other trace), so it is irrelevant if CPU0 supports it but CPU1 doesn't.
sequence:
BL r0 (return stack is used only on link instructions) ... RET
will output trace:- ATOM E (BL r0) ... ADDR_ELEM <ret addr> ATOM E (RET)
for no return stack,
ATOM E (BL r0) ... ATOM E (RET)
fior return stack.
In both cases the decoder will push the address after BL r0 onto its return stack.
In the first case the decoder will use the supplied address, in the second will pop the top of its return stack.
The decode output in both cases will be "branched to r0, ran code, returned via link register"
The outcome is identical for the client. So the case for not tracing on a core that does not have return stack if specified is weak.
Perhaps a warning will be sufficient?
Mike
This seems like a similar problem to the issue causing the Coresight self test failure where a certain sink was picked that couldn't be reached and the test failed.
In that case the change we made doesn't quite match up to my suggestion here:
- Per-cpu but an unreachable sink -> fail
- Per-process and potentially reachable sink in the future -> pass
Maybe it would have been better to say that the sink always has to be reachable otherwise is the outcome predicatable?
Hi Mike,
If it has no effect on the output then it makes sense to me to just drop this patch. I think even a warning would not add much and as far as I know they are discouraged.
James
James
Suzuki
The consequence of not doing this is that the user may not be aware that they are not enabling the feature as it is silently disabled.
Signed-off-by: James Clark james.clark@arm.com
drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index 04669ecc0efa..a93c1a5fe045 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -674,10 +674,15 @@ static int etm4_parse_event_config(struct coresight_device *csdev, } /* return stack - enable if selected and supported */
- if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
/* bit[12], Return stack enable bit */
config->cfg |= BIT(12);
- if (attr->config & BIT(ETM_OPT_RETSTK)) {
if (!drvdata->retstack) {
ret = -EINVAL;
goto out;
} else {
/* bit[12], Return stack enable bit */
config->cfg |= BIT(12);
}
- } /*
- Set any selected configuration and preset.
Now that a config flag for branch broadcast has been added, take it into account when trying to deduce what the driver would have programmed the TRCCONFIGR register to.
Reviewed-by: Leo Yan leo.yan@linaro.org Signed-off-by: James Clark james.clark@arm.com --- tools/include/linux/coresight-pmu.h | 2 ++ tools/perf/arch/arm/util/cs-etm.c | 3 +++ 2 files changed, 5 insertions(+)
diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/tools/include/linux/coresight-pmu.h +++ b/tools/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@ * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and * directly use below macros as config bits. */ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_VMID 7 diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c index 293a23bf8be3..c7ef4e9b4a3a 100644 --- a/tools/perf/arch/arm/util/cs-etm.c +++ b/tools/perf/arch/arm/util/cs-etm.c @@ -527,6 +527,9 @@ static u64 cs_etmv4_get_config(struct auxtrace_record *itr) if (config_opts & BIT(ETM_OPT_CTXTID2)) config |= BIT(ETM4_CFG_BIT_VMID) | BIT(ETM4_CFG_BIT_VMID_OPT); + if (config_opts & BIT(ETM_OPT_BRANCH_BROADCAST)) + config |= BIT(ETM4_CFG_BIT_BB); + return config; }
On Thu, 13 Jan 2022 at 09:11, James Clark james.clark@arm.com wrote:
Now that a config flag for branch broadcast has been added, take it into account when trying to deduce what the driver would have programmed the TRCCONFIGR register to.
Reviewed-by: Leo Yan leo.yan@linaro.org Signed-off-by: James Clark james.clark@arm.com
tools/include/linux/coresight-pmu.h | 2 ++ tools/perf/arch/arm/util/cs-etm.c | 3 +++ 2 files changed, 5 insertions(+)
diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/tools/include/linux/coresight-pmu.h +++ b/tools/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@
- ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
- directly use below macros as config bits.
*/ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_VMID 7 diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c index 293a23bf8be3..c7ef4e9b4a3a 100644 --- a/tools/perf/arch/arm/util/cs-etm.c +++ b/tools/perf/arch/arm/util/cs-etm.c @@ -527,6 +527,9 @@ static u64 cs_etmv4_get_config(struct auxtrace_record *itr) if (config_opts & BIT(ETM_OPT_CTXTID2)) config |= BIT(ETM4_CFG_BIT_VMID) | BIT(ETM4_CFG_BIT_VMID_OPT);
if (config_opts & BIT(ETM_OPT_BRANCH_BROADCAST))
config |= BIT(ETM4_CFG_BIT_BB);
return config;
}
-- 2.28.0
Reviewed-by: Mike Leach mike.leach@linaro.org
On 13/01/2022 09:10, James Clark wrote:
Now that a config flag for branch broadcast has been added, take it into account when trying to deduce what the driver would have programmed the TRCCONFIGR register to.
Reviewed-by: Leo Yan leo.yan@linaro.org Signed-off-by: James Clark james.clark@arm.com
tools/include/linux/coresight-pmu.h | 2 ++ tools/perf/arch/arm/util/cs-etm.c | 3 +++ 2 files changed, 5 insertions(+)
diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/tools/include/linux/coresight-pmu.h +++ b/tools/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@
- ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
- directly use below macros as config bits.
*/ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29 /* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_VMID 7 diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c index 293a23bf8be3..c7ef4e9b4a3a 100644 --- a/tools/perf/arch/arm/util/cs-etm.c +++ b/tools/perf/arch/arm/util/cs-etm.c @@ -527,6 +527,9 @@ static u64 cs_etmv4_get_config(struct auxtrace_record *itr) if (config_opts & BIT(ETM_OPT_CTXTID2)) config |= BIT(ETM4_CFG_BIT_VMID) | BIT(ETM4_CFG_BIT_VMID_OPT);
- if (config_opts & BIT(ETM_OPT_BRANCH_BROADCAST))
config |= BIT(ETM4_CFG_BIT_BB);
- return config;
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
}
Em Fri, Jan 28, 2022 at 11:25:24AM +0000, Suzuki K Poulose escreveu:
On 13/01/2022 09:10, James Clark wrote:
Now that a config flag for branch broadcast has been added, take it into account when trying to deduce what the driver would have programmed the TRCCONFIGR register to.
Thanks, applied this one, the tools/ part.
- Arnaldo
Reviewed-by: Leo Yan leo.yan@linaro.org Signed-off-by: James Clark james.clark@arm.com
tools/include/linux/coresight-pmu.h | 2 ++ tools/perf/arch/arm/util/cs-etm.c | 3 +++ 2 files changed, 5 insertions(+)
diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/tools/include/linux/coresight-pmu.h +++ b/tools/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@
- ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
- directly use below macros as config bits.
*/ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29 /* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_VMID 7 diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c index 293a23bf8be3..c7ef4e9b4a3a 100644 --- a/tools/perf/arch/arm/util/cs-etm.c +++ b/tools/perf/arch/arm/util/cs-etm.c @@ -527,6 +527,9 @@ static u64 cs_etmv4_get_config(struct auxtrace_record *itr) if (config_opts & BIT(ETM_OPT_CTXTID2)) config |= BIT(ETM4_CFG_BIT_VMID) | BIT(ETM4_CFG_BIT_VMID_OPT);
- if (config_opts & BIT(ETM_OPT_BRANCH_BROADCAST))
config |= BIT(ETM4_CFG_BIT_BB);
- return config;
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
}
This is to allow them to be referenced in a later commit. There was also a mistake where sysFS was introduced as section 2, but numbered as section 1. And vice versa for 'Using perf framework'. This can't happen with unnumbered sections.
Signed-off-by: James Clark james.clark@arm.com --- Documentation/trace/coresight/coresight.rst | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst index a15571d96cc8..db66ff45ff4c 100644 --- a/Documentation/trace/coresight/coresight.rst +++ b/Documentation/trace/coresight/coresight.rst @@ -339,7 +339,8 @@ Preference is given to the former as using the sysFS interface requires a deep understanding of the Coresight HW. The following sections provide details on using both methods.
-1) Using the sysFS interface: +Using the sysFS interface +~~~~~~~~~~~~~~~~~~~~~~~~~
Before trace collection can start, a coresight sink needs to be identified. There is no limit on the amount of sinks (nor sources) that can be enabled at @@ -446,7 +447,8 @@ wealth of possibilities that coresight provides. Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc} Timestamp Timestamp: 17107041535
-2) Using perf framework: +Using perf framework +~~~~~~~~~~~~~~~~~~~~
Coresight tracers are represented using the Perf framework's Performance Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of @@ -495,7 +497,11 @@ More information on the above and other example on how to use Coresight with the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub repository [#third]_.
-2.1) AutoFDO analysis using the perf tools: +Advanced perf framework usage +----------------------------- + +AutoFDO analysis using the perf tools +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
perf can be used to record and analyze trace of programs.
@@ -513,7 +519,8 @@ The --itrace option controls the type and frequency of synthesized events Note that only 64-bit programs are currently supported - further work is required to support instruction decode of 32-bit Arm programs.
-2.2) Tracing PID +Tracing PID +~~~~~~~~~~~
The kernel can be built to write the PID value into the PE ContextID registers. For a kernel running at EL1, the PID is stored in CONTEXTIDR_EL1. A PE may @@ -547,7 +554,7 @@ wants to trace PIDs for both host and guest, the two configs "contextid1" and
Generating coverage files for Feedback Directed Optimization: AutoFDO ---------------------------------------------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'perf inject' accepts the --itrace option in which case tracing data is removed and replaced with the synthesized events. e.g.
Reviewed-by: Mike Leach mike.leach@linaro.org
On Thu, 13 Jan 2022 at 09:11, James Clark james.clark@arm.com wrote:
This is to allow them to be referenced in a later commit. There was also a mistake where sysFS was introduced as section 2, but numbered as section 1. And vice versa for 'Using perf framework'. This can't happen with unnumbered sections.
Signed-off-by: James Clark james.clark@arm.com
Documentation/trace/coresight/coresight.rst | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst index a15571d96cc8..db66ff45ff4c 100644 --- a/Documentation/trace/coresight/coresight.rst +++ b/Documentation/trace/coresight/coresight.rst @@ -339,7 +339,8 @@ Preference is given to the former as using the sysFS interface requires a deep understanding of the Coresight HW. The following sections provide details on using both methods.
-1) Using the sysFS interface: +Using the sysFS interface +~~~~~~~~~~~~~~~~~~~~~~~~~
Before trace collection can start, a coresight sink needs to be identified. There is no limit on the amount of sinks (nor sources) that can be enabled at @@ -446,7 +447,8 @@ wealth of possibilities that coresight provides. Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc} Timestamp Timestamp: 17107041535
-2) Using perf framework: +Using perf framework +~~~~~~~~~~~~~~~~~~~~
Coresight tracers are represented using the Perf framework's Performance Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of @@ -495,7 +497,11 @@ More information on the above and other example on how to use Coresight with the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub repository [#third]_.
-2.1) AutoFDO analysis using the perf tools: +Advanced perf framework usage +-----------------------------
+AutoFDO analysis using the perf tools +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
perf can be used to record and analyze trace of programs.
@@ -513,7 +519,8 @@ The --itrace option controls the type and frequency of synthesized events Note that only 64-bit programs are currently supported - further work is required to support instruction decode of 32-bit Arm programs.
-2.2) Tracing PID +Tracing PID +~~~~~~~~~~~
The kernel can be built to write the PID value into the PE ContextID registers. For a kernel running at EL1, the PID is stored in CONTEXTIDR_EL1. A PE may @@ -547,7 +554,7 @@ wants to trace PIDs for both host and guest, the two configs "contextid1" and
Generating coverage files for Feedback Directed Optimization: AutoFDO
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'perf inject' accepts the --itrace option in which case tracing data is removed and replaced with the synthesized events. e.g. -- 2.28.0
On 13/01/2022 09:10, James Clark wrote:
This is to allow them to be referenced in a later commit. There was also a mistake where sysFS was introduced as section 2, but numbered as section 1. And vice versa for 'Using perf framework'. This can't happen with unnumbered sections.
Signed-off-by: James Clark james.clark@arm.com
Looks good to me
In order to document the newly added branch_broadcast option, create a table that links all of the config option formats to any existing docs. That way when the branch broadcast docs are expanded they are accessible from both places.
Signed-off-by: James Clark james.clark@arm.com --- .../coresight/coresight-etm4x-reference.rst | 4 ++ Documentation/trace/coresight/coresight.rst | 39 +++++++++++++++++++ 2 files changed, 43 insertions(+)
diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst index d25dfe86af9b..0439b4006227 100644 --- a/Documentation/trace/coresight/coresight-etm4x-reference.rst +++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst @@ -650,6 +650,7 @@ Bit assignments shown below:- parameter is set this value is applied to the currently indexed address range.
+.. _coresight-branch-broadcast:
**bit (4):** ETM_MODE_BB @@ -657,6 +658,7 @@ Bit assignments shown below:- **description:** Set to enable branch broadcast if supported in hardware [IDR0].
+.. _coresight-cycle-accurate:
**bit (5):** ETMv4_MODE_CYCACC @@ -678,6 +680,7 @@ Bit assignments shown below:- **description:** Set to enable virtual machine ID tracing if supported [IDR2].
+.. _coresight-timestamp:
**bit (11):** ETMv4_MODE_TIMESTAMP @@ -685,6 +688,7 @@ Bit assignments shown below:- **description:** Set to enable timestamp generation if supported [IDR0].
+.. _coresight-return-stack:
**bit (12):** ETM_MODE_RETURNSTACK diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst index db66ff45ff4c..803a224dbb0e 100644 --- a/Documentation/trace/coresight/coresight.rst +++ b/Documentation/trace/coresight/coresight.rst @@ -585,6 +585,45 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto Bubble sorting array of 30000 elements 5806 ms
+Config option formats +~~~~~~~~~~~~~~~~~~~~~ + +The following strings can be provided between // on the perf command line to enable various options. +They are also listed in the folder /sys/bus/event_source/devices/cs_etm/format/ + +.. list-table:: + :header-rows: 1 + + * - Option + - Description + * - branch_broadcast + - Session local version of the system wide setting: + :ref:`ETM_MODE_BB <coresight-branch-broadcast>` + * - contextid + - See `Tracing PID`_ + * - contextid1 + - See `Tracing PID`_ + * - contextid2 + - See `Tracing PID`_ + * - configid + - Selection for a custom configuration. This is an implementation detail and not used directly, + see :ref:`trace/coresight/coresight-config:Using Configurations in perf` + * - preset + - Override for parameters in a custom configuration, see + :ref:`trace/coresight/coresight-config:Using Configurations in perf` + * - sinkid + - Hashed version of the string to select a sink, automatically set when using the @ notation. + This is an internal implementation detail and is not used directly, see `Using perf + framework`_. + * - cycacc + - Session local version of the system wide setting: :ref:`ETMv4_MODE_CYCACC + <coresight-cycle-accurate>` + * - retstack + - Session local version of the system wide setting: :ref:`ETM_MODE_RETURNSTACK + <coresight-return-stack>` + * - timestamp + - Session local version of the system wide setting: :ref:`ETMv4_MODE_TIMESTAMP + <coresight-timestamp>`
How to use the STM module -------------------------
Reviewed-by: Mike Leach mike.leach@linaro.org
On Thu, 13 Jan 2022 at 09:11, James Clark james.clark@arm.com wrote:
In order to document the newly added branch_broadcast option, create a table that links all of the config option formats to any existing docs. That way when the branch broadcast docs are expanded they are accessible from both places.
Signed-off-by: James Clark james.clark@arm.com
.../coresight/coresight-etm4x-reference.rst | 4 ++ Documentation/trace/coresight/coresight.rst | 39 +++++++++++++++++++ 2 files changed, 43 insertions(+)
diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst index d25dfe86af9b..0439b4006227 100644 --- a/Documentation/trace/coresight/coresight-etm4x-reference.rst +++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst @@ -650,6 +650,7 @@ Bit assignments shown below:- parameter is set this value is applied to the currently indexed address range.
+.. _coresight-branch-broadcast:
**bit (4):** ETM_MODE_BB @@ -657,6 +658,7 @@ Bit assignments shown below:- **description:** Set to enable branch broadcast if supported in hardware [IDR0].
+.. _coresight-cycle-accurate:
**bit (5):** ETMv4_MODE_CYCACC @@ -678,6 +680,7 @@ Bit assignments shown below:- **description:** Set to enable virtual machine ID tracing if supported [IDR2].
+.. _coresight-timestamp:
**bit (11):** ETMv4_MODE_TIMESTAMP @@ -685,6 +688,7 @@ Bit assignments shown below:- **description:** Set to enable timestamp generation if supported [IDR0].
+.. _coresight-return-stack:
**bit (12):** ETM_MODE_RETURNSTACK diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst index db66ff45ff4c..803a224dbb0e 100644 --- a/Documentation/trace/coresight/coresight.rst +++ b/Documentation/trace/coresight/coresight.rst @@ -585,6 +585,45 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto Bubble sorting array of 30000 elements 5806 ms
+Config option formats +~~~~~~~~~~~~~~~~~~~~~
+The following strings can be provided between // on the perf command line to enable various options. +They are also listed in the folder /sys/bus/event_source/devices/cs_etm/format/
+.. list-table::
- :header-rows: 1
- Option
- Description
- branch_broadcast
- Session local version of the system wide setting:
:ref:`ETM_MODE_BB <coresight-branch-broadcast>`
- contextid
- See `Tracing PID`_
- contextid1
- See `Tracing PID`_
- contextid2
- See `Tracing PID`_
- configid
- Selection for a custom configuration. This is an implementation detail and not used directly,
see :ref:`trace/coresight/coresight-config:Using Configurations in perf`
- preset
- Override for parameters in a custom configuration, see
:ref:`trace/coresight/coresight-config:Using Configurations in perf`
- sinkid
- Hashed version of the string to select a sink, automatically set when using the @ notation.
This is an internal implementation detail and is not used directly, see `Using perf
framework`_.
- cycacc
- Session local version of the system wide setting: :ref:`ETMv4_MODE_CYCACC
<coresight-cycle-accurate>`
- retstack
- Session local version of the system wide setting: :ref:`ETM_MODE_RETURNSTACK
<coresight-return-stack>`
- timestamp
- Session local version of the system wide setting: :ref:`ETMv4_MODE_TIMESTAMP
<coresight-timestamp>`
How to use the STM module
-- 2.28.0
Now that there is a way of enabling branch broadcast via perf, mention the possible use cases and known limitations.
Signed-off-by: James Clark james.clark@arm.com --- .../trace/coresight/coresight-etm4x-reference.rst | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst index 0439b4006227..ec336575919c 100644 --- a/Documentation/trace/coresight/coresight-etm4x-reference.rst +++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst @@ -656,7 +656,15 @@ Bit assignments shown below:- ETM_MODE_BB
**description:** - Set to enable branch broadcast if supported in hardware [IDR0]. + Set to enable branch broadcast if supported in hardware [IDR0]. The primary use for this feature + is when code is patched dynamically at run time and the full program flow may not be able to be + reconstructed using only conditional branches. + + Choosing this option will result in a significant increase in the amount of trace generated - + possible danger of overflows, or fewer instructions covered. Note, that this option also + overrides any setting of :ref:`ETM_MODE_RETURNSTACK <coresight-return-stack>`, so where a branch + broadcast range overlaps a return stack range, return stacks will not be available for that + range.
.. _coresight-cycle-accurate:
On Thu, 13 Jan 2022 at 09:11, James Clark james.clark@arm.com wrote:
Now that there is a way of enabling branch broadcast via perf, mention the possible use cases and known limitations.
Signed-off-by: James Clark james.clark@arm.com
.../trace/coresight/coresight-etm4x-reference.rst | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst index 0439b4006227..ec336575919c 100644 --- a/Documentation/trace/coresight/coresight-etm4x-reference.rst +++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst @@ -656,7 +656,15 @@ Bit assignments shown below:- ETM_MODE_BB
**description:**
- Set to enable branch broadcast if supported in hardware [IDR0].
- Set to enable branch broadcast if supported in hardware [IDR0]. The primary use for this feature
- is when code is patched dynamically at run time and the full program flow may not be able to be
- reconstructed using only conditional branches.
- Choosing this option will result in a significant increase in the amount of trace generated -
- possible danger of overflows, or fewer instructions covered. Note, that this option also
- overrides any setting of :ref:`ETM_MODE_RETURNSTACK <coresight-return-stack>`, so where a branch
- broadcast range overlaps a return stack range, return stacks will not be available for that
- range.
.. _coresight-cycle-accurate:
-- 2.28.0
Reviewed-by: Mike Leach mike.leach@linaro.org
Hi James,
I am seriously back-logged in my patch reviews and as such will not be able to get to your work in the usual 14 days. At this time and if everything goes well, I should be able to start reviewing this patchset during the week of February 7th.
Thanks, Mathieu
On Thu, 13 Jan 2022 at 02:11, James Clark james.clark@arm.com wrote:
This allows enabling branch broadcast for Perf hosted sessions (the option is currently only available for the sysfs interface). Hopefully this could be useful for testing the decode in perf, for example does a determinisitic run with branch broadcast enabled look the same as with it disabled? It could also be used for scenarios like OpenJ9's support for JIT code:
java -Xjit:perfTool hello.java
Currently this is not working and you get the usual errors of a missing DSO, but branch broadcast would have to be enabled anyway before working through this next issue:
CS ETM Trace: Debug data not found for address 0xffff7b94b058 in /tmp/perf-29360.map
Address range support in Perf for branch broadcast has also not been added here, but could be added later.
The documentation has been refactored slightly to allow updates to be made that link the Perf format arguments with the existing documentation.
For Suzuki's comment, I will do it as a separate change that converts all the other hard coded values to a more consistent sysreg.h style:
nit: While at this, please could you change the hard coded value to ETM4_CFG_BIT_RETSTK ?
Changes since v1:
- Added Leo's reviewed by on patch 3
- Fix bracket styling
- Add documentation
Applies on top of coresight/next efa56eddf5d5c. But this docs fix is also required to get the links to work: https://marc.info/?l=linux-doc&m=164139331923986&w=2
Also available at: https://gitlab.arm.com/linux-arm/linux-jc/-/tree/james-branch-broadcast-v2
James Clark (6): coresight: Add config flag to enable branch broadcast coresight: Fail to open with return stacks if they are unavailable perf cs-etm: Update deduction of TRCCONFIGR register for branch broadcast Documentation: coresight: Turn numbered subsections into real subsections Documentation: coresight: Link config options to existing documentation Documentation: coresight: Expand branch broadcast documentation
.../coresight/coresight-etm4x-reference.rst | 14 ++++- Documentation/trace/coresight/coresight.rst | 56 +++++++++++++++++-- .../hwtracing/coresight/coresight-etm-perf.c | 2 + .../coresight/coresight-etm4x-core.c | 23 ++++++-- include/linux/coresight-pmu.h | 2 + tools/include/linux/coresight-pmu.h | 2 + tools/perf/arch/arm/util/cs-etm.c | 3 + 7 files changed, 92 insertions(+), 10 deletions(-)
-- 2.28.0