Apologies for the delay.
Changes since v2:
* Pick up some of Mike's review tags * Add a comment explaining rationale for not opening the event when BB isn't supported * Extend docs to say that Perf doesn't support decode when binaries are modified * Drop Perf side patches that were already merged
Thanks James
James Clark (4): coresight: Add config flag to enable branch broadcast Documentation: coresight: Turn numbered subsections into real subsections Documentation: coresight: Link config options to existing documentation Documentation: coresight: Expand branch broadcast documentation
.../coresight/coresight-etm4x-reference.rst | 17 +++++- Documentation/trace/coresight/coresight.rst | 56 +++++++++++++++++-- .../hwtracing/coresight/coresight-etm-perf.c | 2 + .../coresight/coresight-etm4x-core.c | 14 +++++ include/linux/coresight-pmu.h | 2 + 5 files changed, 85 insertions(+), 6 deletions(-)
When enabled, all taken branch addresses are output, even if the branch was because of a direct branch instruction. This enables reconstruction of the program flow without having access to the memory image of the code being executed.
Use bit 8 for the config option which would be the correct bit for programming ETMv3. Although branch broadcast can't be enabled on ETMv3 because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the correct bit might help prevent future collisions or allow it to be enabled if needed.
Signed-off-by: James Clark james.clark@arm.com Reviewed-by: Mike Leach mike.leach@linaro.org --- drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ drivers/hwtracing/coresight/coresight-etm4x-core.c | 14 ++++++++++++++ include/linux/coresight-pmu.h | 2 ++ 3 files changed, 18 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index c039b6ae206f..43bbd5dc3d3b 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src); * The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config'; * now take them as general formats and apply on all ETMs. */ +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST)); PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC)); /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */ PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID)); @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = { &format_attr_sinkid.attr, &format_attr_preset.attr, &format_attr_configid.attr, + &format_attr_branch_broadcast.attr, NULL, };
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index 87299e99dabb..cf249ecad5a5 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -696,6 +696,20 @@ static int etm4_parse_event_config(struct coresight_device *csdev, ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset); }
+ /* branch broadcast - enable if selected and supported */ + if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) { + if (!drvdata->trcbb) { + /* + * Missing BB support could cause silent decode errors + * so fail to open if it's not supported. + */ + ret = -EINVAL; + goto out; + } else { + config->cfg |= BIT(ETM4_CFG_BIT_BB); + } + } + out: return ret; } diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@ * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and * directly use below macros as config bits. */ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_VMID 7
This is to allow them to be referenced in a later commit. There was also a mistake where sysFS was introduced as section 2, but numbered as section 1. And vice versa for 'Using perf framework'. This can't happen with unnumbered sections.
Signed-off-by: James Clark james.clark@arm.com Reviewed-by: Mike Leach mike.leach@linaro.org --- Documentation/trace/coresight/coresight.rst | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst index a15571d96cc8..db66ff45ff4c 100644 --- a/Documentation/trace/coresight/coresight.rst +++ b/Documentation/trace/coresight/coresight.rst @@ -339,7 +339,8 @@ Preference is given to the former as using the sysFS interface requires a deep understanding of the Coresight HW. The following sections provide details on using both methods.
-1) Using the sysFS interface: +Using the sysFS interface +~~~~~~~~~~~~~~~~~~~~~~~~~
Before trace collection can start, a coresight sink needs to be identified. There is no limit on the amount of sinks (nor sources) that can be enabled at @@ -446,7 +447,8 @@ wealth of possibilities that coresight provides. Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc} Timestamp Timestamp: 17107041535
-2) Using perf framework: +Using perf framework +~~~~~~~~~~~~~~~~~~~~
Coresight tracers are represented using the Perf framework's Performance Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of @@ -495,7 +497,11 @@ More information on the above and other example on how to use Coresight with the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub repository [#third]_.
-2.1) AutoFDO analysis using the perf tools: +Advanced perf framework usage +----------------------------- + +AutoFDO analysis using the perf tools +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
perf can be used to record and analyze trace of programs.
@@ -513,7 +519,8 @@ The --itrace option controls the type and frequency of synthesized events Note that only 64-bit programs are currently supported - further work is required to support instruction decode of 32-bit Arm programs.
-2.2) Tracing PID +Tracing PID +~~~~~~~~~~~
The kernel can be built to write the PID value into the PE ContextID registers. For a kernel running at EL1, the PID is stored in CONTEXTIDR_EL1. A PE may @@ -547,7 +554,7 @@ wants to trace PIDs for both host and guest, the two configs "contextid1" and
Generating coverage files for Feedback Directed Optimization: AutoFDO ---------------------------------------------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'perf inject' accepts the --itrace option in which case tracing data is removed and replaced with the synthesized events. e.g.
In order to document the newly added branch_broadcast option, create a table that links all of the config option formats to any existing docs. That way when the branch broadcast docs are expanded they are accessible from both places.
Signed-off-by: James Clark james.clark@arm.com Reviewed-by: Mike Leach mike.leach@linaro.org --- .../coresight/coresight-etm4x-reference.rst | 4 ++ Documentation/trace/coresight/coresight.rst | 39 +++++++++++++++++++ 2 files changed, 43 insertions(+)
diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst index d25dfe86af9b..0439b4006227 100644 --- a/Documentation/trace/coresight/coresight-etm4x-reference.rst +++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst @@ -650,6 +650,7 @@ Bit assignments shown below:- parameter is set this value is applied to the currently indexed address range.
+.. _coresight-branch-broadcast:
**bit (4):** ETM_MODE_BB @@ -657,6 +658,7 @@ Bit assignments shown below:- **description:** Set to enable branch broadcast if supported in hardware [IDR0].
+.. _coresight-cycle-accurate:
**bit (5):** ETMv4_MODE_CYCACC @@ -678,6 +680,7 @@ Bit assignments shown below:- **description:** Set to enable virtual machine ID tracing if supported [IDR2].
+.. _coresight-timestamp:
**bit (11):** ETMv4_MODE_TIMESTAMP @@ -685,6 +688,7 @@ Bit assignments shown below:- **description:** Set to enable timestamp generation if supported [IDR0].
+.. _coresight-return-stack:
**bit (12):** ETM_MODE_RETURNSTACK diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst index db66ff45ff4c..803a224dbb0e 100644 --- a/Documentation/trace/coresight/coresight.rst +++ b/Documentation/trace/coresight/coresight.rst @@ -585,6 +585,45 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto Bubble sorting array of 30000 elements 5806 ms
+Config option formats +~~~~~~~~~~~~~~~~~~~~~ + +The following strings can be provided between // on the perf command line to enable various options. +They are also listed in the folder /sys/bus/event_source/devices/cs_etm/format/ + +.. list-table:: + :header-rows: 1 + + * - Option + - Description + * - branch_broadcast + - Session local version of the system wide setting: + :ref:`ETM_MODE_BB <coresight-branch-broadcast>` + * - contextid + - See `Tracing PID`_ + * - contextid1 + - See `Tracing PID`_ + * - contextid2 + - See `Tracing PID`_ + * - configid + - Selection for a custom configuration. This is an implementation detail and not used directly, + see :ref:`trace/coresight/coresight-config:Using Configurations in perf` + * - preset + - Override for parameters in a custom configuration, see + :ref:`trace/coresight/coresight-config:Using Configurations in perf` + * - sinkid + - Hashed version of the string to select a sink, automatically set when using the @ notation. + This is an internal implementation detail and is not used directly, see `Using perf + framework`_. + * - cycacc + - Session local version of the system wide setting: :ref:`ETMv4_MODE_CYCACC + <coresight-cycle-accurate>` + * - retstack + - Session local version of the system wide setting: :ref:`ETM_MODE_RETURNSTACK + <coresight-return-stack>` + * - timestamp + - Session local version of the system wide setting: :ref:`ETMv4_MODE_TIMESTAMP + <coresight-timestamp>`
How to use the STM module -------------------------
Now that there is a way of enabling branch broadcast via perf, mention the possible use cases and known limitations.
Signed-off-by: James Clark james.clark@arm.com --- .../trace/coresight/coresight-etm4x-reference.rst | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst index 0439b4006227..fb7578fd9372 100644 --- a/Documentation/trace/coresight/coresight-etm4x-reference.rst +++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst @@ -656,7 +656,18 @@ Bit assignments shown below:- ETM_MODE_BB
**description:** - Set to enable branch broadcast if supported in hardware [IDR0]. + Set to enable branch broadcast if supported in hardware [IDR0]. The primary use for this feature + is when code is patched dynamically at run time and the full program flow may not be able to be + reconstructed using only conditional branches. + + There is currently no support in Perf for supplying modified binaries to the decoder, so this + feature is only inteded to be used for debugging purposes or with a 3rd party tool. + + Choosing this option will result in a significant increase in the amount of trace generated - + possible danger of overflows, or fewer instructions covered. Note, that this option also + overrides any setting of :ref:`ETM_MODE_RETURNSTACK <coresight-return-stack>`, so where a branch + broadcast range overlaps a return stack range, return stacks will not be available for that + range.
.. _coresight-cycle-accurate:
On Wed, 11 May 2022 at 15:46, James Clark james.clark@arm.com wrote:
Now that there is a way of enabling branch broadcast via perf, mention the possible use cases and known limitations.
Signed-off-by: James Clark james.clark@arm.com
.../trace/coresight/coresight-etm4x-reference.rst | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst index 0439b4006227..fb7578fd9372 100644 --- a/Documentation/trace/coresight/coresight-etm4x-reference.rst +++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst @@ -656,7 +656,18 @@ Bit assignments shown below:- ETM_MODE_BB
**description:**
- Set to enable branch broadcast if supported in hardware [IDR0].
- Set to enable branch broadcast if supported in hardware [IDR0]. The primary use for this feature
- is when code is patched dynamically at run time and the full program flow may not be able to be
- reconstructed using only conditional branches.
- There is currently no support in Perf for supplying modified binaries to the decoder, so this
- feature is only inteded to be used for debugging purposes or with a 3rd party tool.
- Choosing this option will result in a significant increase in the amount of trace generated -
- possible danger of overflows, or fewer instructions covered. Note, that this option also
- overrides any setting of :ref:`ETM_MODE_RETURNSTACK <coresight-return-stack>`, so where a branch
- broadcast range overlaps a return stack range, return stacks will not be available for that
- range.
.. _coresight-cycle-accurate:
-- 2.28.0
Reviewed-by: Mike Leachmike.leach@linaro.org
Hi James,
On 11/05/2022 15:45, James Clark wrote:
Apologies for the delay.
Changes since v2:
- Pick up some of Mike's review tags
- Add a comment explaining rationale for not opening the event when BB isn't supported
- Extend docs to say that Perf doesn't support decode when binaries are modified
- Drop Perf side patches that were already merged
Thanks James
James Clark (4): coresight: Add config flag to enable branch broadcast Documentation: coresight: Turn numbered subsections into real subsections Documentation: coresight: Link config options to existing documentation Documentation: coresight: Expand branch broadcast documentation
The series looks good to me. Apologies, this missed the 5.19 window. I will queue this for 5.20 at -rc1.
Suzuki
.../coresight/coresight-etm4x-reference.rst | 17 +++++- Documentation/trace/coresight/coresight.rst | 56 +++++++++++++++++-- .../hwtracing/coresight/coresight-etm-perf.c | 2 + .../coresight/coresight-etm4x-core.c | 14 +++++ include/linux/coresight-pmu.h | 2 + 5 files changed, 85 insertions(+), 6 deletions(-)