Change applies to perf/core (45237f9898fc)
Changes since v6:
* Fix for snapshot mode where buffers are wrapped. This fix was done by clamping the aux record
size to the size of the buffer (see comment).
* Added an extra debugging printout.
* Typo/formatting fixes.
* Added the change for --dump-raw-trace as a second commit. I had planned to do this later, but it is
now finished, so I am submitting it at the same time.
* Tested the different snapshot scenarios more thoroughly.
This patchset improves decoding of snapshot files that contain duplicate data, for the reason
given at the end of the testing section. Coincidentally, the same issue is also fixed by
"[PATCH v1 0/3] coresight: Fix for snapshot mode", but by not saving the duplicates rather than by
not decoding them.
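Below is a rough sketch of the idea behind splitting the decode by AUX records. It assumes each PERF_RECORD_AUX record is reduced to an (offset, size) pair. Every record becomes its own decode range, and the size is clamped to the buffer size, as described above for wrapped snapshot buffers. All names here (`aux_record`, `decode_range`, `split_by_aux_records`, `copy_range`) are hypothetical and are not taken from cs-etm.c. Reducing the offset modulo the buffer size is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one PERF_RECORD_AUX record. */
struct aux_record {
	uint64_t offset; /* where the new trace data starts in the AUX buffer */
	uint64_t size;   /* how many bytes the record reports */
};

/* One independently decodable chunk of the circular AUX buffer. */
struct decode_range {
	uint64_t start; /* offset into the buffer, always < buf_size */
	uint64_t len;   /* bytes to decode, never more than buf_size */
};

/*
 * Produce one decode range per non-empty AUX record. In snapshot mode the
 * buffer may have wrapped, so a record can report more data than the
 * buffer holds: clamp the size to the buffer size in that case.
 * buf_size must be non-zero. Returns the number of ranges written.
 */
size_t split_by_aux_records(const struct aux_record *recs, size_t nr_recs,
			    uint64_t buf_size, struct decode_range *out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_recs; i++) {
		uint64_t len = recs[i].size;

		if (len == 0)
			continue; /* nothing to decode for this record */
		if (len > buf_size)
			len = buf_size; /* wrapped snapshot buffer */
		out[n].start = recs[i].offset % buf_size;
		out[n].len = len;
		n++;
	}
	return n;
}

/* Copy one range out of the circular buffer, following the wrap point. */
void copy_range(const uint8_t *buf, uint64_t buf_size,
		const struct decode_range *r, uint8_t *dst)
{
	uint64_t first = buf_size - r->start; /* bytes before the end */

	if (first > r->len)
		first = r->len;
	memcpy(dst, buf + r->start, first);
	memcpy(dst + first, buf, r->len - first); /* wrapped remainder */
}
```

Decoding each range on its own keeps a record's data from being mixed with data reported by a different record.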
James Clark (2):
perf cs-etm: Split Coresight decode by aux records
perf cs-etm: Split --dump-raw-trace by AUX records
tools/perf/util/cs-etm.c | 188 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 185 insertions(+), 3 deletions(-)
--
2.28.0
This patchset represents the second phase of CoreSight configuration
management.
1) API updated to allow dynamic load and unload of configurations and
features. Dependency management between loaded sets is added.
2) New configuration and feature sets can be added using a loadable module.
An example in samples/coresight is provided to demonstrate this.
3) Resource management API is added. This allows the system to ensure that
loaded configurations and features are only loaded onto devices that can
support them.
Furthermore, it ensures that configurations with multiple features cannot
over-allocate resources.
4) configfs can be used to activate a configuration which will then be used
when controlling tracing using sysfs.
5) Resource management is added to ETMv4 configurations. This allows current
and future features and configurations to be defined in terms of resources
used as well as registers to be programmed.
Defining features in this way allows the resource management to operate
correctly.
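The over-allocation check described in item 3 can be illustrated with a minimal sketch. It assumes resources can be modelled as simple per-type counts. The resource types, struct names and `config_fits_device()` are hypothetical and are not the patchset's API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource classes for an ETMv4-like trace unit. */
enum res_type { RES_ADDR_CMP, RES_COUNTER, RES_SEQ_STATE, RES_NR_TYPES };

struct res_counts {
	unsigned int n[RES_NR_TYPES];
};

/* Hypothetical feature descriptor: only the resources it needs. */
struct feature {
	const char *name;
	struct res_counts needs;
};

/*
 * Return true if every feature in a configuration can be loaded onto a
 * device that provides 'avail' resources. Requirements are summed over
 * all features, so a configuration cannot over-allocate even when each
 * feature would fit on its own.
 */
bool config_fits_device(const struct feature *feats, size_t nr_feats,
			const struct res_counts *avail)
{
	struct res_counts used = { { 0 } };

	for (size_t i = 0; i < nr_feats; i++) {
		for (int t = 0; t < RES_NR_TYPES; t++) {
			used.n[t] += feats[i].needs.n[t];
			if (used.n[t] > avail->n[t])
				return false;
		}
	}
	return true;
}
```

For example, two features that each need two address comparators fit a device with four, but adding a third feature that needs one more is rejected.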
The perf event parsing is also adjusted to allow the ETM resources requested
on the command line (e.g. address filters) to be correctly handled
using resource management alongside complex configurations such as
autofdo.
Applies to coresight/next, which is 5.13-rc1 plus the initial CoreSight
configuration patchset.
To follow in future revisions / sets:-
a) load of additional config and features by configfs
b) Configuration and feature support for ECT, CTI and other CoreSight
components.
Mike Leach (8):
coresight: syscfg: Update API to allow dynamic load and unload
coresight: syscfg: Update load API for config loadable modules
coresight: syscfg: Example CoreSight configuration loadable module
coresight: configfs: Allow configfs to activate configuration.
coresight: syscfg: Add API to check and validate device resources.
coresight: etm4x: syscfg: Add resource management to etm4x.
coresight: etm4x: Update perf event resource handling.
coresight: etm4x: Update configuration example.
MAINTAINERS | 1 +
.../hwtracing/coresight/coresight-cfg-afdo.c | 38 +-
.../coresight/coresight-cfg-preload.c | 9 +-
.../hwtracing/coresight/coresight-config.c | 71 ++-
.../hwtracing/coresight/coresight-config.h | 45 +-
.../hwtracing/coresight/coresight-etm4x-cfg.c | 533 ++++++++++++++++++
.../hwtracing/coresight/coresight-etm4x-cfg.h | 196 ++++++-
.../coresight/coresight-etm4x-core.c | 250 +++-----
.../coresight/coresight-syscfg-configfs.c | 87 +++
.../coresight/coresight-syscfg-configfs.h | 4 +
.../hwtracing/coresight/coresight-syscfg.c | 390 +++++++++++--
.../hwtracing/coresight/coresight-syscfg.h | 38 +-
include/linux/coresight.h | 2 +
samples/Kconfig | 9 +
samples/Makefile | 1 +
samples/coresight/Makefile | 4 +
samples/coresight/coresight-cfg-sample.c | 73 +++
17 files changed, 1511 insertions(+), 240 deletions(-)
create mode 100644 samples/coresight/Makefile
create mode 100644 samples/coresight/coresight-cfg-sample.c
--
2.17.1
Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
non-timeless mode starts decoding on the first PERF_RECORD_AUX record.
This can cause the "data has no samples!" error if the first
PERF_RECORD_AUX record comes before the first (or any relevant)
PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
to access the binary data.
This change pushes the start of non-timeless decoding to the very end of
parsing the file. The PERF_RECORD_EXIT event can't be used because it
might not exist in system-wide or snapshot modes.
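A toy model of the two strategies shows the ordering problem. Decoding at the first AUX record only sees the MMAP2 records that precede it. Decoding after the whole file has been parsed sees all of them. This is a simplified sketch, not cs-etm.c code; the event types and functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event types from a perf.data stream. */
enum ev_type { EV_AUX, EV_MMAP2, EV_EXIT };

struct decoder_state {
	size_t mmaps_known;     /* mmaps seen so far while parsing */
	size_t mmaps_at_decode; /* mmaps available when decoding ran */
	bool decoded;
};

/* Stand-in for running the decoder: record what it could resolve. */
static void decode_queues(struct decoder_state *s)
{
	if (!s->decoded) {
		s->mmaps_at_decode = s->mmaps_known;
		s->decoded = true;
	}
}

/* Old behaviour: start non-timeless decoding on the first AUX record. */
void process_eager(const enum ev_type *evs, size_t n, struct decoder_state *s)
{
	for (size_t i = 0; i < n; i++) {
		if (evs[i] == EV_MMAP2)
			s->mmaps_known++;
		else if (evs[i] == EV_AUX)
			decode_queues(s);
	}
}

/* New behaviour: only collect while parsing, decode once the whole
 * file has been read, when every MMAP2 record is known. */
void process_deferred(const enum ev_type *evs, size_t n, struct decoder_state *s)
{
	for (size_t i = 0; i < n; i++) {
		if (evs[i] == EV_MMAP2)
			s->mmaps_known++;
	}
	decode_queues(s);
}
```

Feed both functions the event order from the --delay example (three AUX records followed by two MMAP2 records). `process_eager()` then decodes with no mmaps known, which corresponds to the "data has no samples!" case, while `process_deferred()` decodes with both.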
I have not been able to find out exactly why the events are
intermittently in the wrong order in the basic scenario:
perf record -e cs_etm/@tmc_etr0/u top
However, it can be made to happen every time with the --delay option,
because "enable_on_exec" is then disabled, which causes tracing to start
before the process being launched is exec'd. For example:
perf record -e cs_etm/@tmc_etr0/u --delay=1 top
perf report -D | grep 'AUX\|MAP'
0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
Another scenario in which decoding from the first AUX record fails is a
workload that forks. Although the first AUX record comes after the MMAP2
records for 'bash', it comes before those for 'top', which is the process
we are interested in. For example:
perf record -e cs_etm/@tmc_etr0/u -- bash -c top
perf report -D | grep 'AUX\|MAP'
4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
...
0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
A third scenario is when the majority of time is spent in a shared
library that is not loaded at startup, for example a dynamically loaded
plugin.
Testing
=======
Testing was done by checking whether any samples present in the
old output are missing from the new output. Timestamps must be
stripped out with awk because they are now set to the last AUX sample
rather than the first:
./perf script $4 | awk '!($4="")' > new.script
./perf-default script $4 | awk '!($4="")' > default.script
comm -13 <(sort -u new.script) <(sort -u default.script)
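For reference, `comm -13 A B` suppresses lines unique to A and lines common to both files. Only lines unique to B remain; here that means samples present in default.script (the old output) but absent from new.script. The minimal C sketch below performs that merge over two sorted, de-duplicated arrays. It assumes byte-wise ordering (`strcmp`, equivalent to sorting with LC_ALL=C). `only_in_second()` is a hypothetical helper, not part of perf or coreutils.

```c
#include <stddef.h>
#include <string.h>

/*
 * Emulate `comm -13 a b` on two arrays that are sorted with strcmp and
 * de-duplicated: store in 'out' the lines that appear only in 'b'.
 * Returns the number of lines stored.
 */
size_t only_in_second(const char **a, size_t na, const char **b, size_t nb,
		      const char **out)
{
	size_t i = 0, j = 0, n = 0;

	while (j < nb) {
		int cmp = (i < na) ? strcmp(a[i], b[j]) : 1;

		if (cmp < 0) {
			i++;               /* only in a: column 1, suppressed */
		} else if (cmp == 0) {
			i++;               /* in both: column 3, suppressed */
			j++;
		} else {
			out[n++] = b[j++]; /* only in b: column 2, kept */
		}
	}
	return n;
}
```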
Testing showed that the new output is a superset of the old one. Lines
that do appear in the comm output are not missing; they differ because
[unknown] entries now resolve to sensible locations. For example, the
last putp branch here now resolves to libtinfo, so it is not missing
from the output but actually improved:
Old:
top 305 [001] 1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
top 305 [001] 1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
top 305 [001] 1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
New:
top 305 [001] 1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
top 305 [001] 1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
top 305 [001] 1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)
In the following two modes, decoding now works and the "data has no
samples!" error is no longer displayed:
perf record -e cs_etm/@tmc_etr0/u -- bash -c top
perf record -e cs_etm/@tmc_etr0/u --delay=1 top
Decoding in snapshot mode is also improved. Previously, samples for the
'kill' process that was used to send SIGUSR2 were completely missing
because the process had not started yet. Now additional samples are
present:
perf record -e cs_etm/@tmc_etr0/u --snapshot -a
perf script
stress 19380 [003] 161627.938153: 1000000 instructions:uH: aaaabb612fb4 [unknown] (/usr/bin/stress)
kill 19644 [000] 161627.938153: 1000000 instructions:uH: ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
stress 19380 [003] 161627.938153: 1000000 instructions:uH: ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)
The round trip of 'perf inject' followed by 'perf report' was also
tested, and shows the same differences and improvements.
Signed-off-by: James Clark <james.clark(a)arm.com>
---
tools/perf/util/cs-etm.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 57aea2c7fc77..ceed0b038796 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2407,6 +2407,11 @@ static int cs_etm__process_event(struct perf_session *session,
return err;
}
+ /*
+ * Don't wait for cs_etm__flush_events() in per-thread/timeless mode to start the decode. We
+ * need the tid of the PERF_RECORD_EXIT event to assign to the synthesised samples because
+ * ETM_OPT_CTXTID is not enabled.
+ */
if (etm->timeless_decoding &&
event->header.type == PERF_RECORD_EXIT)
return cs_etm__process_timeless_queues(etm,
@@ -2424,7 +2429,6 @@ static int cs_etm__process_event(struct perf_session *session,
* onwards.
*/
etm->latest_kernel_timestamp = sample_kernel_timestamp;
- return cs_etm__process_queues(etm);
}
return 0;
--
2.28.0
Hi Linaro Coresight Team,
We are debugging an Arm ETB and are not sure how to dump the ETB data correctly. Any comments would be much appreciated.
* First, a description of our ETB environment.
For the CoreSight components: after enabling the ETM and ETB, we can dump the ETB data after a while.
According to ARM_CoreSight_Architecture_Specification.pdf, we think it works, judging by what happens when we change the trace source ID.
The following data is from the head of the trace buffer.
With the trace source ID set to 2:
00000005 = 00 | 0b10<<1+0b1 | 0x0 | 0x0
After changing the trace source ID to 4:
00000009 = 00 | 0b100<<1+0b1 | 0x0 | 0x0
0000000B
00000000
04000000
00800000
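The ID bytes above follow the CoreSight trace formatter protocol. A formatted sink stores data in 16-byte frames. An even-numbered byte with bit 0 set is an ID change that carries the new ID in bits [7:1], so 0x05 is ID 2 and 0x09 is ID 4. Byte 15 holds one flag bit per even byte. For a data byte, the flag is that byte's real bit 0. For an ID byte, it says whether the next byte still belongs to the old ID. The simplified deformatter sketch below was written for this note and is not OpenCSD's implementation. It assumes the frame has already been put into memory byte order, and it ignores frame synchronisation packets and reserved IDs.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack one 16-byte CoreSight formatter frame into (ID, byte) pairs.
 * *cur_id carries the current trace source ID from frame to frame.
 * Returns the number of data bytes written to ids[] and data[] (max 15).
 */
size_t unpack_frame(const uint8_t frame[16], uint8_t *cur_id,
		    uint8_t ids[15], uint8_t data[15])
{
	const uint8_t aux = frame[15]; /* one flag bit per even byte */
	size_t n = 0;

	for (int i = 0; i < 15; i += 2) {
		const uint8_t flag = (aux >> (i / 2)) & 1;
		const int has_next = i < 14; /* byte 14 is followed by aux */

		if (frame[i] & 1) {
			/* ID change: the new ID is in bits [7:1]. */
			const uint8_t new_id = frame[i] >> 1;

			if (has_next && flag) {
				/* Flag set: next byte still uses the old ID. */
				ids[n] = *cur_id;
				data[n++] = frame[i + 1];
				*cur_id = new_id;
			} else {
				*cur_id = new_id;
				if (has_next) {
					ids[n] = *cur_id;
					data[n++] = frame[i + 1];
				}
			}
		} else {
			/* Data byte: its real bit 0 is kept in the aux byte. */
			ids[n] = *cur_id;
			data[n++] = (uint8_t)(frame[i] | flag);
			if (has_next) {
				ids[n] = *cur_id;
				data[n++] = frame[i + 1];
			}
		}
	}
	return n;
}
```

A decoder then only sees the bytes tagged with the ID it is configured for. This is why the decoder's trace ID must match the value programmed into TRCTRACEIDR.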
The ETM registers are as follows:
TRCCONFIGR(id:0x10)=0x31F07
TRCTRACEIDR(id:0x40)=0x2
TRCIDR8(id:0x180)=0x1
TRCIDR9(id:0x184)=0x20
TRCIDR10(id:0x188)=0x2
TRCIDR11(id:0x18c)=0x0
TRCIDR12(id:0x190)=0x1
TRCIDR13(id:0x194)=0x0
TRCIDR0(id:0x1e0)=0x8020EFF
TRCIDR1(id:0x1e4)=0x4100F401 ARM ETM4.0.1
TRCIDR2(id:0x1e8)=0x420004
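The sketch below decodes a few fields of the register dump to make it easier to read. The bit positions assumed here are:

- the ETMv4 TRCIDR1 version fields;
- TRCIDR0.TRCDATA at bits [4:3];
- the TRCCONFIGR_* bit definitions used by the Linux etm4x driver.

Check them against the ETMv4 architecture specification before relying on them. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* TRCIDR1 fields: architecture version of the trace unit. */
struct etm_version {
	unsigned int major, minor, revision, designer;
};

struct etm_version decode_trcidr1(uint32_t v)
{
	struct etm_version ver = {
		.major = (v >> 8) & 0xf,      /* TRCARCHMAJ, bits [11:8] */
		.minor = (v >> 4) & 0xf,      /* TRCARCHMIN, bits [7:4] */
		.revision = v & 0xf,          /* REVISION, bits [3:0] */
		.designer = (v >> 24) & 0xff, /* DESIGNER, 0x41 = 'A' */
	};
	return ver;
}

/* TRCIDR0.TRCDATA, bits [4:3]: data tracing implemented (0 = none). */
unsigned int trcidr0_trcdata(uint32_t v)
{
	return (v >> 3) & 0x3;
}

/* Selected TRCCONFIGR bits (assumed positions, see lead-in). */
#define TRCCONFIGR_TS (1u << 11) /* global timestamping */
#define TRCCONFIGR_RS (1u << 12) /* return stack */
#define TRCCONFIGR_DA (1u << 16) /* data address tracing */
#define TRCCONFIGR_DV (1u << 17) /* data value tracing */

/* True if TRCCONFIGR requests any data tracing. */
bool trcconfigr_data_trace(uint32_t v)
{
	return (v & (TRCCONFIGR_DA | TRCCONFIGR_DV)) != 0;
}
```

If these bit positions are right, the quoted values decode as follows:

- TRCIDR1: ETMv4.0.1 (designer 0x41).
- TRCIDR0: TRCDATA = 0b11, meaning data address and data value tracing are implemented.
- TRCCONFIGR: data address and data value tracing are enabled, together with timestamps and the return stack.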
* How do we use OpenCSD for Cortex-M7?
We also tried to decode the dump with your OpenCSD code (https://github.com/Linaro/OpenCSD), which is a useful tool, but we may be missing some configuration, because our ETB data cannot be decoded:
Trace Packet Lister : Protocol printer ETMV4I on Trace ID 0x0
Idx:0; ID:0; I_NOT_SYNC : I Stream not synchronised
Idx:285; ID:0; I_INCOMPLETE_EOT : Incomplete packet at end of trace.[I_NOT_SYNC]
ID:0 END OF TRACE DATA
Trace Packet Lister : Trace buffer done, processed 288 bytes.
The settings and the trace buffer are attached.
My questions are:
* Should we use ETMV4D instead of ETMV4I?
#define OCSD_BUILTIN_DCD_ETMV4D "ETMV4D" /**< ETMv4 data decoder */
* Does OpenCSD support ETMV4D and Cortex-M7?
We can see the following TODO in the git repo:
Support to be added: ETMv4 data trace - packet processing and decode.
const char *decoderName = bDataChannel ? OCSD_BUILTIN_DCD_ETMV4D : OCSD_BUILTIN_DCD_ETMV4I;
We also did not see Cortex-M7 in the list of supported cores:
{ "Cortex-M0", { ARCH_V7, profile_CortexM } },
{ "Cortex-M0+", { ARCH_V7, profile_CortexM } },
{ "Cortex-M3", { ARCH_V7, profile_CortexM } },
{ "Cortex-M4", { ARCH_V7, profile_CortexM } }
* A basic question: could you tell me which Arm document describes the data encoder?
I stepped through the test code below, but the result does not match the ETB trace buffer contents. I am not sure whether there is more documentation about the PE encoder/decoder.
debug_count += 1;
4B11 ldr r3,0x604
681B ldr r3,[r3]
3301 adds r3,#0x1
4A0F ldr r2,0x604
6013 str r3,[r2]
[Figure 1-2: Example SoC with a trace unit and a dedicated trace buffer]
Looking forward to your feedback.
Thank you so much.
BR//Jinnan
This patchset introduces initial concepts in CoreSight system
configuration management support, to allow more detailed and complex
programming to be applied to CoreSight systems during trace capture.
Configurations consist of 2 elements:-
1) Features - programming combinations for devices, applied to a class of
device on the system (all ETMv4), or individual devices.
2) Configurations - a set of programmed features used when the named
configuration is selected.
Features and configurations are declared as a data table, a set of register,
resource and parameter requirements. Features and configurations are loaded
into the system by the virtual cs_syscfg device. This then matches features
to any registered devices and loads the feature into them.
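The data-table approach can be pictured with a minimal sketch. A feature is plain data: a name, a device-class match mask and a register list. Loading it means attaching it to every registered device whose class matches. The struct names, `MATCH_*` flags, register offsets and `load_feature()` are hypothetical and are not the cs_syscfg API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical match flags: which device classes a feature targets. */
#define MATCH_ETMV4 (1u << 0)
#define MATCH_CTI   (1u << 1)

/* One register value the feature programs (hypothetical layout). */
struct reg_desc {
	uint32_t offset;
	uint32_t value;
};

/* A feature declared purely as data. */
struct feature_desc {
	const char *name;
	uint32_t match_flags;
	size_t nr_regs;
	const struct reg_desc *regs;
};

/* A device that has registered with the configuration manager. */
struct cs_device {
	const char *name;
	uint32_t match_flags; /* classes this device belongs to */
	const struct feature_desc *loaded[4];
	size_t nr_loaded;
};

/* Load 'feat' into every registered device whose class it matches.
 * Returns the number of devices that received it. */
size_t load_feature(const struct feature_desc *feat,
		    struct cs_device *devs, size_t nr_devs)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_devs; i++) {
		struct cs_device *d = &devs[i];

		if (!(d->match_flags & feat->match_flags))
			continue; /* wrong device class */
		if (d->nr_loaded == sizeof(d->loaded) / sizeof(d->loaded[0]))
			continue; /* table full in this toy model */
		d->loaded[d->nr_loaded++] = feat;
		n++;
	}
	return n;
}
```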
Individual device classes that support features and configurations register
with cs_syscfg.
Once loaded, a configuration can be enabled for a specific trace run.
Configurations are registered with the perf cs_etm event as entries in
cs_etm/events. These can be selected on the perf command line as follows:-
perf record -e cs_etm/<config_name>/ ...
This patch set has one pre-loaded configuration and feature.
A named "strobing" feature is provided for ETMv4.
A named "autofdo" configuration is provided. This configuration enables
strobing on any ETM in use.
Thus the command:
perf record -e cs_etm/autofdo/ ...
will trace the supplied application while enabling the "autofdo" configuration
on each ETM as it is enabled by perf. This in turn will enable strobing for
the ETM - with default parameters. Parameters can be adjusted using configfs.
The sink used in the trace run will be automatically selected.
A configuration can supply up to 15 sets of preset parameter values, which
will be substituted for the parameter values of any feature used in the
configuration.
Preset values are selected as follows:
perf record -e cs_etm/autofdo,preset=1/ ...
(valid presets 1-N, where N is the number supplied in the configuration, not
exceeding 15. preset=0 is the same as not selecting a preset.)
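The preset rules above can be sketched as follows, assuming a configuration stores its presets as N rows of parameter values. Preset 0 leaves the current values alone, presets 1 to N overwrite the feature parameters, and anything else is rejected. `config_desc` and `apply_preset()` are hypothetical names, not the driver's.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PRESETS 15

/* Hypothetical configuration: nr_presets rows of nr_params values. */
struct config_desc {
	size_t nr_params;
	size_t nr_presets;       /* N, at most MAX_PRESETS */
	const uint64_t *presets; /* row-major: preset 1 is row 0 */
};

/*
 * Apply preset 'sel' to the feature parameters in 'params'.
 * sel == 0 keeps the current values; valid presets are 1..N.
 * Returns 0 on success, -1 if sel is out of range.
 */
int apply_preset(const struct config_desc *cfg, unsigned int sel,
		 uint64_t *params)
{
	if (sel == 0)
		return 0;
	if (sel > cfg->nr_presets || sel > MAX_PRESETS)
		return -1;

	const uint64_t *row = cfg->presets + (size_t)(sel - 1) * cfg->nr_params;

	for (size_t i = 0; i < cfg->nr_params; i++)
		params[i] = row[i];
	return 0;
}
```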
Applies to & tested against coresight/next-ETE-TRBE (5.12-rc3 base)
Changes since v6:
Fixed kernel test robot issues-
Reported-by: kernel test robot <lkp(a)intel.com>
Changes since v5:
1) Fix code style issues from auto-build reports, as
Reported-by: kernel test robot <lkp(a)intel.com>
2) Update comments to get consistent docs for API functions.
3) remove unused #define from autofdo example.
4) fix perf code style issues from patch 4 (Mathieu)
5) fix configfs code style issues from patch 9. (Mathieu)
Changes since v4: (based on comments from Mathieu and Suzuki).
No large functional changes; primarily code improvements and naming scheme.
1) Updated entire set to ensure a consistent naming scheme was used for
variables and struct members that refer to the key objects in the system.
Suffix _desc is used for all references to feature and configuration descriptors,
suffix _csdev for all references to loaded features and configs in the csdev
instances. (Mathieu & Suzuki).
2) Dropped the 'configurations' sub dir in cs_etm perf directories as superfluous
with the configfs containing the same information. (Mathieu).
3) Simplified perf handling code (Suzuki)
4) Multiple simplifications and improvements for code readability (Mathieu
and Suzuki)
Changes since v3: (Primarily based on comments from Mathieu)
1) Locking mechanisms simplified.
2) Removed the possibility to enable features independently from
configurations. Only configurations can be enabled now. This simplifies the
programming logic.
3) Configuration now uses an activate->enable mechanism. This means that perf
will activate a selected configuration at the start of a session (during
setup_aux), and deactivate it at the end of a session (around free_aux).
The active configuration and associated features will be programmed into the
CoreSight device instances when they are enabled. This locks the configuration
into the system while in use. Parameters cannot be altered while this is
in place. This mechanism will be extended in future for dynamic load / unload
of configurations to prevent removal while in use.
4) Removed the custom bus / driver as unnecessary. A single device is
registered to own perf fs elements and configfs.
5) Various other minor issues addressed.
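The activate/enable locking in item 3 can be summarised as a reference count. While any session has the configuration active, it blocks parameter changes and (once dynamic unload is added) removal. Below is a single-threaded sketch with hypothetical names (`config_state`, `config_activate()` and so on). A real driver also needs locking, which is omitted here.

```c
#include <stdint.h>

/* Hypothetical bookkeeping for one loaded configuration. */
struct config_state {
	unsigned int active; /* sessions that have activated it */
	uint64_t param;      /* one adjustable parameter, for illustration */
};

/* Session start (setup_aux in the description above). */
void config_activate(struct config_state *c)
{
	c->active++;
}

/* Session end (around free_aux). */
void config_deactivate(struct config_state *c)
{
	if (c->active)
		c->active--;
}

/* Parameters are locked while the configuration is active.
 * Returns 0 on success, -1 (busy) otherwise. */
int config_set_param(struct config_state *c, uint64_t val)
{
	if (c->active)
		return -1;
	c->param = val;
	return 0;
}

/* A configuration in use must not be removed. */
int config_can_unload(const struct config_state *c)
{
	return c->active == 0;
}
```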
Changes since v2:
1) Added documentation file.
2) Altered cs_syscfg driver to no longer be coresight_device based, and moved
to its own custom bus to remove it from the main coresight bus. (Mathieu)
3) Added configfs support to inspect and control loaded configurations and
features. Allows listing of preset values (Yabin Cui)
4) Dropped sysfs support for adjusting feature parameters on the per device
basis, in favour of a single point adjustment in configfs that is pushed to all
device instances.
5) Altered how the config and preset command line options are handled in perf
and the drivers. (Mathieu and Suzuki).
6) Fixes for various issues and technical points (Mathieu, Yabin)
Changes since v1:
1) Moved preloaded configurations and features out of individual drivers.
2) Added cs_syscfg driver to manage configurations and features. Individual
drivers register with cs_syscfg indicating support for config, and provide
matching information that the system uses to load features into the drivers.
This allows individual drivers to be updated on an as-needed basis, and
removes the need to consider devices that cannot benefit from configuration
(static replicators, funnels, TPIU).
3) Added perf selection of configurations.
4) Rebased onto the coresight module loading set.
To follow in future revisions / sets:-
a) load of additional config and features by loadable module.
b) load of additional config and features by configfs
c) enhanced resource management for ETMv4 and checking features have sufficient
resources to be enabled.
d) ECT and CTI support for configuration and features.
Mike Leach (10):
coresight: syscfg: Initial coresight system configuration
coresight: syscfg: Add registration and feature loading for cs devices
coresight: config: Add configuration and feature generic functions
coresight: etm-perf: update to handle configuration selection
coresight: syscfg: Add API to activate and enable configurations
coresight: etm-perf: Update to activate selected configuration
coresight: etm4x: Add complex configuration handlers to etmv4
coresight: config: Add preloaded configurations
coresight: syscfg: Add initial configfs support
Documentation: coresight: Add documentation for CoreSight config
.../trace/coresight/coresight-config.rst | 244 ++++++
Documentation/trace/coresight/coresight.rst | 16 +
drivers/hwtracing/coresight/Makefile | 7 +-
.../hwtracing/coresight/coresight-cfg-afdo.c | 153 ++++
.../coresight/coresight-cfg-preload.c | 31 +
.../coresight/coresight-cfg-preload.h | 13 +
.../hwtracing/coresight/coresight-config.c | 275 ++++++
.../hwtracing/coresight/coresight-config.h | 253 ++++++
drivers/hwtracing/coresight/coresight-core.c | 12 +-
.../hwtracing/coresight/coresight-etm-perf.c | 150 +++-
.../hwtracing/coresight/coresight-etm-perf.h | 12 +-
.../hwtracing/coresight/coresight-etm4x-cfg.c | 182 ++++
.../hwtracing/coresight/coresight-etm4x-cfg.h | 30 +
.../coresight/coresight-etm4x-core.c | 38 +-
.../coresight/coresight-etm4x-sysfs.c | 3 +
.../coresight/coresight-syscfg-configfs.c | 396 +++++++++
.../coresight/coresight-syscfg-configfs.h | 45 +
.../hwtracing/coresight/coresight-syscfg.c | 804 ++++++++++++++++++
.../hwtracing/coresight/coresight-syscfg.h | 81 ++
include/linux/coresight.h | 7 +
20 files changed, 2716 insertions(+), 36 deletions(-)
create mode 100644 Documentation/trace/coresight/coresight-config.rst
create mode 100644 drivers/hwtracing/coresight/coresight-cfg-afdo.c
create mode 100644 drivers/hwtracing/coresight/coresight-cfg-preload.c
create mode 100644 drivers/hwtracing/coresight/coresight-cfg-preload.h
create mode 100644 drivers/hwtracing/coresight/coresight-config.c
create mode 100644 drivers/hwtracing/coresight/coresight-config.h
create mode 100644 drivers/hwtracing/coresight/coresight-etm4x-cfg.c
create mode 100644 drivers/hwtracing/coresight/coresight-etm4x-cfg.h
create mode 100644 drivers/hwtracing/coresight/coresight-syscfg-configfs.c
create mode 100644 drivers/hwtracing/coresight/coresight-syscfg-configfs.h
create mode 100644 drivers/hwtracing/coresight/coresight-syscfg.c
create mode 100644 drivers/hwtracing/coresight/coresight-syscfg.h
--
2.17.1