Good morning,
Is tracing a multi-threaded program a supported use case for perf cs-etm?
If yes, are there any flags that should be specified with perf?
Thanks,
Andrea
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
This patch series adds support for thread stack and callchain; this patch
set depends on the instruction sample fix patch set [1].
This patch set has become more complex, so before dividing it into
smaller groups, I'd like this version to include all relevant patches; I
hope this gives the whole context for the related code changes.
Briefly, this patch set can be divided into three parts, each of which
can be reviewed separately:
Patches 01, 02 fix the samples for one corner case: accessing the
branch's target address triggers an exception. Essentially, an extra
branch sample is added to reflect the intermediate branch between the
previous branch and the exception entry.
Patches 03, 04, 05, 06 come from patch set v4 and add support for
thread stack and callchain.
Patches 07, 08, 09 fix up exception entry and exit. They mainly cover
two cases: one is to fix up the thread stack and callchain when
accessing the branch target address triggers an exception; the other is
to fix up the thread stack for instruction emulation (and other single
step cases).
This patch set has been tested on Juno-r2, applied on the perf/core
branch at the latest commit 85fc95d75970 ("perf maps: Add missing unlock
to maps__insert() error case") and on top of the instruction sample fix
patch set [1].
Test for option '-F,+callindent':
# perf script -F,+callindent
main 3258 1 branches: main ffffad684d20 __libc_start_main+0xe0 (/usr/lib/aarch64-linux-gnu/libc-2.28.so)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
main 3258 1 branches: _dl_fixup ffffad811b4c _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: _dl_lookup_symbol_x ffffad80c078 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: do_lookup_x ffffad80849c _dl_lookup_symbol_x+0x104 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: check_match ffffad807bf0 do_lookup_x+0x238 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: strcmp ffffad807888 check_match+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
[...]
Test for option '--itrace=g':
# perf script --itrace=g16l64i100
main 3258 100 instructions:
ffffad816a80 memcpy+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad809468 _dl_new_object+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad801840 dl_main+0x778 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 100 instructions:
ffffad80952c _dl_new_object+0x16c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad801840 dl_main+0x778 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 100 instructions:
ffffad8018dc dl_main+0x814 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 100 instructions:
ffff8000100878d0 el0_sync_handler+0x168 ([kernel.kallsyms])
ffff800010082d00 el0_sync+0x140 ([kernel.kallsyms])
ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
[...]
Changes from v4:
* Addressed Mike's suggestion to improve the performance of function
cs_etm__instr_addr() with a quick calculation for non-T32 code;
* Removed the patch 'perf cs-etm: Synchronize instruction sample with
the thread stack' (Mike);
* Fixed the case where an exception is taken when accessing the branch
target address, for both the branch sample and the thread stack
handling; the related patches are 01, 02, 07;
* Fixed the thread stack handling for instruction emulation and single
step with patches 08, 09.
Changes from v3:
* Split out separate patch set for instruction samples fixing.
* Rebased on latest perf/core branch.
Changes from v2:
* Added patch 01 to fix the unsigned variable comparison to zero
(Suzuki).
* Refined commit logs.
Changes from v1:
* Added comments for task thread handling (Mathieu).
* Split patch 02 into two patches, one to support thread stack and
another to support callchain (Mathieu).
* Added a new patch to support branch filter.
[1] https://lkml.org/lkml/2020/2/18/1406
Leo Yan (9):
perf cs-etm: Defer to assign exception sample flag
perf cs-etm: Reflect branch prior to exception
perf cs-etm: Refactor instruction size handling
perf cs-etm: Support thread stack
perf cs-etm: Support branch filter
perf cs-etm: Support callchain for instruction sample
perf cs-etm: Fixup exception entry for thread stack
perf thread: Add helper to get top return address
perf cs-etm: Fixup exception exit for thread stack
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 1 +
tools/perf/util/cs-etm.c | 290 ++++++++++++++++--
tools/perf/util/thread-stack.c | 10 +
tools/perf/util/thread-stack.h | 1 +
4 files changed, 268 insertions(+), 34 deletions(-)
--
2.17.1
Hi Poonam,
Please CC the coresight mailing list (as I did) when asking questions
- there are a lot of well-informed people on there who can also help
you.
On Thu, 23 Jan 2020 at 22:33, Poonam Aggrwal <poonam.aggrwal(a)nxp.com> wrote:
>
> Hello Mathieu
>
>
>
> Greetings!
>
>
>
> I have started to take a look at the Linux coresight framework, and get this enabled on a NXP ARMv8 device.
>
>
>
> Can you share some documentation on the configs required to be enabled and the device tree nodes?
For V8 we have two reference implementations - ARM Juno and the
dragonboard 410c. I highly recommend purchasing the latter (because
it is very cheap) in order to get an understanding of what a working
coresight system looks like. It is much easier to start from a working
example than nothing at all. Other than that the coresight bindings
[1] are full of good examples. I would also have a look at the DT for
Juno [2] and the dragonboard [3]. The HOWTO.md [4] on github is a
really good starting point when you get to test things out.
[1]. https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bin…
[2]. https://elixir.bootlin.com/linux/latest/source/arch/arm64/boot/dts/arm/juno…
[3]. https://elixir.bootlin.com/linux/latest/source/arch/arm64/boot/dts/qcom/msm…
[4]. https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md
>
> To start I am looking to enable the ARMv8 ETM tracing.
Before going further I advise you to look at the source and sink
configuration on your platform. Up to now we've been working with
configurations where sources share a single sink (N:1 topology).
Newer SoCs will have one source per sink (1:1 topology). At this time
only the former is supported by the framework. Supporting 1:1
topologies would require a fair amount of refactoring, something we
haven't had the opportunity to do for lack of a HW platform to work
with.
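To get a first idea of which case a given board falls into, the sources and sinks registered by the framework can be counted in sysfs. This is only a rough sketch: it assumes the newer device naming (`etm*` for sources, `tmc_etr*` and `tmc_etf*` for sinks), the function name `cs_count` and its directory argument are made up for illustration, and other sources such as STM are ignored. Kernels that use address-based names such as `20070000.etr` would need different patterns.

```shell
#!/bin/sh
# Count CoreSight sources (etm*) and sinks (tmc_etr*, tmc_etf*) found in
# a devices directory. The name patterns are an assumption that matches
# newer kernels only; cs_count is a hypothetical helper.
cs_count() {
    dir=${1:-/sys/bus/coresight/devices}
    sources=0
    sinks=0
    for d in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -e "$d" ] || continue
        case $(basename "$d") in
            etm*) sources=$((sources + 1)) ;;
            tmc_etr*|tmc_etf*) sinks=$((sinks + 1)) ;;
        esac
    done
    echo "sources=$sources sinks=$sinks"
}
```

Called without an argument, `cs_count` reads `/sys/bus/coresight/devices`. Several sources with only one or two sinks hints at the N:1 case, but the device tree remains the authoritative description of the topology.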
Regards,
Mathieu
>
> Is there a reference which I can check in Linux for device tree and config.
>
>
>
> Many Thanks
>
> Poonam
This patch updates the documentation to better capture
the current status with the latest kernel releases and
defines new scripts for controlling strobing accordingly.
Signed-off-by: Andrea Brunato <andrea.brunato(a)arm.com>
---
HOWTO.md | 54 +++++++++---------
decoder/tests/auto-fdo/autofdo.md | 73 +++++++++++++++++++------
decoder/tests/auto-fdo/record.sh | 68 -----------------------
decoder/tests/auto-fdo/set_strobing.sh | 29 ++++++++++
decoder/tests/auto-fdo/show_strobing.sh | 6 ++
5 files changed, 115 insertions(+), 115 deletions(-)
delete mode 100644 decoder/tests/auto-fdo/record.sh
create mode 100755 decoder/tests/auto-fdo/set_strobing.sh
create mode 100755 decoder/tests/auto-fdo/show_strobing.sh
diff --git a/HOWTO.md b/HOWTO.md
index b16294a..ebf44eb 100644
--- a/HOWTO.md
+++ b/HOWTO.md
@@ -6,35 +6,31 @@ HOWTO - using the library with perf {#howto_perf}
This HOWTO explains how to use the perf cmd line tools and the openCSD
library to collect and extract program flow traces generated by the
CoreSight IP blocks on a Linux system. The examples have been generated using
-an aarch64 Juno-r0 platform. All information is considered accurate and tested
-using the latest version of the library and the `master` branch on the
-[perf-opencsd github repository][1].
+an aarch64 Juno-r0 platform.
On Target Trace Acquisition - Perf Record
-----------------------------------------
-All the enhancement to the Perf tools that support the new `cs_etm` pmu have
-not been upstreamed yet. To get the required functionality branch
-`perf-opencsd-master` needs to be downloaded to the target system where
-traces are to be collected. This branch is a vanilla upstream kernel
-supplemented with modifications to the CoreSight framework and drivers to be
-usable by the Perf core. The remaining out of tree patches are being
-upstreamed incrementally.
-
-From there compiling the perf tools with `make -C tools/perf CORESIGHT=1` will
-yield a `perf` executable that will support CoreSight trace collection. Note
-that if traces are to be decompressed *off* target, there is no need to download
+
+Compile the perf tool from the same kernel source code version you are using,
+with `make -C tools/perf`.
+This will yield a `perf` executable that will support CoreSight trace collection.
+Note that if traces are to be decompressed *off* target, there is no need to download
and compile the openCSD library (on the target).
+If you are instead planning to use perf to record and decode the trace on the target,
+compile the perf tool linking against the openCSD library, in the following way:
+`make -C tools/perf VF=1 CORESIGHT=1`
+
+
Before launching a trace run a sink that will collect trace data needs to be
identified. All CoreSight blocks identified by the framework are registed in
sysFS:
linaro@linaro-nano:~$ ls /sys/bus/coresight/devices/
- 20010000.etf 20040000.main_funnel 22040000.etm 22140000.etm
- 230c0000.A53_funnel 23240000.etm replicator@20020000 20030000.tpiu
- 20070000.etr 220c0000.A57_funnel 23040000.etm 23140000.etm 23340000.etm
+ etm0 etm2 etm4 etm6 funnel0 funnel2 funnel4 stm0 tmc_etr0
+ etm1 etm3 etm5 etm7 funnel1 funnel3 replicator0 tmc_etf0
CoreSight blocks are listed in the device tree for a specific system and
@@ -43,7 +39,7 @@ the sink that will recieve trace data needs to be identified and given as an
option on the perf command line. Once a sink has been identify trace collection
can start. An easy and yet interesting example is the `uname` command:
- linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -e cs_etm/@20070000.etr/ --per-thread uname
+ linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -e cs_etm/@tmc_etr0/ --per-thread uname
This will generate a `perf.data` file where execution has been traced for both
user and kernel space. To narrow the field to either user or kernel space the
@@ -51,7 +47,7 @@ user and kernel space. To narrow the field to either user or kernel space the
traces to user space:
- linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -vvv -e cs_etm/@20070000.etr/u --per-thread uname
+ linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -vvv -e cs_etm/@tmc_etr0/u --per-thread uname
Problems setting modules path maps, continuing anyway...
-----------------------------------------------------------
perf_event_attr:
@@ -131,9 +127,9 @@ falls within the specified range. Any work done by the CPU outside of that
range will not be traced. Address range filters can be specified for both
user and kernel space session:
- perf record -e cs_etm/@20070000.etr/k --filter 'filter 0xffffff8008562d0c/0x48' --per-thread uname
+ perf record -e cs_etm/@tmc_etr0/k --filter 'filter 0xffffff8008562d0c/0x48' --per-thread uname
- perf record -e cs_etm/@20070000.etr/u --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main
+ perf record -e cs_etm/@tmc_etr0/u --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main
When dealing with kernel space trace addresses are typically taken in the
'System.map' file. In user space addresses are relocatable and can be
@@ -171,20 +167,20 @@ equal to the start address. Incidentally traces stop being generated when the
insruction pointer is equal to the stop address. Anything that happens between
there to events is traced:
- perf record -e cs_etm/@20070000.etr/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0' --per-thread uname
+ perf record -e cs_etm/@tmc_etr0/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0' --per-thread uname
- perf record -vvv -e cs_etm/@20070000.etr/u --filter 'start 0x72c@/opt/lib/libcstest.so.1.0, \
+ perf record -vvv -e cs_etm/@tmc_etr0/u --filter 'start 0x72c@/opt/lib/libcstest.so.1.0, \
stop 0x40082c@/home/linaro/main' \
- --per-thread ./main
+ --per-thread ./main
**Limitation on address filters:**
The only limitation on address filters is the amount of address comparator
found on an implementation and the mutual exclusion between range and
start stop filters. As such the following example would _not_ work:
- perf record -e cs_etm/@20070000.etr/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0, \ // start/stop
+ perf record -e cs_etm/@tmc_etr0/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0, \ // start/stop
filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' \ // address range
- --per-thread uname
+ --per-thread uname
Additional Trace Options
------------------------
@@ -198,7 +194,7 @@ Presently this threshold is fixed at 256 cycles for `perf record`.
Command line options in `perf record` to use these features are part of the options for the `cs_etm` event:
- perf record -e cs_etm/timestamp,cycacc,@20070000.etr/ --per-thread uname
+ perf record -e cs_etm/timestamp,cycacc,@tmc_etr0/ --per-thread uname
At current version, `perf record` and `perf script` do not use this additional information.
@@ -248,7 +244,7 @@ The openCSD library is not part of the perf tools. It is available on
[github][1] and needs to be compiled before the perf tools. Checkout the
required branch/tag version into a local directory.
- linaro@t430:~/linaro/coresight$ git clone -b v0.8 https://github.com/Linaro/OpenCSD.git my-opencsd
+ linaro@t430:~/linaro/coresight$ git clone https://github.com/Linaro/OpenCSD.git my-opencsd
Cloning into 'OpenCSD'...
remote: Counting objects: 2063, done.
remote: Total 2063 (delta 0), reused 0 (delta 0), pack-reused 2063
@@ -629,7 +625,7 @@ Best regards,
*The Linaro CoreSight Team*
--------------------------------------
-[1]: https://github.com/Linaro/perf-opencsd "perf-opencsd Github"
+[1]: https://github.com/Linaro/OpenCSD
[2]: http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz
diff --git a/decoder/tests/auto-fdo/autofdo.md b/decoder/tests/auto-fdo/autofdo.md
index b1f2241..d7d37b1 100644
--- a/decoder/tests/auto-fdo/autofdo.md
+++ b/decoder/tests/auto-fdo/autofdo.md
@@ -99,6 +99,8 @@ You can include these backports in your kernel by either merging the
appropriate branch using git or generating patches (using `git
format-patch`).
+For 5.5 based kernel, the only patch which needs to be applied is the one enabling strobing - etm4x: `Enable strobing of ETM`.
+
For 4.9 based kernels, use the `coresight-4.9-etr-etm_strobe` branch:
```
@@ -129,7 +131,7 @@ git am /output/dir/*.patch # or patch -p1 /output/dir/*.patch if not using git
The CoreSight trace drivers must also be enabled in the kernel
configuration. This can be done using the configuration menu (`make
-menuconfig`), selecting `Kernel hacking` / `CoreSight Tracing Support` and
+menuconfig`), selecting `Kernel hacking` / `arm64 Debugging` /`CoreSight Tracing Support` and
enabling all options, or by setting the following in the configuration
file:
@@ -165,11 +167,15 @@ CoreSight devices, you should find the devices in sysfs:
```
# ls /sys/bus/coresight/devices/
-28440000.etm 28540000.etm 28640000.etm 28740000.etm
-28c03000.funnel 28c04000.etf 28c05000.replicator 28c06000.etr
-28c07000.tpiu
+etm0 etm2 etm4 etm6 funnel0 funnel2 funnel4 stm0 tmc_etr0
+etm1 etm3 etm5 etm7 funnel1 funnel3 replicator0 tmc_etf0
```
+The naming convention for etm devices can be different according to the kernel version you're using.
+For more information about the naming scheme, please check out the [Linux Kernel Documentation](https://www.kernel.org/doc/html/latest/trace/coresight/cores…
+
+If `/sys/bus/coresight/devices/` is empty, you may want to check out your Kernel configuration to make sure your .config file is including CoreSight dependencies, such as the clock.
+
### Perf tools
The perf tool is used to capture execution trace, configuring the trace
@@ -180,9 +186,12 @@ Arm recommends to use the perf version corresponding to the kernel running
on the target. This can be built from the same kernel sources with
```
-make -C tools/perf ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
+make -C tools/perf CORESIGHT=1 VF=1 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
```
+When specifying CORESIGHT=1, perf will be built using the installed OpenCSD library.
+If you are cross compiling, then additional setup is required to ensure the build process links against the correct version of the library.
+
If the post-processing (`perf inject`) of the captured data is not being
done on the target, then the OpenCSD library is not required for this build
of perf.
@@ -193,13 +202,22 @@ also be restricted to user space or kernel space with 'u' or 'k'
parameters. For example:
```
-perf record -e cs_etm/@28c06000.etr/u --per-thread -- /bin/ls
+perf record -e cs_etm/@tmc_etr0/u --per-thread -- /bin/ls
```
-Will record the userspace execution of '/bin/ls' into the ETR located at
-0x28c06000. Note the `--per-thread` option is required - perf currently
-only supports trace of a single thread of execution. CPU wide trace is a
-work in progresss.
+Will record the userspace execution of '/bin/ls' using tmc_etr0 as sink.
+
+## Capturing modes
+
+You can trace a single-threaded program in two different ways:
+
+1. By specifying `--per-thread`, and in this case the CoreSight subsystem will
+record only a trace relative to the given program.
+
+2. By NOT specifying `--per-thread`, and in this case CPU-wide tracing will
+be enabled. In this scenario the trace will contain both the target program trace
+and other workloads that were executing on the same CPU
+
## Processing trace and profiles
@@ -241,26 +259,42 @@ For example, a typical configuration is to use a window size of 5000 cycles
and a period of 10000 - this will collect 5000 cycles of trace every 50M
cycles. With these proof-of-concept patches, the strobe parameters are
configured via sysfs - each ETM will have `strobe_window` and
-`strobe_period` parameters in `/sys/bus/coresight/devices/NNNNNNNN.etm` and
+`strobe_period` parameters in `/sys/bus/coresight/devices/<sink>` and
these values will have to be written to each (In a future version, this
-will be integrated into the drivers and perf tool). The `record.sh`
-script in this directory [`<opencsd>/decoder/tests/auto-fdo`] automates this process.
+will be integrated into the drivers and perf tool).
+The `set_strobing.sh` script in this directory [`<opencsd>/decoder/tests/auto-fdo`] automates this process.
To collect trace from an application using ETM strobing, run:
```
-taskset -c 0 ./record.sh --strobe 5000 10000 28c06000.etr ./my_application arg1 arg2
+sudo ./set_strobing.sh 5000 10000
+perf record -e cs_etm/@tmc_etr0/u --per-thread -- <your app>
```
-The taskset command is used to ensure the process stays on the same CPU
-during execution.
-
The raw trace can be examined using the `perf report` command:
```
perf report -D -i perf.data --stdio
```
+Perf needs to be built from the source code of your Linux kernel version, against the OpenCSD library, in order to properly read ETM-gathered samples and post-process them.
+If running `perf report` produces an error like:
+
+```
+0x1f8 [0x268]: failed to process type: 70 [Operation not permitted]
+Error:
+failed to process sample
+```
+or
+
+```
+"file uses a more recent and unsupported ABI (8 bytes extra). incompatible file format".
+```
+
+You are probably using a perf version which is not using this library: please make sure to install this project on your system by compiling it from [Source Code](https://github.com/Linaro/OpenCSD) (v0.9.1 or later), then compile perf using this library.
+Otherwise, this project is packaged for debian (install the libopencsd0, libopencsd-dev packages).
+
+
For example:
```
@@ -295,6 +329,8 @@ an embedded target). The `perf inject` command
decodes the execution trace and generates periodic instruction samples,
with branch histories:
+!! Careful: if you are using a device different than the one used to collect the profiling data,
+you'll need to run `perf buildid-cache` as described below.
```
perf inject -i perf.data -o inj.data --itrace=i100000il
```
@@ -393,7 +429,8 @@ clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c
The basic commands to run an application and create a compiler profile are:
```
-taskset -c 0 ./record.sh --strobe 5000 10000 28c06000.etr ./my_application arg1 arg2
+sudo ./set_strobing.sh 5000 10000
+perf record -e cs_etm/@tmc_etr0/u --per-thread -- <your app>
perf inject -i perf.data -o inj.data --itrace=i100000il
create_llvm_prof -binary=/path/to/binary -profile=inj.data -out=program.llvmprof
```
diff --git a/decoder/tests/auto-fdo/record.sh b/decoder/tests/auto-fdo/record.sh
deleted file mode 100644
index 16d4ba2..0000000
--- a/decoder/tests/auto-fdo/record.sh
+++ /dev/null
@@ -1,68 +0,0 @@
-#!/bin/sh
-
-BUFFER_ETF_A53=ec802000.etf
-BUFFER_ETF_A73=ed002000.etf
-BUFFER_ETF_SYS=ec036000.etf
-BUFFER_ETR=ec033000.etr
-
-OUT_FILE=perf.data
-
-STROBE=
-
-while :; do
- case $1 in
- --strobe)
- STROBE=y
- WINDOW=$2
- PERIOD=$3
- shift 3
- ;;
-
- *)
- break ;;
- esac
-done
-
-case $1 in
- etr)
- BUFFER=$BUFFER_ETR
- ;;
-
- etf-sys)
- BUFFER=$BUFFER_ETF_SYS
- ;;
-
- "")
- BUFFER=$BUFFER_ETR
- ;;
-
- *)
- BUFFER=$1
- ;;
-esac
-
-shift 1
-
-case $0 in
- /*) F=$0 ;;
- *) F=$(pwd)/$0 ;;
-esac
-
-SCRIPT_DIR=$(dirname $F)
-
-if [ "$STROBE" ]; then
- for e in /sys/bus/coresight/devices/*.etm/; do
- printf "%x" $WINDOW | sudo tee $e/strobe_window > /dev/null
- printf "%x" $PERIOD | sudo tee $e/strobe_period > /dev/null
- done
-fi
-
-PERF=$SCRIPT_DIR/perf
-
-export LD_LIBRARY_PATH=$SCRIPT_DIR:$LD_LIBRARY_PATH
-
-sudo LD_LIBRARY_PATH=$SCRIPT_DIR:$LD_LIBRARY_PATH $PERF record $PERF_ARGS -e cs_etm/@$BUFFER/u --per-thread "$@"
-
-sudo chown $(id -u):$(id -g) $OUT_FILE
-
-
diff --git a/decoder/tests/auto-fdo/set_strobing.sh b/decoder/tests/auto-fdo/set_strobing.sh
new file mode 100755
index 0000000..e11e62d
--- /dev/null
+++ b/decoder/tests/auto-fdo/set_strobing.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+
+WINDOW=$1
+PERIOD=$2
+
+if [[ -z $WINDOW ]] || [[ -z $PERIOD ]]; then
+ echo "Window or Period not specified!"
+ echo "Example usage: ./set_strobing.sh <WINDOW VALUE> <PERIOD VALUE>"
+ echo "Example usage: ./set_strobing.sh 5000 10000"
+ exit
+fi
+
+
+if [[ $EUID != 0 ]]; then
+ echo "Please run as root"
+ exit
+fi
+
+for e in /sys/bus/coresight/devices/etm*/; do
+ printf "%x" $WINDOW | tee $e/strobe_window > /dev/null
+ printf "%x" $PERIOD | tee $e/strobe_period > /dev/null
+ echo "Strobing period for $e set to $((`cat $e/strobe_period`))"
+ echo "Strobing window for $e set to $((`cat $e/strobe_window`))"
+done
+
+## Shows the user a simple usage example
+echo ">> Done! <<"
+echo "You can now run perf to trace your application, for example:"
+echo "perf record -e cs_etm/@tmc_etr0/u -- <your app>"
diff --git a/decoder/tests/auto-fdo/show_strobing.sh b/decoder/tests/auto-fdo/show_strobing.sh
new file mode 100755
index 0000000..d80d84c
--- /dev/null
+++ b/decoder/tests/auto-fdo/show_strobing.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+for e in /sys/bus/coresight/devices/etm*/; do
+ echo "Strobing period for $e is $((`cat $e/strobe_period`))"
+ echo "Strobing window for $e is $((`cat $e/strobe_window`))"
+done
--
2.17.1
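A note on the values used by `set_strobing.sh` in the patch above: the window and period arguments are given in decimal, but the script writes them to sysfs with `printf "%x"`, that is, in hexadecimal. The sketch below reproduces that conversion and the window-times-period arithmetic behind the "5000 cycles of trace every 50M cycles" figure quoted from autofdo.md. The helper names `strobe_hex` and `strobe_interval` are hypothetical and not part of the patch.

```shell
#!/bin/sh
# Hex string that set_strobing.sh would write to sysfs for a decimal value.
strobe_hex() {
    printf '%x\n' "$1"
}

# Cycles between the starts of consecutive trace windows, following the
# autofdo.md description (window * period). Hypothetical helper.
strobe_interval() {
    echo $(( $1 * $2 ))
}

strobe_hex 5000              # prints 1388
strobe_hex 10000             # prints 2710
strobe_interval 5000 10000   # prints 50000000
```

So `./set_strobing.sh 5000 10000` ends up writing `1388` to `strobe_window` and `2710` to `strobe_period` for each ETM.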
OpenCSD v0.14.0 is now released.
This update contains a re-work of the ETMv4 decoder to simplify,
reduce re-entrancy and enable speculative tracing support.
Speculative tracing support has been verified by an architecture
development team within ARM.
Other bugfixes and minor enhancements as per the readme file.
Documentation is updated to reflect the latest versions of the kernel
drivers (5.x).
v0.15.0-dev version also released. This contains preliminary support
for Q elements.
Regards
Mike
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
From: Wojciech Zmuda <wzmuda(a)n7space.com>
Perf allows changing where the buildid cache directory is created.
Mention it in the howto document.
Signed-off-by: Wojciech Zmuda <wzmuda(a)n7space.com>
---
HOWTO.md | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/HOWTO.md b/HOWTO.md
index b16294a..a8b5ce9 100644
--- a/HOWTO.md
+++ b/HOWTO.md
@@ -633,4 +633,5 @@ Best regards,
[2]: http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz
-[3]: Get in touch with us if you know a way to change this.
+[3]: It can be changed with: perf-config:
+ perf config --system buildid.dir=/my/own/buildid/dir
--
2.11.0
Hi All,
I followed this guide to run a test on i.MX8MP:
https://elinux.org/images/b/b3/Hardware_Assisted_Tracing_on_ARM.pdf
I could see a dump with perf report --dump (attached),
but when I use perf report --stdio, I get:
Error:
The perf.data data has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#
The perf I use to dump data is not compiled with CORESIGHT
support, but the host perf has been compiled with CORESIGHT support.
Do you know what might cause this issue?
Thanks,
Peng.
Hello,
I'm exploring the possibility of tracing two concurrent programs pinned to two CPU cores, sharing the same sink. I tried to spawn two concurrent perf sessions but my results are not satisfying, so I wonder if such a possibility exists at all.
# taskset -c 1 ./progA
# taskset -c 2 ./progB
# perf record -e cs_etm/timestamp,@tmc_etr0/u --filter "filter symbolA @./progA" --per-thread --pid $progA_pid
# perf record -e cs_etm/timestamp,@tmc_etr0/u --filter "filter symbolA @./progA" --per-thread --pid $progB_pid
If progA and progB mostly sleep - I get trace data for both, which is fine. However, if at least one of the programs gets more CPU-intensive (loops with arithmetic computations inside, no explicit sleeping, waiting or IO), I get trace only for the more intensive one. If both are intensive, it seems random which one gets traced.
These observations suggest an ETR buffer overflow. However, if CPU-intensive versions of progA and progB are scheduled on the same CPU - this method seems to work. I would expect the ETR buffer to be insufficient in this scenario as well.
To enlarge the ETR buffer I experimented with the -m flag, but I'm unable to use more than -m,16M. Independent RSZ polling shows that this value gets programmed, but anything above 16 MB makes the TMC-ETR driver complain that cma_alloc failed to get that amount of memory. Anyway, enlarging the buffer to 16MB doesn't seem to affect my issue. With a bigger buffer my observations are identical.
Those observations make me suspect that another technical obstacle might exist, besides a possible buffer overflow.
I also tried CPU-wide mode and it seems to work:
taskset -c 1 ./progA
taskset -c 2 ./progB
perf record -e cs_etm/timestamp,@tmc_etr0/u -C 2,3
but this approach is quite limited, as filters don't work in CPU-wide mode and perf itself is also traced (which is weird, as I tried setting the CPU affinity of perf-record with taskset as well - it didn't help).
To wrap-up:
1. Is it possible to trace two programs with two perf-record sessions at the same time, sharing a sink?
2. Is it possible to enlarge the TMC-ETR buffer above 16MB? I guess SG mode might be an option here, but I can't really modify my kernel and DT right now. Perhaps there's a possibility to make the kernel allocator work past the 16MB boundary?
Thank you and best regards,
Wojciech
PS Sorry I haven't proceeded with the Coresight@Zynq MPSoC support I started some time ago. My access to the board has been limited recently and it's hard to proceed with kernel development remotely. I hope to get back to it soon.
This patch series addresses issues in synthesizing instruction
samples: when the instruction sample period is small enough, the
current logic cannot synthesize multiple instruction samples within
one instruction range packet.
Patch 0001 swaps packets for instruction samples, so that the option
'--itrace=iNNN' works correctly.
Patch 0002 avoids resetting the last branches for every instruction
sample; if the last branches are reset every time a sample is generated,
the later samples in the same range packet cannot use the last branches
anymore.
Patch 0003 fixes the handling of different instruction periods,
especially small sample periods.
Patch 0004 is an optimization for copying last branches; it only copies
last branches once if the instruction samples share the same last
branches.
Patch 0005 is a minor fix for unsigned variable comparison to zero.
This patch set has been rebased on the latest perf/core branch and
verified on a Juno board with the commands below:
# perf script --itrace=i2
# perf script --itrace=i2il16
# perf inject --itrace=i2il16 -i perf.data -o perf.data.new
# perf inject --itrace=i100il16 -i perf.data -o perf.data.new
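For readers unfamiliar with the option strings above: as far as I understand the perf itrace options, `i<N>i` requests a synthesized instruction sample every N instructions and `l<M>` attaches up to M last-branch entries to each sample. The small sketch below only assembles such a string; the helper name `itrace_opts` is made up for illustration and is not part of perf.

```shell
#!/bin/sh
# Build an --itrace option string: an instruction sample every <period>
# instructions, each carrying up to <nbranches> last-branch entries.
# itrace_opts is a hypothetical helper, not a perf command.
itrace_opts() {
    printf 'i%sil%s\n' "$1" "$2"
}

itrace_opts 2 16     # prints i2il16
itrace_opts 100 16   # prints i100il16
```

With it, `perf script --itrace="$(itrace_opts 2 16)"` is the same invocation as the second test command above.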
Changes from v4:
* Added Mike's review tag for patch 03;
* Added Mathieu's review tags for all patches.
Changes from v3:
* Refactored patch 0001 with new function cs_etm__packet_swap() (Mike);
* Refined instruction sample generation flow with single while loop,
which completely uses Mike's suggestions (Mike);
* Added Mike's review tags for patch 01/02/04/05.
Changes from v2:
* Added patch 0001 which is to fix swapping packets for instruction
samples;
* Refined minor commit logs and comments;
* Rebased on the latest perf/core branch.
Changes from v1:
* Rebased patch set on perf/core branch with latest commit 9fec3cd5fa4a
("perf map: Check if the map still has some refcounts on exit").
Leo Yan (5):
perf cs-etm: Swap packets for instruction samples
perf cs-etm: Continuously record last branch
perf cs-etm: Correct synthesizing instruction samples
perf cs-etm: Optimize copying last branches
perf cs-etm: Fix unsigned variable comparison to zero
tools/perf/util/cs-etm.c | 157 +++++++++++++++++++++++++++------------
1 file changed, 111 insertions(+), 46 deletions(-)
--
2.17.1