This patch updates the documentation to reflect the current state of the latest kernel releases, and adds new scripts for controlling strobing.
Signed-off-by: Andrea Brunato <andrea.brunato@arm.com>
---
 HOWTO.md                                | 54 +++++++++---------
 decoder/tests/auto-fdo/autofdo.md       | 73 +++++++++++++++++++------
 decoder/tests/auto-fdo/record.sh        | 68 -----------------------
 decoder/tests/auto-fdo/set_strobing.sh  | 29 ++++++++++
 decoder/tests/auto-fdo/show_strobing.sh |  6 ++
 5 files changed, 115 insertions(+), 115 deletions(-)
 delete mode 100644 decoder/tests/auto-fdo/record.sh
 create mode 100755 decoder/tests/auto-fdo/set_strobing.sh
 create mode 100755 decoder/tests/auto-fdo/show_strobing.sh
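Note for reviewers (not part of the commit): the values that `set_strobing.sh` writes can be previewed without touching sysfs. The sketch below is illustrative only. The helper names `strobe_hex` and `strobe_interval` are hypothetical and do not appear in the patch. It assumes, based on the `autofdo.md` example (a window of 5000 and a period of 10000 giving 5000 cycles of trace every 50M cycles), that the interval between windows is the product of the two values.

```shell
#!/bin/bash
# Sketch only: preview strobing values without writing to sysfs.
# strobe_hex and strobe_interval are hypothetical helper names.

# Format a decimal cycle count the way set_strobing.sh does (printf "%x").
strobe_hex() {
    printf '%x' "$1"
}

# Cycles between successive trace windows, ASSUMING the interval is
# window * period, as the autofdo.md example suggests.
strobe_interval() {
    echo $(( $1 * $2 ))
}
```

For instance, `strobe_hex 5000` prints the hex digits that would be written to `strobe_window`, and `strobe_interval 5000 10000` prints the number of cycles between windows under the stated assumption.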
diff --git a/HOWTO.md b/HOWTO.md index b16294a..ebf44eb 100644 --- a/HOWTO.md +++ b/HOWTO.md @@ -6,35 +6,31 @@ HOWTO - using the library with perf {#howto_perf} This HOWTO explains how to use the perf cmd line tools and the openCSD library to collect and extract program flow traces generated by the CoreSight IP blocks on a Linux system. The examples have been generated using -an aarch64 Juno-r0 platform. All information is considered accurate and tested -using the latest version of the library and the `master` branch on the -[perf-opencsd github repository][1]. +an aarch64 Juno-r0 platform.
On Target Trace Acquisition - Perf Record ----------------------------------------- -All the enhancement to the Perf tools that support the new `cs_etm` pmu have -not been upstreamed yet. To get the required functionality branch -`perf-opencsd-master` needs to be downloaded to the target system where -traces are to be collected. This branch is a vanilla upstream kernel -supplemented with modifications to the CoreSight framework and drivers to be -usable by the Perf core. The remaining out of tree patches are being -upstreamed incrementally. - -From there compiling the perf tools with `make -C tools/perf CORESIGHT=1` will -yield a `perf` executable that will support CoreSight trace collection. Note -that if traces are to be decompressed *off* target, there is no need to download + +Compile the perf tool from the same kernel source code version you are using, +with `make -C tools/perf`. +This will yield a `perf` executable that will support CoreSight trace collection. +Note that if traces are to be decompressed *off* target, there is no need to download and compile the openCSD library (on the target).
+If you are instead planning to use perf to record and decode the trace on the target, +compile the perf tool linking against the openCSD library, in the following way: +`make -C tools/perf VF=1 CORESIGHT=1` + + Before launching a trace run a sink that will collect trace data needs to be identified. All CoreSight blocks identified by the framework are registed in sysFS:
linaro@linaro-nano:~$ ls /sys/bus/coresight/devices/ - 20010000.etf 20040000.main_funnel 22040000.etm 22140000.etm - 230c0000.A53_funnel 23240000.etm replicator@20020000 20030000.tpiu - 20070000.etr 220c0000.A57_funnel 23040000.etm 23140000.etm 23340000.etm + etm0 etm2 etm4 etm6 funnel0 funnel2 funnel4 stm0 tmc_etr0 + etm1 etm3 etm5 etm7 funnel1 funnel3 replicator0 tmc_etf0
CoreSight blocks are listed in the device tree for a specific system and @@ -43,7 +39,7 @@ the sink that will recieve trace data needs to be identified and given as an option on the perf command line. Once a sink has been identify trace collection can start. An easy and yet interesting example is the `uname` command:
- linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -e cs_etm/@20070000.etr/ --per-thread uname + linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -e cs_etm/@tmc_etr0/ --per-thread uname
This will generate a `perf.data` file where execution has been traced for both user and kernel space. To narrow the field to either user or kernel space the @@ -51,7 +47,7 @@ user and kernel space. To narrow the field to either user or kernel space the traces to user space:
- linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -vvv -e cs_etm/@20070000.etr/u --per-thread uname + linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -vvv -e cs_etm/@tmc_etr0/u --per-thread uname Problems setting modules path maps, continuing anyway... ----------------------------------------------------------- perf_event_attr: @@ -131,9 +127,9 @@ falls within the specified range. Any work done by the CPU outside of that range will not be traced. Address range filters can be specified for both user and kernel space session:
- perf record -e cs_etm/@20070000.etr/k --filter 'filter 0xffffff8008562d0c/0x48' --per-thread uname + perf record -e cs_etm/@tmc_etr0/k --filter 'filter 0xffffff8008562d0c/0x48' --per-thread uname
- perf record -e cs_etm/@20070000.etr/u --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main + perf record -e cs_etm/@tmc_etr0/u --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main
When dealing with kernel space trace addresses are typically taken in the 'System.map' file. In user space addresses are relocatable and can be @@ -171,20 +167,20 @@ equal to the start address. Incidentally traces stop being generated when the insruction pointer is equal to the stop address. Anything that happens between there to events is traced:
- perf record -e cs_etm/@20070000.etr/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0' --per-thread uname + perf record -e cs_etm/@tmc_etr0/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0' --per-thread uname
- perf record -vvv -e cs_etm/@20070000.etr/u --filter 'start 0x72c@/opt/lib/libcstest.so.1.0, \ + perf record -vvv -e cs_etm/@tmc_etr0/u --filter 'start 0x72c@/opt/lib/libcstest.so.1.0, \ stop 0x40082c@/home/linaro/main' \ - --per-thread ./main + --per-thread ./main
**Limitation on address filters:** The only limitation on address filters is the amount of address comparator found on an implementation and the mutual exclusion between range and start stop filters. As such the following example would _not_ work:
- perf record -e cs_etm/@20070000.etr/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0, \ // start/stop + perf record -e cs_etm/@tmc_etr0/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0, \ // start/stop filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' \ // address range - --per-thread uname + --per-thread uname
Additional Trace Options ------------------------ @@ -198,7 +194,7 @@ Presently this threshold is fixed at 256 cycles for `perf record`.
Command line options in `perf record` to use these features are part of the options for the `cs_etm` event:
- perf record -e cs_etm/timestamp,cycacc,@20070000.etr/ --per-thread uname + perf record -e cs_etm/timestamp,cycacc,@tmc_etr0/ --per-thread uname
At current version, `perf record` and `perf script` do not use this additional information.
@@ -248,7 +244,7 @@ The openCSD library is not part of the perf tools. It is available on [github][1] and needs to be compiled before the perf tools. Checkout the required branch/tag version into a local directory.
- linaro@t430:~/linaro/coresight$ git clone -b v0.8 https://github.com/Linaro/OpenCSD.git my-opencsd + linaro@t430:~/linaro/coresight$ git clone https://github.com/Linaro/OpenCSD.git my-opencsd Cloning into 'OpenCSD'... remote: Counting objects: 2063, done. remote: Total 2063 (delta 0), reused 0 (delta 0), pack-reused 2063 @@ -629,7 +625,7 @@ Best regards, *The Linaro CoreSight Team*
-------------------------------------- -[1]: https://github.com/Linaro/perf-opencsd "perf-opencsd Github" +[1]: https://github.com/Linaro/OpenCSD
[2]: http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz
diff --git a/decoder/tests/auto-fdo/autofdo.md b/decoder/tests/auto-fdo/autofdo.md index b1f2241..d7d37b1 100644 --- a/decoder/tests/auto-fdo/autofdo.md +++ b/decoder/tests/auto-fdo/autofdo.md @@ -99,6 +99,8 @@ You can include these backports in your kernel by either merging the appropriate branch using git or generating patches (using `git format-patch`).
+For 5.5 based kernels, the only patch that needs to be applied is the one enabling strobing (etm4x: `Enable strobing of ETM`).
+
 For 4.9 based kernels, use the `coresight-4.9-etr-etm_strobe` branch:
``` @@ -129,7 +131,7 @@ git am /output/dir/*.patch # or patch -p1 /output/dir/*.patch if not using git
 The CoreSight trace drivers must also be enabled in the kernel
 configuration. This can be done using the configuration menu (`make
-menuconfig`), selecting `Kernel hacking` / `CoreSight Tracing Support` and
+menuconfig`), selecting `Kernel hacking` / `arm64 Debugging` / `CoreSight Tracing Support` and
 enabling all options, or by setting the following in the configuration
 file:
@@ -165,11 +167,15 @@ CoreSight devices, you should find the devices in sysfs:
``` # ls /sys/bus/coresight/devices/ -28440000.etm 28540000.etm 28640000.etm 28740000.etm -28c03000.funnel 28c04000.etf 28c05000.replicator 28c06000.etr -28c07000.tpiu +etm0 etm2 etm4 etm6 funnel0 funnel2 funnel4 stm0 tmc_etr0 +etm1 etm3 etm5 etm7 funnel1 funnel3 replicator0 tmc_etf0 ```
+The naming convention for ETM devices may differ depending on the kernel version you are using.
+For more information about the naming scheme, see the [Linux Kernel Documentation](https://www.kernel.org/doc/html/latest/trace/coresight/coresight.html#device...)
+
+If `/sys/bus/coresight/devices/` is empty, check your kernel configuration to make sure your `.config` file includes the CoreSight dependencies, such as the clock.
+
 ### Perf tools
The perf tool is used to capture execution trace, configuring the trace @@ -180,9 +186,12 @@ Arm recommends to use the perf version corresponding to the kernel running on the target. This can be built from the same kernel sources with
``` -make -C tools/perf ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- +make -C tools/perf CORESIGHT=1 VF=1 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- ```
+When specifying CORESIGHT=1, perf will be built using the installed OpenCSD library. +If you are cross compiling, then additional setup is required to ensure the build process links against the correct version of the library. + If the post-processing (`perf inject`) of the captured data is not being done on the target, then the OpenCSD library is not required for this build of perf. @@ -193,13 +202,22 @@ also be restricted to user space or kernel space with 'u' or 'k' parameters. For example:
``` -perf record -e cs_etm/@28c06000.etr/u --per-thread -- /bin/ls +perf record -e cs_etm/@tmc_etr0/u --per-thread -- /bin/ls ```
-Will record the userspace execution of '/bin/ls' into the ETR located at
-0x28c06000. Note the `--per-thread` option is required - perf currently
-only supports trace of a single thread of execution. CPU wide trace is a
-work in progresss.
+This will record the userspace execution of '/bin/ls' using tmc_etr0 as sink.
+
+## Capturing modes
+
+You can trace a single-threaded program in two different ways:
+
+1. By specifying `--per-thread`. In this case the CoreSight subsystem will
+record only the trace of the given program.
+
+2. By NOT specifying `--per-thread`. In this case CPU-wide tracing will
+be enabled, and the trace will contain both the target program trace
+and any other workloads that were executing on the same CPU.
+
 ## Processing trace and profiles

@@ -241,26 +259,42 @@ For example, a typical configuration is to use a window size of 5000 cycles and
 a period of 10000 - this will collect 5000 cycles of trace every 50M cycles.
 With these proof-of-concept patches, the strobe parameters are configured
 via sysfs - each ETM will have `strobe_window` and
-`strobe_period` parameters in `/sys/bus/coresight/devices/NNNNNNNN.etm` and
+`strobe_period` parameters in `/sys/bus/coresight/devices/etm<N>` and
 these values will have to be written to each (In a future version, this
-will be integrated into the drivers and perf tool). The `record.sh`
-script in this directory [`<opencsd>/decoder/tests/auto-fdo`] automates this process.
+will be integrated into the drivers and perf tool).
+The `set_strobing.sh` script in this directory [`<opencsd>/decoder/tests/auto-fdo`] automates this process.
To collect trace from an application using ETM strobing, run:
 ```
-taskset -c 0 ./record.sh --strobe 5000 10000 28c06000.etr ./my_application arg1 arg2
+sudo ./set_strobing.sh 5000 10000
+perf record -e cs_etm/@tmc_etr0/u --per-thread -- <your app>
 ```
-The taskset command is used to ensure the process stays on the same CPU -during execution. - The raw trace can be examined using the `perf report` command:
``` perf report -D -i perf.data --stdio ```
+Perf must be built from the source tree of the Linux kernel version you are running, and linked against the OpenCSD library, in order to read ETM-gathered samples and post-process them.
+If running `perf report` produces an error like:
+
+```
+0x1f8 [0x268]: failed to process type: 70 [Operation not permitted]
+Error:
+failed to process sample
+```
+or
+
+```
+"file uses a more recent and unsupported ABI (8 bytes extra). incompatible file format".
+```
+
+then you are probably using a perf build that is not linked against this library. Install OpenCSD v0.9.1 or later by compiling it from [source code](https://github.com/Linaro/OpenCSD), then compile perf against it.
+Alternatively, OpenCSD is packaged for Debian (install the `libopencsd0` and `libopencsd-dev` packages).
+
+
 For example:
``` @@ -295,6 +329,8 @@ an embedded target). The `perf inject` command decodes the execution trace and generates periodic instruction samples, with branch histories:
+Note: if you are using a different device from the one used to collect the profiling data,
+you will need to run `perf buildid-cache` as described below.
 ```
 perf inject -i perf.data -o inj.data --itrace=i100000il
 ```
@@ -393,7 +429,8 @@ clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c

 The basic commands to run an application and create a compiler profile are:
 ```
-taskset -c 0 ./record.sh --strobe 5000 10000 28c06000.etr ./my_application arg1 arg2
+sudo ./set_strobing.sh 5000 10000
+perf record -e cs_etm/@tmc_etr0/u --per-thread -- <your app>
 perf inject -i perf.data -o inj.data --itrace=i100000il
 create_llvm_prof -binary=/path/to/binary -profile=inj.data -out=program.llvmprof
 ```
diff --git a/decoder/tests/auto-fdo/record.sh b/decoder/tests/auto-fdo/record.sh
deleted file mode 100644
index 16d4ba2..0000000
--- a/decoder/tests/auto-fdo/record.sh
+++ /dev/null
@@ -1,68 +0,0 @@
-#!/bin/sh
-
-BUFFER_ETF_A53=ec802000.etf
-BUFFER_ETF_A73=ed002000.etf
-BUFFER_ETF_SYS=ec036000.etf
-BUFFER_ETR=ec033000.etr
-
-OUT_FILE=perf.data
-
-STROBE=
-
-while :; do
-    case $1 in
-        --strobe)
-            STROBE=y
-            WINDOW=$2
-            PERIOD=$3
-            shift 3
-            ;;
-
-        *)
-            break ;;
-    esac
-done
-
-case $1 in
-    etr)
-        BUFFER=$BUFFER_ETR
-        ;;
-
-    etf-sys)
-        BUFFER=$BUFFER_ETF_SYS
-        ;;
-
-    "")
-        BUFFER=$BUFFER_ETR
-        ;;
-
-    *)
-        BUFFER=$1
-        ;;
-esac
-
-shift 1
-
-case $0 in
-    /*) F=$0 ;;
-    *) F=$(pwd)/$0 ;;
-esac
-
-SCRIPT_DIR=$(dirname $F)
-
-if [ "$STROBE" ]; then
-    for e in /sys/bus/coresight/devices/*.etm/; do
-        printf "%x" $WINDOW | sudo tee $e/strobe_window > /dev/null
-        printf "%x" $PERIOD | sudo tee $e/strobe_period > /dev/null
-    done
-fi
-
-PERF=$SCRIPT_DIR/perf
-
-export LD_LIBRARY_PATH=$SCRIPT_DIR:$LD_LIBRARY_PATH
-
-sudo LD_LIBRARY_PATH=$SCRIPT_DIR:$LD_LIBRARY_PATH $PERF record $PERF_ARGS -e cs_etm/@$BUFFER/u --per-thread "$@"
-
-sudo chown $(id -u):$(id -g) $OUT_FILE
-
-
diff --git a/decoder/tests/auto-fdo/set_strobing.sh b/decoder/tests/auto-fdo/set_strobing.sh
new file mode 100755
index 0000000..e11e62d
--- /dev/null
+++ b/decoder/tests/auto-fdo/set_strobing.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+
+WINDOW=$1
+PERIOD=$2
+
+if [[ -z $WINDOW ]] || [[ -z $PERIOD ]]; then
+    echo "Window or Period not specified!"
+    echo "Example usage: ./set_strobing.sh <WINDOW VALUE> <PERIOD VALUE>"
+    echo "Example usage: ./set_strobing.sh 5000 10000"
+    exit 1
+fi
+
+
+if [[ $EUID != 0 ]]; then
+    echo "Please run as root"
+    exit 1
+fi
+
+for e in /sys/bus/coresight/devices/etm*/; do
+    printf "%x" "$WINDOW" | tee "$e/strobe_window" > /dev/null
+    printf "%x" "$PERIOD" | tee "$e/strobe_period" > /dev/null
+    echo "Strobing period for $e set to $((`cat $e/strobe_period`))"
+    echo "Strobing window for $e set to $((`cat $e/strobe_window`))"
+done
+
+## Shows the user a simple usage example
+echo ">> Done! <<"
+echo "You can now run perf to trace your application, for example:"
+echo "perf record -e cs_etm/@tmc_etr0/u -- <your app>"
diff --git a/decoder/tests/auto-fdo/show_strobing.sh b/decoder/tests/auto-fdo/show_strobing.sh
new file mode 100755
index 0000000..d80d84c
--- /dev/null
+++ b/decoder/tests/auto-fdo/show_strobing.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+for e in /sys/bus/coresight/devices/etm*/; do
+    echo "Strobing period for $e is $((`cat $e/strobe_period`))"
+    echo "Strobing window for $e is $((`cat $e/strobe_window`))"
+done
--
2.17.1