Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> --- HOWTO.md | 204 ++++++++++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 163 insertions(+), 41 deletions(-)
diff --git a/HOWTO.md b/HOWTO.md index 3f2a1399be76..aee5ef8b8b6e 100644 --- a/HOWTO.md +++ b/HOWTO.md @@ -7,23 +7,16 @@ This HOWTO explains how to use the perf cmd line tools and the openCSD library to collect and extract program flow traces generated by the CoreSight IP blocks on a Linux system. The examples have been generated using an aarch64 Juno-r0 platform. All information is considered accurate and tested -using library branches `opencsd-0v002` and `opencsd-0v003` (decode library only) -and the latest perf branch `perf-opencsd-4.7` (decode library + perf tools) +using library version v0.4.1 and the latest perf branch `perf-opencsd-4.8` on the [OpenCSD github repository][1].
-From v0.4 of the library releases appear as master branch tags. v0.4 requires -a patched version of the `perf-opencsd-4.7` perf tools to use the updated C API. -The unpatched `perf-opencsd-4.7` may be used with v0.4 if the build makefile -for the tools is altered to #define OPENCSD_INC_DEPRECATED_API which will -include the decprecated function call wrappers for the new generic API.
On Target Trace Acquisition - Perf Record ----------------------------------------- - All the enhancement to the Perf tools that support the new `cs_etm` pmu have not been upstreamed yet. To get the required functionality branch -`perf-opencsd-4.7` needs to be downloaded to the target system where -traces are to be collected. This branch is an upstream v4.7 kernel +`perf-opencsd-4.8` needs to be downloaded to the target system where +traces are to be collected. This branch is an upstream v4.8 kernel supplemented with modifications to the CoreSight framework and drivers to be usable by the Perf core. The remaining out of tree patches are being upstreamed incrementally. @@ -50,7 +43,7 @@ the sink that will recieve trace data needs to be identified and given as an option on the perf command line. Once a sink has been identify trace collection can start. An easy and yet interesting example is the `uname` command:
- linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -e cs_etm/@sink=20070000.etr/ --per-thread uname + linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -e cs_etm/@20070000.etr/ --per-thread uname
This will generate a `perf.data` file where execution has been traced for both user and kernel space. To narrow the field to either user or kernel space the @@ -58,7 +51,7 @@ user and kernel space. To narrow the field to either user or kernel space the traces to user space:
- linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -vvv -e cs_etm/@sink=20070000.etr/u --per-thread uname + linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -vvv -e cs_etm/@20070000.etr/u --per-thread uname Problems setting modules path maps, continuing anyway... ----------------------------------------------------------- perf_event_attr: @@ -121,10 +114,108 @@ traces to user space: drwxr-xr-x 3 linaro linaro 4096 Mar 2 20:40 bin drwxr-xr-x 3 linaro linaro 4096 Mar 2 20:40 lib
+Trace data filtering +-------------------- +The amount of traces generated by CoreSight tracers is staggering, even for +the simplest trace scenario. Reducing trace generation to specific areas +of interest is desirable to save trace buffer space and avoid getting lost in +the trace data that isn't relevant. Supplementing the 'k' and 'u' options +described above is the notion of address filters. + +On CoreSight two types of address filters have been implemented, address range +and start/stop filters: + +**Address range filters:** +With address range filters traces are generated if the instruction pointer +falls within the specified range. Any work done by the CPU outside of that +range will not be traced. Address range filters can be specified for both +user and kernel space sessions: + + perf record -e cs_etm/@20070000.etr/k --filter 'filter 0xffffff8008562d0c/0x48' --per-thread uname + + perf record -e cs_etm/@20070000.etr/u --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main + +When dealing with kernel space traces, addresses are typically taken from the +'System.map' file. In user space, addresses are relocatable and can be +extracted from an objdump output: + + $ aarch64-linux-gnu-objdump -d libcstest.so.1.0 + ... + ... + 000000000000072c <coresight_test1>: <------------ Beginning of traces + 72c: d10083ff sub sp, sp, #0x20 + 730: b9000fe0 str w0, [sp,#12] + 734: b9001fff str wzr, [sp,#28] + 738: 14000007 b 754 <coresight_test1+0x28> + 73c: b9400fe0 ldr w0, [sp,#12] + 740: 11000800 add w0, w0, #0x2 + 744: b9000fe0 str w0, [sp,#12] + 748: b9401fe0 ldr w0, [sp,#28] + 74c: 11000400 add w0, w0, #0x1 + 750: b9001fe0 str w0, [sp,#28] + 754: b9401fe0 ldr w0, [sp,#28] + 758: 7100101f cmp w0, #0x4 + 75c: 54ffff0d b.le 73c <coresight_test1+0x10> + 760: b9400fe0 ldr w0, [sp,#12] + 764: 910083ff add sp, sp, #0x20 + 768: d65f03c0 ret + ... + ... 
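The length used in the user space range filter can be checked against the listing: `coresight_test1` starts at `0x72c` and its last instruction (`ret`, 4 bytes) sits at `0x768`, so the range ends at `0x76c` and spans `0x40` bytes. The helper below is only a sketch of that arithmetic. The name `range_filter` is hypothetical and not part of perf or openCSD, and it is meant for small user space offsets, since kernel addresses above 2^63 may overflow shell arithmetic.

```shell
# Hypothetical helper (not part of perf): build an address range filter
# string from a start address and the address just past the last
# instruction to trace.
# Usage: range_filter START END [OBJECT_PATH]
range_filter() {
    start=$(( $1 ))
    end=$(( $2 ))
    object=${3:-}

    # The range must cover at least one byte.
    if [ "$end" -le "$start" ]; then
        echo "range_filter: END must be greater than START" >&2
        return 1
    fi

    size=$(( end - start ))

    # User space filters carry the full path to the binary or library.
    if [ -n "$object" ]; then
        printf 'filter 0x%x/0x%x@%s\n' "$start" "$size" "$object"
    else
        printf 'filter 0x%x/0x%x\n' "$start" "$size"
    fi
}
```

For the listing above, `range_filter 0x72c 0x76c /opt/lib/libcstest.so.1.0` yields the same string that is passed to `--filter` in the user space example.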
+ +Following the address, the number of bytes is specified and, if tracing in user +space, the full path to the binary (or library) being traced. + +**Start/Stop filters:** +With start/stop filters traces are generated when the instruction pointer is +equal to the start address. Likewise, traces stop being generated when the +instruction pointer is equal to the stop address. Anything that happens between +these two events is traced: + + perf record -e cs_etm/@20070000.etr/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0' --per-thread uname + + perf record -vvv -e cs_etm/@20070000.etr/u --filter 'start 0x72c@/opt/lib/libcstest.so.1.0, \ + stop 0x40082c@/home/linaro/main' \ + --per-thread ./main + +**Limitation on address filters:** +The only limitations on address filters are the number of address comparators +found on an implementation and the mutual exclusion between range and +start/stop filters. As such the following example would _not_ work: + + perf record -e cs_etm/@20070000.etr/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0, \ // start/stop + filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' \ // address range + --per-thread uname + On Target Trace Collection -------------------------- The entire program flow will have been recorded in the `perf.data` file. -Information about libraries and executable is stored under `$HOME/.debug` . 
+Information about libraries and executable is stored under `$HOME/.debug`: + + linaro@linaro-nano:~/kernel$ tree ~/.debug + .debug + ├── [kernel.kallsyms] + │ └── 0542921808098d591a7acba5a1163e8991897669 + │ └── kallsyms + ├── [vdso] + │ └── 551fbbe29579eb63be3178a04c16830b8d449769 + │ └── vdso + ├── bin + │ └── uname + │ └── ed95e81f97c4471fb2ccc21e356b780eb0c92676 + │ └── elf + └── lib + └── aarch64-linux-gnu + ├── ld-2.21.so + │ └── 94912dc5a1dc8c7ef2c4e4649d4b1639b6ebc8b7 + │ └── elf + └── libc-2.21.so + └── 169a143e9c40cfd9d09695333e45fd67743cd2d6 + └── elf + + 13 directories, 5 files + linaro@linaro-nano:~/kernel$ + + All this information needs to be collected in order to successfully decode traces off target:
@@ -141,7 +232,7 @@ As of this writing the openCSD library is not part of the perf tools source. It is available on [github][1] and needs to be compiled before perf. Checkout the required branch/tag version into a local directory.
- linaro@t430:~/linaro/coresight$ git clone -b opencsd-0v003 https://github.com/Linaro/OpenCSD.git my-opencsd + linaro@t430:~/linaro/coresight$ git clone -b v0.4.1 https://github.com/Linaro/OpenCSD.git my-opencsd Cloning into 'OpenCSD'... remote: Counting objects: 2063, done. remote: Total 2063 (delta 0), reused 0 (delta 0), pack-reused 2063 @@ -166,19 +257,20 @@ the host's (which has nothing to do with the target) architecture: linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls ../../lib/linux64/dbg/ libcstraced.a libcstraced_c_api.a libcstraced_c_api.so libcstraced.so
+ Off Target Perf Tools Compilation --------------------------------- As stated above not all the pieces of the solution have been upstreamed. To -get all the components branch `perf-opencsd-4.7` needs to be +get all the components branch `perf-opencsd-4.8` needs to be obtained:
- linaro@t430:~/linaro/coresight$ git clone -b perf-opencsd-4.7 https://github.com/Linaro/OpenCSD.git perf-opencsd-4.7 + linaro@t430:~/linaro/coresight$ git clone -b perf-opencsd-4.8 https://github.com/Linaro/OpenCSD.git perf-opencsd-4.8 ... ...
- linaro@t430:~/linaro/coresight$ ls perf-opencsd-4.7/ + linaro@t430:~/linaro/coresight$ ls perf-opencsd-4.8/ arch certs CREDITS Documentation firmware include ipc Kconfig lib Makefile net REPORTING-BUGS scripts sound usr - block COPYING crypto drivers fs init Kbuild kernel MAINTAINERS mm README samples security tools virt + block COPYING crypto drivers fs init Kbuild kernel MAINTAINERS mm README samples security tools virt
Since the openCSD library is not part of the perf tools, an environment variable telling the build scripts where to find the library is needed. If @@ -187,12 +279,12 @@ successful, but handling of CoreSight trace data won't be supported.
**See perf-test-scripts below for assistance in creating a build and test enviroment.**
- linaro@t430:~/linaro/coresight$ cd perf-opencsd-4.7 - linaro@t430:~/linaro/coresight/perf-opencsd-4.7$ export CSTRACE_PATH=~/linaro/coresight/my-opencsd/decoder - linaro@t430:~/linaro/coresight/perf-opencsd-4.7$ make -C tools/perf + linaro@t430:~/linaro/coresight$ cd perf-opencsd-4.8 + linaro@t430:~/linaro/coresight/perf-opencsd-4.8$ export CSTRACE_PATH=~/linaro/coresight/my-opencsd/decoder + linaro@t430:~/linaro/coresight/perf-opencsd-4.8$ make -C tools/perf ... ... - linaro@t430:~/linaro/coresight/perf-opencsd-4.7$ ls -l tools/perf/perf + linaro@t430:~/linaro/coresight/perf-opencsd-4.8$ ls -l tools/perf/perf -rwxrwxr-x 1 linaro linaro 6276360 Mar 3 10:05 tools/perf/perf
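Since a missing or wrong `CSTRACE_PATH` still produces a successful build without CoreSight support, it can help to check the variable before running `make`. The sketch below assumes the library layout shown earlier in this HOWTO (`lib/linux64/dbg` under the decoder directory, holding the `libcstraced*` files). The function name `check_cstrace_path` is hypothetical, and the sub-directory may differ for other build configurations.

```shell
# Hypothetical helper: check that a decoder directory contains the
# openCSD libraries before pointing CSTRACE_PATH at it.
# Usage: check_cstrace_path DECODER_DIR [LIB_SUBDIR]
check_cstrace_path() {
    dir=$1
    # Assumed default, taken from the debug build layout shown above.
    sub=${2:-lib/linux64/dbg}

    # Succeeds only if at least one libcstraced* file is present.
    if ls "$dir/$sub"/libcstraced* >/dev/null 2>&1; then
        echo "openCSD libraries found in $dir/$sub"
    else
        echo "no openCSD libraries in $dir/$sub" >&2
        return 1
    fi
}
```

A possible use is `check_cstrace_path ~/linaro/coresight/my-opencsd/decoder && export CSTRACE_PATH=~/linaro/coresight/my-opencsd/decoder`.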
@@ -205,34 +297,33 @@ At the end of the compilation a new perf binary is available in `tools/perf/`
Trace Decoding with Perf Report ------------------------------- - Before working with custom traces it is suggested to use a trace bundle that is known to be working properly. A sample bundle has been made available here [2]. Trace bundles can be extracted anywhere and have no dependencies on where the perf tools and openCSD library have been compiled.
- linaro@t430:~/linaro/coresight$ mkdir feb24 - linaro@t430:~/linaro/coresight$ cd feb24 - linaro@t430:~/linaro/coresight/feb24$ wget http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.feb24.tgz - linaro@t430:~/linaro/coresight/feb24$ md5sum uname.v4.user.feb24.tgz - f53f11d687ce72bdbe9de2e67e960ec6 uname.v4.user.feb24.tgz - linaro@t430:~/linaro/coresight/feb24$ tar xf uname.v4.user.feb24.tgz - linaro@t430:~/linaro/coresight/feb24$ ls -la + linaro@t430:~/linaro/coresight$ mkdir sept20 + linaro@t430:~/linaro/coresight$ cd sept20 + linaro@t430:~/linaro/coresight/sept20$ wget http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz + linaro@t430:~/linaro/coresight/sept20$ md5sum uname.v4.user.sept20.tgz + f53f11d687ce72bdbe9de2e67e960ec6 uname.v4.user.sept20.tgz + linaro@t430:~/linaro/coresight/sept20$ tar xf uname.v4.user.sept20.tgz + linaro@t430:~/linaro/coresight/sept20$ ls -la total 1312 drwxrwxr-x 3 linaro linaro 4096 Mar 3 10:26 . drwxrwxr-x 5 linaro linaro 4096 Mar 3 10:13 .. drwxr-xr-x 7 linaro linaro 4096 Feb 24 12:21 .debug -rw------- 1 linaro linaro 78016 Feb 24 12:21 perf.data - -rw-rw-r-- 1 linaro linaro 1245881 Feb 24 12:25 uname.v4.user.feb24.tgz + -rw-rw-r-- 1 linaro linaro 1245881 Feb 24 12:25 uname.v4.user.sept20.tgz
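The checksum comparison above is done by eye. A small wrapper can make it fail loudly before the bundle is extracted. This is a sketch only: `verify_md5` is a hypothetical name, and it assumes the coreutils `md5sum` already used in the session above.

```shell
# Hypothetical helper: compare the MD5 digest of a file with an
# expected value; returns 0 on match, non-zero otherwise.
# Usage: verify_md5 FILE EXPECTED_DIGEST
verify_md5() {
    file=$1
    expected=$2

    # md5sum prints "<digest>  <file>"; keep only the digest.
    actual=$(md5sum "$file" | cut -d' ' -f1)

    [ "$actual" = "$expected" ]
}
```

With the digest printed above, extraction could be guarded as `verify_md5 uname.v4.user.sept20.tgz f53f11d687ce72bdbe9de2e67e960ec6 && tar xf uname.v4.user.sept20.tgz`.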
Perf is expecting files related to the trace capture (`perf.data`) to be located under `~/.debug` [3]. This example will remove the current `~/.debug` directory to be sure everything is clean.
- linaro@t430:~/linaro/coresight/feb24$ rm -rf ~/.debug - linaro@t430:~/linaro/coresight/feb24$ cp -dpR .debug ~/ - linaro@t430:~/linaro/coresight/feb24$ export LD_LIBRARY_PATH=~/linaro/coresight/my-opencsd/decoder/lib/linux64/dbg/ - linaro@t430:~/linaro/coresight/feb24$ ../perf-opencsd-4.7/tools/perf/perf report --stdio + linaro@t430:~/linaro/coresight/sept20$ rm -rf ~/.debug + linaro@t430:~/linaro/coresight/sept20$ cp -dpR .debug ~/ + linaro@t430:~/linaro/coresight/sept20$ export LD_LIBRARY_PATH=~/linaro/coresight/my-opencsd/decoder/lib/linux64/dbg/ + linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.8/tools/perf/perf report --stdio
# To display the perf.data header info, please use --header/--header-only options. # @@ -276,7 +367,7 @@ to be sure everything is clean.
Additional data can be obtained, which contains a dump of the trace packets received using the command
- mjl@ubuntu-vbox:./perf-opencsd-4.7/coresight/tools/perf/perf report --stdio --dump + mjl@ubuntu-vbox:./perf-opencsd-4.8/coresight/tools/perf/perf report --stdio --dump
resulting a large amount of data, trace looking like:-
@@ -325,10 +416,10 @@ Trace Decoding with Perf Script Working with perf scripts needs more command line options but yields interesting results.
- linaro@t430:~/linaro/coresight/feb24$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-4.7/tools/perf/ - linaro@t430:~/linaro/coresight/feb24$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/ - linaro@t430:~/linaro/coresight/feb24$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/ - linaro@t430:~/linaro/coresight/feb24$ ../perf-opencsd-4.7/tools/perf/perf --exec-path=${EXEC_PATH} script --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump + linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-4.8/tools/perf/ + linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/ + linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/ + linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.8/tools/perf/perf --exec-path=${EXEC_PATH} script --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump
7f89f24d80: 910003e0 mov x0, sp 7f89f24d84: 94000d53 bl 7f89f282d0 <free@plt+0x3790> @@ -354,6 +445,37 @@ interesting results. 7f89f28304: eb01001f cmp x0, x1 7f89f28308: 54ffffc1 b.ne 7f89f28300 <free@plt+0x37c0>
+Kernel Trace Decoding +--------------------- + +When dealing with kernel space traces the vmlinux file has to be communicated +explicitly to perf using the "--vmlinux" command line option: + + linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.8/tools/perf/perf report --stdio --vmlinux=./vmlinux + ... + ... + linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.8/tools/perf/perf script --vmlinux=./vmlinux + +When using scripts things get a little more convoluted. Using the same example +as above but for kernel traces, the command line becomes: + + linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-4.8/tools/perf/ + linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/ + linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/ + linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-4.8/tools/perf/perf --exec-path=${EXEC_PATH} script \ + --vmlinux=./vmlinux \ + --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- \ + -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump \ + -k ./vmlinux + ... + ... + +The option "--vmlinux=./vmlinux" is interpreted by the "perf script" command +the same way it is for "perf report". The option "-k ./vmlinux" is specific +to the script being executed and is unrelated to "--vmlinux", though it +is highly advised to keep them synchronized. + + Perf Test Environment Scripts -----------------------------
@@ -415,6 +537,6 @@ Best regards, -------------------------------------- [1]: https://github.com/Linaro/OpenCSD "OpenCSD Github"
-[2]: wget http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.feb24.tgz +[2]: http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz
[3]: Get in touch with us if you know a way to change this.