From: Robert Walker <robert.walker(a)arm.com>
Add notes on using perf to collect and analyze CoreSight trace
Signed-off-by: Robert Walker <robert.walker(a)arm.com>
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Cc: coresight(a)lists.linaro.org
Cc: linux-arm-kernel(a)lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-4-git-send-email-robert.walker@arm…
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
---
Documentation/trace/coresight.txt | 51 +++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index a33c88cd5d1d..6f0120c3a4f1 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -330,3 +330,54 @@ Details on how to use the generic STM API can be found here [2].
[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
[2]. Documentation/trace/stm.txt
+
+
+Using perf tools
+----------------
+
+perf can be used to record and analyze traces of programs.
+
+Execution can be recorded using 'perf record' with the cs_etm event,
+specifying the name of the sink to record to, e.g.:
+
+ perf record -e cs_etm/@20070000.etr/u --per-thread
+
+The 'perf report' and 'perf script' commands can be used to analyze execution,
+synthesizing instruction and branch events from the instruction trace.
+'perf inject' can be used to replace the trace data with the synthesized events.
+The --itrace option controls the type and frequency of synthesized events
+(see perf documentation).
+
+Note that only 64-bit programs are currently supported; further work is
+required to support instruction decode of 32-bit Arm programs.
+
+
+Generating coverage files for Feedback Directed Optimization: AutoFDO
+---------------------------------------------------------------------
+
+When 'perf inject' is given the --itrace option, the trace data is
+removed and replaced with the synthesized events, e.g.:
+
+ perf inject --itrace --strip -i perf.data -o perf.data.new
+
+Below is an example of using Arm ETM for AutoFDO. It requires the AutoFDO tools
+(https://github.com/google/autofdo) and gcc version 5. The bubble
+sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
+
+ $ gcc-5 -O3 sort.c -o sort
+ $ taskset -c 2 ./sort
+ Bubble sorting array of 30000 elements
+ 5910 ms
+
+ $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
+ Bubble sorting array of 30000 elements
+ 12543 ms
+ [ perf record: Woken up 35 times to write data ]
+ [ perf record: Captured and wrote 69.640 MB perf.data ]
+
+ $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
+ $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
+ $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
+ $ taskset -c 2 ./sort_autofdo
+ Bubble sorting array of 30000 elements
+ 5806 ms
--
2.14.3
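The AutoFDO walkthrough in the documentation patch above can be wrapped in a small shell script so it is easy to repeat. This is a sketch only, not part of the patch: it uses just the commands shown in the example, and assumes the same ETR sink (20070000.etr), CPU 2 for taskset, and the gcc-5 and create_gcov binaries on PATH. The helper names cs_etm_event, pct_change and autofdo_build are hypothetical.

```shell
#!/bin/sh
# Sketch of the AutoFDO steps documented above, wrapped in functions.
# Assumptions (not from the patch): the helper names, the SINK/CPU
# defaults and the script name are illustrative only.
set -eu

SINK=${SINK:-20070000.etr}  # ETR sink used in the example; platform specific
CPU=${CPU:-2}               # pin the workload to one CPU, as the example does

# Print the cs_etm event string for a sink, tracing user space only (/u).
cs_etm_event() {
    printf 'cs_etm/@%s/u\n' "$1"
}

# Print the percentage change from $1 to $2, e.g. baseline vs. new run time.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (b - a) * 100 / a }'
}

# Build, trace, convert the trace to a gcov profile, and rebuild with it.
autofdo_build() {
    src=$1              # C source file, e.g. sort.c
    bin=${src%.c}       # binary name derived from the source name
    gcc-5 -O3 "$src" -o "$bin"
    perf record -e "$(cs_etm_event "$SINK")" --per-thread \
        taskset -c "$CPU" "./$bin"
    perf inject -i perf.data -o inj.data --itrace=il64 --strip
    create_gcov --binary="./$bin" --profile=inj.data \
        --gcov="$bin.gcov" -gcov_version=1
    gcc-5 -O3 -fauto-profile="$bin.gcov" "$src" -o "${bin}_autofdo"
}

# Run the whole sequence only when a source file is given on the command line.
if [ $# -gt 0 ]; then
    autofdo_build "$1"
fi
```

Saved under a name such as autofdo.sh (hypothetical), it would be run as `SINK=20070000.etr sh autofdo.sh sort.c`. Applied to the timings in the example, `pct_change 5910 5806` prints -1.76, i.e. the AutoFDO build ran about 1.8% faster than the plain -O3 build in that run, and `pct_change 5910 12543` prints 112.23, the extra run time while tracing.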
Hi,
These patches add support for using perf inject to generate branch events,
instruction events and branch stacks from CoreSight ETM traces.
They apply to Linus's tree with the memory cleanup fix from
https://lkml.org/lkml/2018/1/25/432
Regards
Rob Walker
Robert Walker (2):
perf tools: inject capability for CoreSight traces
perf inject: Emit instruction records on ETM trace discontinuity
Documentation/trace/coresight.txt | 51 +++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 74 +++-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 2 +
tools/perf/util/cs-etm.c | 471 +++++++++++++++++++++---
4 files changed, 532 insertions(+), 66 deletions(-)
--
2.7.4
Hi,
These patches add support for using perf inject to generate branch events,
instruction events and branch stacks from CoreSight ETM traces.
They apply to Linus's tree with the memory cleanup fix from
https://lkml.org/lkml/2018/1/25/432
Changes since v1:
* Split documentation update into separate patch
* Added null pointer check
* Moved some changes from patch 2 to patch 1
Regards
Rob Walker
Robert Walker (3):
perf tools: inject capability for CoreSight traces
perf inject: Emit instruction records on ETM trace discontinuity
coresight: Update documentation for perf usage
Documentation/trace/coresight.txt | 51 +++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 74 +++-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 2 +
tools/perf/util/cs-etm.c | 472 +++++++++++++++++++++---
4 files changed, 533 insertions(+), 66 deletions(-)
--
2.7.4
On Thu, 8 Feb 2018 15:17:33 +0000
"Christian Hansen (chansen3)" <chansen3(a)cisco.com> wrote:
> Is it possible to capture the address of memory accesses using perf on
> ARM? Initially, I thought perf-mem would do the trick, but apparently
> its functionality is entirely dependent on Intel CPUs with PEBS.
Right.
> Then I noticed that perf-record takes a -d flag (used by perf-mem).
> Although the description of that flag is vague (capture what addresses?),
> when used as such, "perf record -e armv8_cortex_a72/mem_access/u -d -p 16963 sleep 5",
> and then dumping the trace via "perf report --mem-mode", I get 0s in the
> data symbol column. So this also appears to have no effect on my
> hardware. As the command used reveals, I’m using perf on a Cortex A72
> and on Linux 4.4.
I see ./perf report --help says:
--mem-mode
Use the data addresses of samples in addition to instruction addresses to build the histograms. To generate meaningful output,
the perf.data file must have been obtained using perf record -d -W and using a special event -e cpu/mem-loads/ or -e
cpu/mem-stores/. See perf mem for simpler access.
yet perf record's -W switch isn't on record's manpage, and trying the
invocation sequence on x86 using a perf built from today's acme's
perf/urgent branch:
$ ./perf version
perf version 4.13.rc5.g59410f5
$ ./perf record -e cpu/mem-loads/u -d -W -p 3722 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 8.978 MB perf.data ]
$ ./perf report --mem-mode --stdio
Error:
The perf.data file has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#
$
A 'perf mem record sleep 1; perf mem report' sequence produces samples
in its output, but 'mem record' doesn't take a -p switch for the PID;
rather, -p means --phys-data, "Record/Report sample physical
addresses", which also doesn't seem to work:
$ perf mem -p record sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cpu/mem-loads,ldlat=30/P).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
Nevertheless, on Arm, the armv8_cortex_a72/mem_access/ is a counting
PMU, so it doesn't record the address of the memory access, just
where in the code the access came from.
> I’m aware that for ARM there’s a Statistical Profiling Extension for
> which support went into the kernel recently and which could potentially
> support this information, but that requires ARMv8.2. There’s an
Ack.
> Embedded Trace Macrocell on my CPU and perf support is also in the
> kernel, but my understanding is that capturing a data trace is not
> available for A profile CPUs, which is what I have.
No, Cortex-As should be supported by the Coresight driver no problem.
Try acme's perf/core tree, where support for linking with decode
> Am I overlooking some software support for this in perf or am I simply asking the impossible?
You're on the right track: Coresight trace h/w is able to record memory
accesses, but I don't know its enablement status, so I'm adding the
coresight mailing list to cc in case anyone there can chime in and help.
Thanks,
Kim
Hi,
These patches add support for using perf inject to generate branch events
and branch stacks from CoreSight ETM traces.
They apply to the recently submitted perf support for CoreSight trace [1]
with the subsequent memory cleanup fix [2].
The first patch contains Sebastian's original commits from [3], reworked to
apply to the refactored version now upstreamed, with some fixes for branch
events and my work on branch stacks posted last November [4], updated with
review comments.
The second patch is a new patch that handles discontinuities in the trace
stream, e.g. when the ETM is configured to only trace certain regions or
is only active some of the time.
These probably need to be squashed together before going upstream, but I've
left them as separate commits for initial review.
Regards
Rob Walker
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
tag perf-core-for-mingo-4.16-20180125
[2]: https://lkml.org/lkml/2018/1/25/432
[3]: https://github.com/Linaro/perf-opencsd/ autoFDO branch
[4]: https://lists.linaro.org/pipermail/coresight/2017-November/000955.html
Robert Walker (2):
perf tools: inject capability for CoreSight traces
perf inject: Emit instruction records on ETM trace discontinuity
Documentation/trace/coresight.txt | 31 ++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 68 +++-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 2 +
tools/perf/util/cs-etm.c | 471 +++++++++++++++++++++---
4 files changed, 509 insertions(+), 63 deletions(-)
--
1.9.1
Since libopencsd is not one of the 'mainstream' libraries, upstream
maintainers have decided not to display its detection status by default.
Displaying it requires the VF=1 option, which this patch documents.
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
---
HOWTO.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/HOWTO.md b/HOWTO.md
index 16534f858a19..b13408720d43 100644
--- a/HOWTO.md
+++ b/HOWTO.md
@@ -306,7 +306,7 @@ and needs to be installed on a system prior to compilation. Information about
the status of the openCSD library on a system is given at compile time by the
perf tools build script:
- linaro@t430:~/linaro/linux-kernel$ make -C tools/perf
+ linaro@t430:~/linaro/linux-kernel$ make VF=1 -C tools/perf
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
--
2.7.4
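With VF=1 the feature list is long, so it can help to filter the build output for the one entry of interest. The sketch below is not part of the patch; it assumes the "... <feature>: [ on ]" layout shown in the hunk above, that the OpenCSD entry is named libopencsd, and that the status word may be wrapped in ANSI colour codes. The helper name feature_on is hypothetical.

```shell
#!/bin/sh
# Sketch: check perf's feature-detection output for a given feature.
# Assumes the "... <feature>: [ on ]" layout shown in the HOWTO hunk;
# the feature name "libopencsd" is an assumption, not from the patch.
set -eu

# Read build output on stdin; succeed only if feature $1 is reported "on".
# awk reads all of its input, so an upstream make is not cut short.
feature_on() {
    awk -v f="$1" '
        { gsub(/\033\[[0-9;]*m/, "") }   # strip ANSI colour codes, if any
        $1 == "..." && $2 == f ":" && $4 == "on" { found = 1 }
        END { exit !found }'
}

# Example, from the top of a kernel tree:
#   make VF=1 -C tools/perf 2>&1 | feature_on libopencsd \
#       && echo "OpenCSD support detected"
```

Matching on whitespace-separated fields rather than on fixed column positions keeps the check independent of how wide perf pads the feature names.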
This patch set explores using CoreSight trace data for postmortem
debugging. When a kernel panic happens, CoreSight panic kdump can save
on-chip trace data and tracer metadata into DRAM; kdump and crash/perf
can then be used for "offline" analysis.
The documentation is important for understanding the purpose,
implementation and usage of CoreSight panic kdump. Patches 0001/0002
relocate the existing documentation and add the related documentation.
Patch 0003 introduces a simple panic kdump framework that can easily
be used by CoreSight devices.
Patches 0004/0005 support panic kdump for ETB; patch 0006 supports
the dump for ETMv4. As Mathieu suggested, patch 0006 distinguishes between
two tracer enabling modes: the sysFS interface and perf mode.
This patch set has been verified on the 96Boards HiKey with the tracer
enabled through the sysFS interface.
Changes from v2:
* Add the two patches for documentation.
* Following Mathieu's suggestion, reworked the panic kdump framework and
removed the useless flag "PRE_PANIC".
* Following review comments, moved the kdump node add and delete
operations into the sink enable/disable functions.
* Following Mathieu's suggestion, handled kdump node
addition/deletion/updating separately for the sysFS interface and perf
method.
Changes from v1:
* Add support to dump ETMv4 meta data.
* Wrote the 'crash' extension csdump.so and use it to generate a
'perf'-compatible file.
* Refactored panic dump driver to support pre & post panic dump.
Changes from RFC:
* Following Mathieu's suggestion, used a general framework to support the
dump functionality.
* Changed to use perf to analyse trace data.
Leo Yan (6):
doc: Add Coresight documentation directory
doc: Add documentation for Coresight panic kdump
coresight: Support panic kdump functionality
coresight: tmc: Hook callback for panic kdump
coresight: Add and delete sink callback for panic kdump list
coresight: etm4x: Support panic kdump
Documentation/trace/coresight-cpu-debug.txt | 187 ------------
Documentation/trace/coresight.txt | 332 ---------------------
.../trace/coresight/coresight-cpu-debug.txt | 187 ++++++++++++
.../trace/coresight/coresight-panic-kdump.txt | 91 ++++++
Documentation/trace/coresight/coresight.txt | 332 +++++++++++++++++++++
MAINTAINERS | 5 +-
drivers/hwtracing/coresight/Kconfig | 9 +
drivers/hwtracing/coresight/Makefile | 1 +
drivers/hwtracing/coresight/coresight-etm-perf.c | 12 +-
drivers/hwtracing/coresight/coresight-etm4x.c | 23 ++
drivers/hwtracing/coresight/coresight-etm4x.h | 15 +
.../hwtracing/coresight/coresight-panic-kdump.c | 154 ++++++++++
drivers/hwtracing/coresight/coresight-priv.h | 13 +
drivers/hwtracing/coresight/coresight-tmc-etf.c | 29 ++
drivers/hwtracing/coresight/coresight.c | 12 +
include/linux/coresight.h | 7 +
16 files changed, 887 insertions(+), 522 deletions(-)
delete mode 100644 Documentation/trace/coresight-cpu-debug.txt
delete mode 100644 Documentation/trace/coresight.txt
create mode 100644 Documentation/trace/coresight/coresight-cpu-debug.txt
create mode 100644 Documentation/trace/coresight/coresight-panic-kdump.txt
create mode 100644 Documentation/trace/coresight/coresight.txt
create mode 100644 drivers/hwtracing/coresight/coresight-panic-kdump.c
--
2.7.4