Hi,
We are trying to use OpenCSD to decode an on-chip trace in our ETB.
The trace was enabled and is captured in the ETB.
We read the trace back and dumped it into a text file.
I am attaching it.
How can I use the CSD tool to decode it?
Thanks
Ajith
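A minimal sketch of one preparatory step, assuming the text dump holds 32-bit hexadecimal words (as read back from the ETB RAM) with no address column, and that the words should be stored little-endian. The layout of the attached file is not shown here, so both points are assumptions, and hex_dump_to_bytes is a hypothetical helper name. It turns the text back into raw bytes, since a trace decoder generally consumes the binary trace stream rather than text:

```python
import re
import struct


def hex_dump_to_bytes(text: str) -> bytes:
    """Convert whitespace-separated 32-bit hex words into raw bytes.

    Assumptions (not confirmed by the original post): every token is
    exactly 8 hex digits, optionally prefixed with "0x", and each word
    is written out little-endian.
    """
    words = re.findall(r"\b(?:0x)?([0-9a-fA-F]{8})\b", text)
    return b"".join(struct.pack("<I", int(word, 16)) for word in words)


if __name__ == "__main__":
    sample = "0x12345678 deadbeef\n"
    print(hex_dump_to_bytes(sample).hex())
```

The resulting bytes could be written to a file opened in "wb" mode and handed to the decoder together with the configuration of the trace source.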
Hello Mathieu,
Thanks for the config file. It works. I was able to build the OpenCSD kernel (from the perf-opencsd-master branch) and install it on the USB drive (I used ArchLinuxARM-aarch64-latest.tar.gz). I also built the perf tool (make -C tools/perf). Everything boots, but perf has some issues:
[root@alarm home]# ./perf record -vvv -e cs_etm/@20070000.etr/u --per-thread uname
map_groups__set_modules_path_dir: cannot open /lib/modules/4.13.0-rc1-ge565ad6 dir
Problems setting modules path maps, continuing anyway...
------------------------------------------------------------
perf_event_attr:
type 8
size 112
{ sample_period, sample_freq } 1
sample_type IP|TID|IDENTIFIER
read_format ID
disabled 1
exclude_kernel 1
exclude_hv 1
enable_on_exec 1
sample_id_all 1
------------------------------------------------------------
sys_perf_event_open: pid 2242 cpu -1 group_fd -1 flags 0x8 = 4
------------------------------------------------------------
perf_event_attr:
type 1
size 112
config 0x9
{ sample_period, sample_freq } 1
sample_type IP|TID|IDENTIFIER
read_format ID
disabled 1
exclude_kernel 1
exclude_hv 1
mmap 1
comm 1
enable_on_exec 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 2242 cpu -1 group_fd -1 flags 0x8 = 5
mmap size 528384B
AUX area mmap length 4194304
perf event ring buffer mmapped per thread
failed to mmap AUX area
failed to mmap with 12 (Cannot allocate memory)
I fixed the "map_groups__set_modules_path_dir: cannot open /lib/modules/4.13.0-rc1-ge565ad6 dir" issue by adding an appropriate symbolic link, but I still have the mmap problem. Any idea what could be wrong here? The limits on my Juno are below:
[root@alarm ~]# ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31798
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31798
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@alarm ~]#
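As a first sanity check, the sizes from the perf log can be put next to the locked-memory limit reported by ulimit. This is only a sketch: the 4 KiB page size is an assumption, and nothing in the log establishes that the limit is the cause of the failure (root is commonly exempt from it, and the allocation may instead be failing inside the sink driver). The helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on this kernel


def pages(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole pages needed to hold nbytes (rounded up)."""
    return -(-nbytes // page_size)


def exceeds_limit(request_bytes: int, limit_kbytes: int) -> bool:
    """True if a request is larger than a limit given in kbytes."""
    return request_bytes > limit_kbytes * 1024


if __name__ == "__main__":
    ring_bytes = 528384    # "mmap size 528384B" from the log
    aux_bytes = 4194304    # "AUX area mmap length 4194304" from the log
    memlock_kbytes = 64    # "max locked memory" from ulimit -a
    print("ring buffer pages:", pages(ring_bytes))
    print("AUX area pages:", pages(aux_bytes))
    print("AUX larger than memlock limit:",
          exceeds_limit(aux_bytes, memlock_kbytes))
```

If the buffer size turns out to matter, the -m option of perf record accepts a second, comma-separated value for the AUX area, so trying a smaller AUX buffer is one possible experiment.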
Regards
Marek
On 2017-08-18 16:54:42, Mathieu Poirier <mathieu.poirier(a)linaro.org> wrote:
> On 18 August 2017 at 04:22, marekzmyslowski
> <marekzmyslowski(a)poczta.onet.pl> wrote:
> > Hello Mathieu,
> >
> > I've decided that I don't need Android for now. Linux is enough.
>
> That is probably a better place to start.
>
> > However, I have another issue. I downloaded the perf-opencsd-master branch, ran the config with ARCH=arm64 and CROSS_COMPILE=aarch64-linux-gnu-, and added support for the Versatile board. Then I compiled the kernel and everything was OK. Next I built the USB drive using the following instructions:
> > https://archlinuxarm.org/platforms/armv8/arm/juno (this works fine; Linux boots on the Juno).
> > Next I copied the Image file and juno.dtb onto the USB drive, but it doesn't boot. It hangs here:
> >
> > initrd: address 0x0
> > initrd: length 0x0
> > PEI 1132 ms
> > DXE 1695 ms
> > BDS 368934875444 ms
> > BDS 368934873448 ms
> > BDS 1535 ms
> > Total Time = 368934871781 ms
> >
> > linux: address 0x80080000
> > linux: length 0x1150200
> > fdt: address 0x9FE00000
> > fdt: length 0x5F54
> >
> > Any idea what I'm doing wrong? Any help will be appreciated (I'm so close to having Juno + CoreSight + perf :) )
>
> I can't help you with booting the board itself. The best I can do is
> advise to use u-boot instead of UEFI and give you my kernel .config
> file (attached). For the rest there is plenty of documentation out
> there.
>
> >
> > Regards
> > Marek
> >
> > On 2017-08-16 23:08:04, Mathieu Poirier <mathieu.poirier(a)linaro.org> wrote:
> >> Hello Marek,
> >>
> >> Please CC the CoreSight mailing list when asking questions as someone
> >> else may also be able to answer.
> >>
> >> First and foremost I advise using the official CoreSight kernel found
> >> on the openCSD site [1] rather than my personal branch [2] - you
> >> never know what you'll get with the latter.
> >>
> >> That being said the CoreSight kernel on the openCSD site is not an
> >> Android kernel - it is simply a mainline kernel supplemented with
> >> patches that haven't made their way to mainline yet. You will have to
> >> either add the android patches to the CoreSight kernel or the other
> >> way around (CoreSight patches on android kernel).
> >>
> >> Android user space is also different and does not include the
> >> perf-tools. You will have to add them manually along with the
> >> dependencies they require. I haven't gone through that process and as
> >> such can't advise more on that portion.
> >>
> >> Get back to me with your questions if the above isn't sufficient.
> >>
> >> Best regards,
> >> Mathieu
> >>
> >> [1]. https://github.com/Linaro/OpenCSD/tree/perf-opencsd-master
> >> [2]. https://git.linaro.org/people/mathieu.poirier/coresight.git/
> >>
> >> On 16 August 2017 at 14:32, marekzmyslowski
> >> <marekzmyslowski(a)poczta.onet.pl> wrote:
> >> > Hello Mathieu,
> >> >
> >> > I'm sorry for bothering you, but I think you may be the person who can help me. I'm trying to install and run Android on a Juno board r0. I tested Android 17.05 from Linaro and it works. Now I'm trying to get perf working with CoreSight, but I'm a little confused. Do I need to build Android from Linaro and the kernel from here https://git.linaro.org/people/mathieu.poirier/coresight.git/ or from here https://github.com/Linaro/OpenCSD/tree/perf-opencsd-4.12?
> >> > Any help with this will be appreciated :)
> >> >
> >> > Regards
> >> > Marek Zmysłowski
> >>
> >
> >
> >
>
* Arnaldo Carvalho de Melo <acme(a)kernel.org> wrote:
> Hi Ingo,
>
> Please consider pulling, this is on top of tip/perf/urgent.
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit 297f9233b53a08fd457815e19f1d6f2c3389857b:
>
> kprobes: Propagate error from disarm_kprobe_ftrace() (2018-02-16 09:12:58 +0100)
>
> are available in the Git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.17-20180216
>
> for you to fetch changes up to 21316ac6803d4a1aadd74b896db8d60a92cd1140:
>
> perf tests shell lib: Use a wildcard to remove the vfs_getname probe (2018-02-16 15:31:12 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes:
>
> - Fix wrong jump arrow in systems with branch records with cycles,
> i.e. Intel's >= Skylake (Jin Yao)
>
> - Fix 'perf record --per-thread' problem introduced when
> implementing 'perf stat --per-thread' (Jin Yao)
>
> - Use arch__compare_symbol_names() to fix 'perf test vmlinux',
> that was using strcmp(symbol names) while the dso routines
> doing symbol lookups used the arch overridable one, making
> this test fail in architectures that overrode that function
> with something other than strcmp() (Jiri Olsa)
>
> - Add 'perf script --show-round-event' to display
> PERF_RECORD_FINISHED_ROUND entries (Jiri Olsa)
>
> - Fix dwarf unwind for stripped binaries in 'perf test' (Jiri Olsa)
>
> - Use ordered_events for 'perf report --tasks', otherwise we may get
> artifacts when PERF_RECORD_FORK gets processed before PERF_RECORD_COMM
> (when they got recorded in different CPUs) (Jiri Olsa)
>
> - Add support to display group output for non group events, i.e.
> now when one uses 'perf report --group' on a perf.data file
> recorded without explicitly grouping events with {} (e.g.
> "perf record -e '{cycles,instructions}'"), one gets the same output
> that grouping would produce, i.e. all those non-grouped events are
> shown in multiple columns at the same time (Jiri Olsa)
>
> - Skip non-address kallsyms entries, e.g. '(null)' for !root (Jiri Olsa)
>
> - Kernel maps fixes wrt perf.data(report) versus live system (top)
> (Jiri Olsa)
>
> - Fix memory corruption when using 'perf record -j call -g -a <application>'
> followed by 'perf report --branch-history' (Jiri Olsa)
>
> - ARM CoreSight fixes (Mathieu Poirier)
>
> - Add inject capability for CoreSight Traces (Robert Walker)
>
> - Update documentation for use of 'perf' + ARM CoreSight (Robert Walker)
>
> - Man pages fixes (Sangwon Hong, Jaecheol Shin)
>
> - Fix some 'perf test' cases on s/390 and x86_64 (some backtraces
> changed with a glibc update) (Thomas Richter)
>
> - Add detailed CPUID info in the 'perf.data' headers for s/390 to
> then use it in 'perf annotate' (Thomas Richter)
>
> - Add '--interval-count N' to 'perf stat', to use with -I, i.e.
> 'perf stat -I 1000 --interval-count 2' will show stats every
> 1000ms, two times (yuzhoujian)
>
> - Add 'perf stat --timeout Nms', that will run for that many
> milliseconds and then stop, printing the counters (yuzhoujian)
>
> - Fix description for 'perf report --mem-mode' (Andi Kleen)
>
> - Use a wildcard to remove the vfs_getname probe in the
> 'perf test' shell based test cases (Arnaldo Carvalho de Melo)
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
>
> ----------------------------------------------------------------
> Andi Kleen (1):
> perf report: Fix description for --mem-mode
>
> Arnaldo Carvalho de Melo (1):
> perf tests shell lib: Use a wildcard to remove the vfs_getname probe
>
> Jaecheol Shin (1):
> perf annotate: Add missing arguments in Man page
>
> Jin Yao (2):
> perf tools: Use target->per_thread and target->system_wide flags
> perf report: Fix wrong jump arrow
>
> Jiri Olsa (18):
> perf record: Put new line after target override warning
> perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND
> tools lib api fs: Add filename__read_xll function
> tools lib api fs: Add sysfs__read_xll function
> perf tests: Fix dwarf unwind for stripped binaries
> perf tools: Fix comment for sort__* compare functions
> perf report: Ask for ordered events for --tasks option
> perf report: Add support to display group output for non group events
> tools lib symbol: Skip non-address kallsyms line
> perf symbols: Check if we read regular file in dso__load()
> perf machine: Free root_dir in machine__init() error path
> perf machine: Move kernel mmap name into struct machine
> perf machine: Generalize machine__set_kernel_mmap()
> perf machine: Don't search for active kernel start in __machine__create_kernel_maps
> perf machine: Remove machine__load_kallsyms()
> perf tools: Do not create kernel maps in sample__resolve()
> perf tests: Use arch__compare_symbol_names to compare symbols
> perf report: Fix memory corruption in --branch-history mode --branch-history
>
> Mathieu Poirier (3):
> perf cs-etm: Freeing allocated memory
> perf auxtrace arm: Fixing uninitialised variable
> perf cs-etm: Properly deal with cpu maps
>
> Ravi Bangoria (3):
> tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h
> perf powerpc: Generate system call table from asm/unistd.h
> perf trace powerpc: Use generated syscall table
>
> Robert Walker (3):
> perf cs-etm: Inject capabilitity for CoreSight traces
> perf inject: Emit instruction records on ETM trace discontinuity
> coresight: Update documentation for perf usage
>
> Sangwon Hong (2):
> perf kmem: Document a missing option & an argument
> perf mem: Document a missing option
>
> Thomas Richter (5):
> perf record: Provide detailed information on s390 CPU
> perf annotate: Scan cpuid for s390 and save machine type
> perf cpuid: Introduce a platform specific cpuid compare function
> perf test: Fix test case 23 for s390 z/VM or KVM guests
> perf test: Fix test case inet_pton to accept inlines.
>
> yuzhoujian (2):
> perf stat: Add support to print counts for fixed times
> perf stat: Add support to print counts after a period of time
>
> Documentation/trace/coresight.txt | 51 +++
> tools/arch/powerpc/include/uapi/asm/unistd.h | 402 +++++++++++++++++
> tools/lib/api/fs/fs.c | 44 +-
> tools/lib/api/fs/fs.h | 2 +
> tools/lib/symbol/kallsyms.c | 4 +
> tools/perf/Documentation/perf-annotate.txt | 6 +-
> tools/perf/Documentation/perf-kmem.txt | 6 +-
> tools/perf/Documentation/perf-mem.txt | 4 +
> tools/perf/Documentation/perf-report.txt | 5 +-
> tools/perf/Documentation/perf-script.txt | 3 +
> tools/perf/Documentation/perf-stat.txt | 10 +
> tools/perf/Makefile.config | 2 +
> tools/perf/arch/arm/util/auxtrace.c | 2 +-
> tools/perf/arch/arm/util/cs-etm.c | 51 ++-
> tools/perf/arch/powerpc/Makefile | 25 ++
> .../perf/arch/powerpc/entry/syscalls/mksyscalltbl | 37 ++
> tools/perf/arch/s390/annotate/instructions.c | 27 +-
> tools/perf/arch/s390/util/header.c | 148 ++++++-
> tools/perf/builtin-record.c | 2 +-
> tools/perf/builtin-report.c | 7 +-
> tools/perf/builtin-script.c | 17 +
> tools/perf/builtin-stat.c | 53 ++-
> tools/perf/check-headers.sh | 1 +
> tools/perf/tests/code-reading.c | 33 +-
> tools/perf/tests/dwarf-unwind.c | 46 +-
> tools/perf/tests/shell/lib/probe_vfs_getname.sh | 2 +-
> .../perf/tests/shell/trace+probe_libc_inet_pton.sh | 6 +-
> tools/perf/tests/vmlinux-kallsyms.c | 4 +-
> tools/perf/ui/browsers/annotate.c | 9 +-
> tools/perf/util/build-id.c | 10 +-
> tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 74 +++-
> tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 2 +
> tools/perf/util/cs-etm.c | 478 ++++++++++++++++++---
> tools/perf/util/event.c | 16 +-
> tools/perf/util/evlist.c | 21 +-
> tools/perf/util/header.h | 1 +
> tools/perf/util/hist.c | 4 +-
> tools/perf/util/hist.h | 1 -
> tools/perf/util/machine.c | 145 +++----
> tools/perf/util/machine.h | 6 +-
> tools/perf/util/pmu.c | 47 +-
> tools/perf/util/sort.c | 7 +-
> tools/perf/util/stat.h | 2 +
> tools/perf/util/symbol.c | 13 +-
> tools/perf/util/syscalltbl.c | 8 +
> tools/perf/util/thread_map.c | 4 +-
> tools/perf/util/thread_map.h | 2 +-
> 47 files changed, 1577 insertions(+), 273 deletions(-)
> create mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h
> create mode 100755 tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl
Pulled, thanks a lot Arnaldo!
Ingo
From: Robert Walker <robert.walker(a)arm.com>
Add notes on using perf to collect and analyze CoreSight trace
Signed-off-by: Robert Walker <robert.walker(a)arm.com>
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Cc: coresight(a)lists.linaro.org
Cc: linux-arm-kernel(a)lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-4-git-send-email-robert.walker@arm…
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
---
Documentation/trace/coresight.txt | 51 +++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index a33c88cd5d1d..6f0120c3a4f1 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -330,3 +330,54 @@ Details on how to use the generic STM API can be found here [2].
[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
[2]. Documentation/trace/stm.txt
+
+
+Using perf tools
+----------------
+
+perf can be used to record and analyze trace of programs.
+
+Execution can be recorded using 'perf record' with the cs_etm event,
+specifying the name of the sink to record to, e.g:
+
+ perf record -e cs_etm/@20070000.etr/u --per-thread
+
+The 'perf report' and 'perf script' commands can be used to analyze execution,
+synthesizing instruction and branch events from the instruction trace.
+'perf inject' can be used to replace the trace data with the synthesized events.
+The --itrace option controls the type and frequency of synthesized events
+(see perf documentation).
+
+Note that only 64-bit programs are currently supported - further work is
+required to support instruction decode of 32-bit Arm programs.
+
+
+Generating coverage files for Feedback Directed Optimization: AutoFDO
+---------------------------------------------------------------------
+
+'perf inject' accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+
+ perf inject --itrace --strip -i perf.data -o perf.data.new
+
+Below is an example of using ARM ETM for autoFDO. It requires autofdo
+(https://github.com/google/autofdo) and gcc version 5. The bubble
+sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
+
+ $ gcc-5 -O3 sort.c -o sort
+ $ taskset -c 2 ./sort
+ Bubble sorting array of 30000 elements
+ 5910 ms
+
+ $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
+ Bubble sorting array of 30000 elements
+ 12543 ms
+ [ perf record: Woken up 35 times to write data ]
+ [ perf record: Captured and wrote 69.640 MB perf.data ]
+
+ $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
+ $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
+ $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
+ $ taskset -c 2 ./sort_autofdo
+ Bubble sorting array of 30000 elements
+ 5806 ms
--
2.14.3
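The AutoFDO steps in the documentation patch above can be sketched as a small command builder. This is an illustration only: autofdo_commands is a hypothetical helper, the default sink name and file names are copied from the example in the patch, and the commands are merely printed, not executed, because running them requires a CoreSight-capable target with perf, create_gcov and gcc-5 installed.

```python
import shlex


def autofdo_commands(binary="./sort", source="sort.c", sink="20070000.etr"):
    """Return the record -> inject -> create_gcov -> rebuild commands.

    The sink is selected with '@<name>' inside the cs_etm event string;
    the trailing '/u' restricts tracing to user space.
    """
    event = f"cs_etm/@{sink}/u"
    return [
        # 1. Record an ETM trace of the program, pinned to CPU 2.
        ["perf", "record", "-e", event, "--per-thread",
         "taskset", "-c", "2", binary],
        # 2. Replace the trace data with synthesized events.
        ["perf", "inject", "-i", "perf.data", "-o", "inj.data",
         "--itrace=il64", "--strip"],
        # 3. Convert the injected profile to a gcov file.
        ["create_gcov", f"--binary={binary}", "--profile=inj.data",
         "--gcov=sort.gcov", "-gcov_version=1"],
        # 4. Rebuild using the profile.
        ["gcc-5", "-O3", "-fauto-profile=sort.gcov", source,
         "-o", "sort_autofdo"],
    ]


if __name__ == "__main__":
    for cmd in autofdo_commands():
        print(shlex.join(cmd))
```

Keeping each command as an argument list makes it straightforward to pass the same lists to subprocess.run on the target once the tools are available.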
Hi,
These patches add support for using perf inject to generate branch events,
instruction events and branch stacks from CoreSight ETM traces.
They apply to Linus's tree with the memory cleanup fix from
https://lkml.org/lkml/2018/1/25/432
Regards
Rob Walker
Robert Walker (2):
perf tools: inject capabilitity for CoreSight traces
perf inject: Emit instruction records on ETM trace discontinuity
Documentation/trace/coresight.txt | 51 +++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 74 +++-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 2 +
tools/perf/util/cs-etm.c | 471 +++++++++++++++++++++---
4 files changed, 532 insertions(+), 66 deletions(-)
--
2.7.4
Hi,
These patches add support for using perf inject to generate branch events,
instruction events and branch stacks from CoreSight ETM traces.
They apply to Linus's tree with the memory cleanup fix from
https://lkml.org/lkml/2018/1/25/432
Changes since v1:
* Split documentation update into separate patch
* Added null pointer check
* Moved some changes from patch 2 to patch 1
Regards
Rob Walker
Robert Walker (3):
perf tools: inject capabilitity for CoreSight traces
perf inject: Emit instruction records on ETM trace discontinuity
coresight: Update documentation for perf usage
Documentation/trace/coresight.txt | 51 +++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 74 +++-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 2 +
tools/perf/util/cs-etm.c | 472 +++++++++++++++++++++---
4 files changed, 533 insertions(+), 66 deletions(-)
--
2.7.4
On Thu, 8 Feb 2018 15:17:33 +0000
"Christian Hansen (chansen3)" <chansen3(a)cisco.com> wrote:
> Is it possible to capture the address of memory accesses using perf on
> ARM? Initially, I thought perf-mem would do the trick, but apparently
> its functionality is entirely dependent on Intel CPUs with PEBS.
Right.
> Then I noticed that perf-record takes a -d flag (used by perf-mem).
> Although the description of that flag is vague (capture what addresses?),
> when used as such "perf record -e armv8_cortex_a72/mem_access/u -d
> -p 16963 sleep 5" and then dumping the trace via "perf report
> --mem-mode" I get 0s in the data symbol column. So this also appears to
> have no effect on my hardware. As the command used reveals, I’m using
> perf on a Cortex A72 and on Linux 4.4.
I see ./perf report --help says:
--mem-mode
Use the data addresses of samples in addition to instruction addresses to build the histograms. To generate meaningful output,
the perf.data file must have been obtained using perf record -d -W and using a special event -e cpu/mem-loads/ or -e
cpu/mem-stores/. See perf mem for simpler access.
yet perf record's -W switch isn't on record's manpage, and trying the
invocation sequence on x86 using a perf built from today's acme's
perf/urgent branch:
$ ./perf version
perf version 4.13.rc5.g59410f5
$ ./perf record -e cpu/mem-loads/u -d -W -p 3722 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 8.978 MB perf.data ]
$ ./perf report --mem-mode --stdio
Error:
The perf.data file has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#
$
A 'perf mem record sleep 1; perf mem report' sequence produces samples
in its output, but 'mem record' doesn't take a -p switch for the PID,
rather, -p means --phys-data, "Record/Report sample physical
addresses", which also doesn't seem to work:
$ perf mem -p record sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cpu/mem-loads,ldlat=30/P).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
Nevertheless, on Arm, the armv8_cortex_a72/mem_access/ is a counting
PMU, so it doesn't record the address of the memory access, just
where in the code the access came from.
> I’m aware that for ARM there’s a Statistical Profiling Extension for
> which support went into the kernel recently and which could potentially
> support this information, but that requires ARMv8.2. There’s an
Ack.
> Embedded Trace Macrocell on my CPU and perf support is also in the
> kernel, but my understanding is that capturing a data trace is not
> available for A profile CPUs, which is what I have.
No, Cortex-As should be supported by the Coresight driver no problem.
Try acme's perf/core tree, where support for linking with decode
> Am I overlooking some software support for this in perf or am I simply asking the impossible?
You're on the right track: Coresight trace h/w is able to record memory
accesses, but I don't know its enablement status, so I'm adding the
coresight mailing list to cc in case anyone there can chime in and help.
Thanks,
Kim