Some hardware will ignore bit TRCPDCR.PU which is used to signal
to hardware that power should not be removed from the trace unit.
Let's mitigate against this by conditionally saving and restoring
the trace unit state when the CPU enters low power states.
This patchset introduces a firmware property named
'arm,coresight-needs-save-restore' - when this is present the
hardware state will be conditionally saved and restored.
A module parameter 'pm_save_enable' is also introduced which can
be configured to override the firmware property.
The hardware state is only ever saved and restored when the claim
tags indicate that coresight is in use.
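The interaction between the module parameter, the firmware property and the claim tags can be sketched as follows. This is only an illustrative model, not the driver code: the enum, its values and the function names are hypothetical, chosen to mirror the "disabled, enabled or firmware" options mentioned in the changelog.

```python
from enum import Enum


class PmSaveEnable(Enum):
    # Hypothetical names mirroring the three documented options.
    DISABLED = "disabled"
    ENABLED = "enabled"
    FIRMWARE = "firmware"


def pm_save_allowed(param, fw_needs_save_restore):
    """Return True if save/restore is permitted at all.

    With FIRMWARE the decision is deferred to the
    'arm,coresight-needs-save-restore' property; otherwise the
    module parameter overrides the firmware property.
    """
    if param is PmSaveEnable.FIRMWARE:
        return fw_needs_save_restore
    return param is PmSaveEnable.ENABLED


def should_save_state(param, fw_needs_save_restore, claim_tags_in_use):
    """State is only saved when permitted AND the claim tags show use."""
    return pm_save_allowed(param, fw_needs_save_restore) and claim_tags_in_use
```

For example, `should_save_state(PmSaveEnable.ENABLED, False, True)` is true because the parameter overrides the missing firmware property, while any call with `claim_tags_in_use=False` is false.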
Changes since v2:
- Move the PM notifier block from drvdata to file static
- Add section names to document references
- Add additional information to commit messages
- Remove trcdvcvr and trcdvcmr from saved state and add a comment to
describe why
- Ensure TRCPDCR_PU is set after restore and add a comment to explain
why we bother toggling TRCPDCR_PU on save/restore
- Reword the pm_save_enable options and add comments
- Miscellaneous style changes
- Move device tree binding documentation to its own patch
Changes since v1:
- Rebased onto coresight/next
- Correctly pass bit number rather than BIT macro to coresight_timeout
- Abort saving state if a timeout occurs
- Fix completely broken pm_notify handling and unregister handler on error
- Use state_needs_restore to ensure state is restored only once
- Add module parameter description to existing boot_enable parameter
and use module_param instead of module_param_named
- Add firmware bindings for coresight-needs-save-restore
- Rename 'disable_pm_save' to 'pm_save_enable' which allows for
disabled, enabled or firmware
- Update comment on etm4_os_lock, it incorrectly indicated that
the code unlocks the trace registers
- Add comments to explain use of OS lock during save/restore
- Fix incorrect error description whilst waiting for PM stable
- Add WARN_ON_ONCE when cpu isn't as expected during save/restore
- Various updates to commit messages
Andrew Murray (6):
coresight: etm4x: remove superfluous setting of os_unlock
coresight: etm4x: use explicit barriers on enable/disable
coresight: etm4x: use module_param instead of module_param_named
coresight: etm4x: improve clarity of etm4_os_unlock comment
coresight: etm4x: save/restore state across CPU low power states
dt-bindings: arm: coresight: Add support for
coresight-needs-save-restore
.../devicetree/bindings/arm/coresight.txt | 3 +
drivers/hwtracing/coresight/coresight-etm4x.c | 348 +++++++++++++++++-
drivers/hwtracing/coresight/coresight-etm4x.h | 62 ++++
drivers/hwtracing/coresight/coresight.c | 2 +-
include/linux/coresight.h | 8 +
5 files changed, 415 insertions(+), 8 deletions(-)
--
2.21.0
Dear Leo Yan,
I would like to ask whether there is any better documentation that details
the memory-mapped addresses of the CoreSight registers on the HiKey960.
I am currently referring to Chapter 2-9-1 in
http://mirror.lemaker.org/HiKey960_SoC_Reference_Manual.pdf. This is all
the information about coresight addresses that I could retrieve:
0xEC000000 0xED7FFFFF 24M CSSYS_APB
I would also like to check whether there is a datasheet for the HiKey960
that is as detailed as the one for the Juno device (link here:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0515b/CHDIFA…
).
This is important to me as I am trying to acquire the device snapshot
using the CSAL API. I am not sure whether there are other ways to do this,
since the documentation hasn't been exactly straightforward.
The reason for wanting the snapshot is to be able to run OpenCSD and decode
my trace binary files.
I followed the CSAL build documentation and ran `csls` on my HiKey960.
It printed the following output, but my device crashed soon after:
```bash
hikey960:/ # csls
** CSLS: listing CoreSight config...
** CSLS: Using default ROM address 0xEC000000
00EC031000: 2.1 908 00000037 00/0F type= 4 - LINK [FUNNEL: 7 in ports]
00EC032000: 1.1 912 000000A0 00/0F type= 8 - SINK PORT [TPIU]
00EC033000: 1.2 961 00001B40 00/0F type= 7 - SINK BUFFER(ETR r/w size: 4K) [TMC: ETR configuration]
00EC034000: 4.1 906 00040800 00/0F type=10 - CTI
00EC035000: 3.6 963 00010000 00/0F type= 3 - SOURCE SWSTIM(65536) [STM ext ports only, 64-bit, 128 masters]
00EC036000: 2.3 961 00000480 00/0F type= 6 - LINK SINK BUFFER(4K) [TMC: ETF configuration]
00EC037000: type=13 O TIMESTAMP
00EC038000: type=13 O TIMESTAMP
```
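As a quick sanity check of the figures quoted above, the sketch below confirms that the CSSYS_APB window is 24 MiB and that every component base address printed by `csls` lies inside it. The constant and function names are illustrative only; the addresses are copied from the manual excerpt and the listing.

```python
# CSSYS_APB window as quoted from the SoC reference manual.
CSSYS_APB_START = 0xEC000000
CSSYS_APB_END = 0xED7FFFFF

# Component base addresses as printed in the csls listing.
CSLS_BASES = [
    0xEC031000, 0xEC032000, 0xEC033000, 0xEC034000,
    0xEC035000, 0xEC036000, 0xEC037000, 0xEC038000,
]


def window_size_mib(start, end):
    """Size of an inclusive address window in MiB."""
    return (end - start + 1) // (1024 * 1024)


def in_window(addr, start=CSSYS_APB_START, end=CSSYS_APB_END):
    """True if addr falls inside the inclusive window."""
    return start <= addr <= end


outside = [hex(a) for a in CSLS_BASES if not in_window(a)]
```

Here `window_size_mib(CSSYS_APB_START, CSSYS_APB_END)` evaluates to 24 and `outside` is empty, so the listing is at least consistent with the documented window.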
I would also like to understand why my device might have crashed while
csls was running.
Please assist.
Yours Sincerely,
Jeremy
This series is a collection of fixes and cleanups I gathered from trying
to get coresight up on a new platform.
The TMC-ETR reports MemErr in the status register if there was an error
in the AXI transaction. So far we have ignored it and assumed that we
are running on perfect platforms. Let us add support for handling
MemErr reports and discard the buffer in such a case. Also verify at
probe time that the ETR can do non-secure transactions on the platform,
in order to avoid presenting the user with an unusable ETR.
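The intended MemErr behaviour can be modelled in a few lines. This is a simplified sketch rather than the driver code: the bit position of MemErr in the status register is an assumption here (bit 5), and `collect_trace` is a hypothetical helper name.

```python
# Assumption: MemErr is bit 5 of the TMC status register.
TMC_STS_MEMERR = 1 << 5


def collect_trace(status, buffer):
    """Return the captured trace, or nothing if the ETR flagged MemErr.

    A memory error means the AXI writes cannot be trusted, so the
    whole buffer is discarded instead of handing partial data on.
    """
    if status & TMC_STS_MEMERR:
        return b""
    return buffer
```

Discarding the entire buffer is the conservative choice described in the cover letter; the sketch makes no attempt to salvage data written before the error.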
Suzuki K Poulose (5):
coresight: Fix DEBUG_LOCKS_WARN_ON for uninitialized attribute
coresight: funnel: Convert pr_warn to dev_warn for obsolete bindings
coresight: etr_buf: Consolidate refcount initialization
coresight: tmc-etr: Handle memory errors
coresight: tmc-etr: Check if non-secure access is enabled
.../hwtracing/coresight/coresight-etm-perf.c | 1 +
.../hwtracing/coresight/coresight-funnel.c | 2 +-
.../hwtracing/coresight/coresight-tmc-etr.c | 26 +++++++++++--------
drivers/hwtracing/coresight/coresight-tmc.c | 12 +++++++++
drivers/hwtracing/coresight/coresight-tmc.h | 4 +++
5 files changed, 33 insertions(+), 12 deletions(-)
--
2.21.0
Good afternoon,
GDB has a valuable feature: process record and replay. GDB can record a
log of process execution and save it. This record can be loaded later on
and used for debugging, which is called offline debugging. It offers the
advantage that you can catch an issue once and replay it as often as
needed to find the root cause and fix it. This is extremely valuable for
bugs that are not reproducible or hard to reproduce. You can replay the
record either forwards or backwards, which is very convenient for
observing and analyzing the software.
To realize this functionality, GDB executes the software one assembly
instruction at a time, recording the relevant registers and memory
locations. This is a slow operation that can drastically change the
timing of process execution, and thus change the conditions that trigger
the bug. To overcome this limitation, GDB can use available SoC IPs to
accelerate the operation. As of today, GDB supports the "Intel Processor
Trace" and "Branch Trace Store" IPs on Intel processors.
ARM based SoCs also have IPs that can be used to assist process record,
namely CoreSight trace sources (ETM, PTM, ...), trace links (funnels, ...)
and trace sinks (ETB, ETR, TPIU, ...). They are now supported in the Linux
kernel through corresponding drivers and the perf extension. A library
for decoding ETM traces (OpenCSD) is also available. The way is now paved
to bring accelerated process record for ARM based SoCs to GDB.
I am re-sending the RFC and making it available as a basis for discussion
on implementing this feature. It is also attached as a text file.
B.R.
Zied Guermazi
Non intrusive execution recording
for GDB using ARM CoreSight
Status of this Memo
This memo provides information for Linaro coresight and toolchain
communities.
Distribution of this memo is unlimited.
Abstract
A method of realizing execution recording in GDB in a non-intrusive way.
This method is based on the use of CoreSight hardware tracing, available on
ARM Cortex devices.
Table of Contents
1 Introduction
2 State of the art
3 Use cases
3.1 Self hosted debug monitor
3.2 Remote debug monitor
3.3 External debugger
4 Implementation needs
4.1 Self hosted debug monitor
4.2 Remote Debug monitor
4.3 External debugger
5 Remote protocol execution sequence
6 Remote protocol extensions
7 Alternatives and discussions
7.1 Scope definition
7.2 CoreSight infrastructure exposure to the user
7.3 Parameters needed for parsing traces
1. Introduction
CoreSight technology offers a toolset for tracing the execution of a
program on a CPU, as well as routing the traces to an external trace port
analyzer or storing them in a dedicated internal memory. Collecting these
traces does not affect system performance, and they can be used as a
record of program execution.
GDB offers reverse debugging by recording program execution and storing
it. GDB offers either full record or program flow (branch) record. Records
can be replayed later on for forwards or backwards debugging.
This request for comments is about realizing GDB record and replay
functionality using CoreSight technology. It presents typical use cases
and discusses different alternatives for realizing this feature.
2. State of the art
GDB currently supports two execution recording variants:
- full record: registers as well as memory are recorded for each
instruction. In this case GDB collects the registers as well as the
involved memory area after each instruction. Currently this has no
support for hardware accelerators.
- branch record: only program flow is recorded. In this case GDB
collects the program execution flow. Currently branch record is
implemented either with or without hardware acceleration, using the
Intel Branch Trace Store ("bts") and Intel Processor Trace ("pt")
hardware accelerators on supported CPUs.
3. Use cases
Programs running on ARM processors can be debugged in many
configurations. Three of them are selected in this RFC as a basis for
discussion:
3.1. Self hosted debug monitor
These are systems where the debugger program runs on the same CPU as the
debugged program and monitors it. The user interacts with the debugging
session on the target itself.
Linux GDB is an example of such a system. The following setup is
considered:
- Target: a process running on an ARM Cortex-A
- Debugger: GNU GDB via the ptrace API (arm-linux-gnueabihf-gdb)
+-----------------------------------+
| Target |
| +------------+ |
| +------+ | Coresight | |
| | | | components:| |
| | GDB |<--------->| | |
| | | ^ | DWT, ETM, | |
| +------+ | | ITM, TPIU | |
| ^ | | TMC, ETB | |
| | | +------------+ |
+----|---------|--------------------+
| |
| |
arm-linux- |
gnueabihf- |
gdb |
debug: ptrace
trace: perf/CoreSight drivers
3.2. Remote debug monitor
These are systems where the debugger program runs on the same CPU as the
debugged program and monitors it. The user interacts with the debugging
session remotely from a PC.
Linux GDB is an example of such a system. The following setup is
considered:
- Target: a process running on an ARM Cortex-A
- GDB server: GNU gdbserver (arm-linux-gnueabihf-gdbserver)
- GDB client: GNU GDB (arm-linux-gnueabihf-gdb)
- UI: Eclipse with the needed plugins; the MI interface is used.
+--------------------------+ +---------------------------------------+
| Host | | Target |
| | | +------------+ |
| +-----+ +------+ | | +------+ | Coresight | |
| | | | GDB | | | | GDB | | components:| |
| | UI |<--->| |<--->|<--->|<--->| |<--------->| | |
| | | ^ |Client| ^ | ^ | |Server| ^ | DWT, ETM, | |
| +-----+ | +------+ | | | | +------+ | | ITM, TPIU | |
| ^ | ^ | | | | ^ | | TMC, ETB | |
| | | | | | | | | | +------------+ |
+----|-----|-----|------|--+ | +--------|---------|--------------------+
| | | | | | |
| | | | | | |
Eclipse | arm-linux- | | arm-linux- |
| gnueabihf- | TCP/IP gnueabihf- |
| gdb | UART gdbserver |
GDB MI GDB remote debug: ptrace
protocol trace: perf/CoreSight drivers
3.3. External debugger
These are systems where an external debugger is used. It accesses the
target using JTAG or SWD. The target is usually a bare metal embedded
system or a system with an RTOS.
As an example, the following setup is considered:
- Target: firmware running on an ARM Cortex-M.
- Debugger: external debug and trace device.
- GDB server: OpenOCD.
- GDB client: arm-none-eabi-gdb.
- UI: Eclipse with the needed plugins; the MI interface is used.
+--------------------------------------+ +-------+ +-------------+
| Host | | dbggr | | Target |
| | | | | |
| +-----+ +------+ +------+ | | | | Coresight |
| | | | GDB | | GDB | | | Debug | | components: |
| | UI |<--->| |<--->| |<-->|<--->| + |<--->| |
| | | ^ |Client| ^ |Server| | ^ | Trace | ^ | DWT, ETM, |
| +-----+ | +------+ | +------+ | | | | | | ITM, TPIU |
| ^ | ^ | ^ | | | | | | |
| | | | | | | | | | | | |
+----|-----|-----|------|-----|--------+ | +-------+ | +-------------+
| | | | | | |
| | | | | | |
Eclipse | arm-none- | OpenOcd | |
| eabi-gdb | PyOcd | |
| | | |
GDB MI GDB remote Ethernet debug: JTAG/SWD
protocol USB trace: Serial/Parallel
4. Implementation needs
4.1 Self hosted debug monitor
GDB : arm-linux-gnueabihf-gdb
The interface defined in btrace.h for capturing and processing traces
has to be implemented for ARM CoreSight.
needed actions:
- in btrace-common.h: add needed structures for capturing and
handling etm traces
- in linux-btrace.h:
- add btrace_tinfo_etm
- amend btrace_target_info
- in linux-btrace.c: change following functions to support etm
traces
- linux_enable_btrace
- linux_disable_btrace
- linux_read_btrace
- linux_btrace_conf
- in arm-linux-nat.c: add an API to
- configure btrace
- enable btrace
- disable btrace
- read btrace
- in btrace.c
- btrace_add_pc and btrace_fetch have to be implemented for
CoreSight. This means using the OpenCSD library to parse ETM traces and
then reconstructing the executed instructions accordingly
(btrace_compute_ftrace_1)
- in record-btrace.c
- add command for showing record btrace etm options
- add command for starting tracing with CoreSight and its
handler (cmd_record_btrace_etm_start)
- adapt cmd_show_record_btrace_cpu
...
perf:
needed actions:
- make sure that perf can start/stop tracing a process with its
threads, collect etm traces and deliver them to the user
4.2 Remote Debug monitor
The changes described in 4.1 are needed. In addition, the following
changes are needed to support the remote protocol:
GDB server: arm-linux-gnueabihf-gdbserver
needed actions:
- in linux-low
- linux_low_read_btrace: add support for etm traces formatting.
- linux_low_btrace_conf: add support for etm configuration
formatting.
GDB client: arm-linux-gnueabihf-gdb
needed actions:
- in remote.c
- adapt enable_btrace
- adapt disable_btrace
- in btrace.c
- parse_xml_btrace: update btrace.dtd [2] and related data
structures btrace_xxx
- parse_xml_btrace_conf: update btrace-conf.dtd [3] and related
data structures btrace_conf_xxx
- extend Remote protocol handling to support coresight etm traces
UI: eclipse
needed actions:
make sure that the plugin for recording execution and replaying it
copes well with arm-linux
Remote protocol needs to be extended by
-1- Adding Qbtrace:CoreSight (or etm) to start collecting etm traces
-2- Amending 'Branch Trace Format' xml specification to consider etm
traces transfer
-3- Amending 'Branch Trace Configuration Format' xml specification to
consider parameters needed for etm
4.3 External debugger
The changes described in 4.2 are needed. In addition, the following
changes are needed to support tracing a remote target through an external
debugger (bare metal embedded system):
GDB server: OpenOcd
needed actions:
- rework etm driver to make it up to date.
- add a driver for configuring trace interconnect IPs
- rework the driver for TPIU.
- integrate support for a Trace port analyzer.
- extend remote protocol implementation to support recording
The CoreSight infrastructure of the SoC is to be set up in OpenOCD through
configuration files. Parameters that are not relevant for GDB are also
specified in configuration files (trace sink, trace protocol, port size,
trace sync frequency, cycle-accurate tracing, etc.)
GDB client: arm-none-eabi-gdb
needed actions:
- extend Remote protocol to support coresight etm traces
- integrate etm trace parsing library
- interface the parser to record_btrace_target
The remote protocol needs, in addition to 4.2, to be extended by:
- Adding Qbtrace-conf:CoreSight:core=value to support multicore SoCs
- Adding Qbtrace-conf:CoreSight:id=value to support demultiplexing
multiple trace sources
- Adding Qbtrace-conf:CoreSight:filter:context=value to support
filtering traces belonging to a given process/thread
- Adding Qbtrace-conf:CoreSight:filter:start-address=value
and Qbtrace-conf:CoreSight:filter:end-address=value to
support filtering traces for given functions/blocks/lib
- Adding Qbtrace-conf:CoreSight:trigger:on-address=value
and Qbtrace-conf:CoreSight:trigger:off-address=value to
support triggering tracing or stop tracing if a certain function/block/lib
is executed
Alternatively, some of the configuration related to filtering and
triggering can be delegated to the GDB server.
UI: eclipse
test and verify that existing plugins cope well with GDB extensions
5. Remote protocol execution sequence
GDB and gdbserver communicate using the GDB remote protocol.
On a semantic level, a tracing session runs through the following sequence:
(1) GDB client queries gdb server support for branch trace
(2) GDB server answers with
- qXfer:btrace:read
- qXfer:btrace-conf:read
- Qbtrace:off
- Qbtrace:CoreSight
- Qbtrace-conf:CoreSight:xxx where xxx is the parameter name
(3) GDB client sends the command to start emitting and collecting
traces (Qbtrace:CoreSight)
(4) GDB server executes the command
(5) GDB client sends the command to stop emitting and collecting traces
(Qbtrace:off)
(6) GDB server executes the command
(7) GDB client sends command to get collected traces from trace sink
(qXfer:btrace:read:annex:offset,length)
(8) GDB server executes the command and sends back collected traces
(9) GDB client parses the traces and reconstructs target states
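The client side of this sequence can be sketched as a list of framed packets. The framing (`$payload#checksum`, with a modulo-256 checksum written as two hex digits) is the standard GDB remote protocol format; `Qbtrace:CoreSight` is the packet proposed by this RFC and does not exist in GDB today. The function names and the default offset/length values are illustrative assumptions.

```python
def frame(payload: str) -> str:
    """Wrap a payload in GDB remote protocol framing: $payload#xx."""
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"


def tracing_session(annex="all", offset=0, length=0x1000):
    """Packets the client would send for steps (1), (3), (5) and (7).

    'Qbtrace:CoreSight' is the hypothetical packet proposed above;
    offset and length are sent as hexadecimal, as for other qXfer reads.
    """
    return [
        frame("qSupported"),                                       # step (1)
        frame("Qbtrace:CoreSight"),                                # step (3)
        frame("Qbtrace:off"),                                      # step (5)
        frame(f"qXfer:btrace:read:{annex}:{offset:x},{length:x}"),  # step (7)
    ]
```

The server replies (steps 2, 4, 6 and 8) are omitted; a real client would wait for each acknowledgement and reply before sending the next packet.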
6. Remote protocol extensions
The remote protocol needs to be extended with the following primitive to
support CoreSight tracing:
- start tracing and trace capture using CoreSight (Qbtrace:CoreSight)
The remote protocol can be extended with the following primitives to take
advantage of ETM functionality:
- select the core to trace on in the case of a multicore system
GDB client sends command to select the core to trace
(Qbtrace-conf:CoreSight:core=value)
- set the trace id for the traces
GDB client sends command to set trace id
(Qbtrace-conf:CoreSight:id=value)
- select the context to trace
GDB client sends command to select the context to trace
(Qbtrace-conf:CoreSight:filter:context=value)
- select the address range to trace
GDB client sends command to select the address range to trace
(Qbtrace-conf:CoreSight:filter:start-address=value)
(Qbtrace-conf:CoreSight:filter:end-address=value)
- set triggers for starting and stopping tracing
GDB client sends command to select the address to trigger tracing
(Qbtrace-conf:CoreSight:trigger:on-address=value)
(Qbtrace-conf:CoreSight:trigger:off-address=value)
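A minimal sketch of how a client might assemble these configuration payloads is shown below. All of the `Qbtrace-conf:CoreSight:*` packets are proposals from this RFC, not existing GDB packets, and encoding the values as hexadecimal is an assumption; the function and parameter names are hypothetical.

```python
def conf_packets(core=None, trace_id=None, context=None,
                 address_range=None, trigger=None):
    """Build the proposed Qbtrace-conf:CoreSight payloads.

    Only the options that are set produce a packet. Values are
    written in hexadecimal (an assumption, not part of the RFC).
    """
    prefix = "Qbtrace-conf:CoreSight:"
    packets = []
    if core is not None:
        packets.append(f"{prefix}core={core:x}")
    if trace_id is not None:
        packets.append(f"{prefix}id={trace_id:x}")
    if context is not None:
        packets.append(f"{prefix}filter:context={context:x}")
    if address_range is not None:
        start, end = address_range
        packets.append(f"{prefix}filter:start-address={start:x}")
        packets.append(f"{prefix}filter:end-address={end:x}")
    if trigger is not None:
        on_addr, off_addr = trigger
        packets.append(f"{prefix}trigger:on-address={on_addr:x}")
        packets.append(f"{prefix}trigger:off-address={off_addr:x}")
    return packets
```

For instance, `conf_packets(core=1, trace_id=0x10)` yields `Qbtrace-conf:CoreSight:core=1` followed by `Qbtrace-conf:CoreSight:id=10`.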
7. Alternatives and discussions
7.1. Scope definition
The CoreSight ETM IP comes in many versions and many implementations.
Depending on its capabilities, it can trace instructions only, or
instructions and the involved data/data addresses. All ETM variants
support instruction tracing and can therefore be used for branch tracing.
7.2. CoreSight infrastructure exposure to the user
This is about assigning the responsibility for configuring the CoreSight
infrastructure to generate and route traces. Two alternatives are possible:
- CoreSight infrastructure exposed to the GDB client (and UI):
in this alternative the user or the UI is responsible for
configuring the CoreSight IPs in the SoC, by accessing their registers
directly or via the coresight driver. The remote protocol is used to
configure the trace sink (ETB or TPA) to start/stop collecting traces.
- CoreSight infrastructure not exposed outside of gdbserver:
in this case high level commands can be provided by the gdbserver
remote protocol to set up and configure the CoreSight IPs in the SoC.
My recommendation is to extend the remote protocol to provide high level
commands to set up and configure the CoreSight IPs in the SoC, or to use a
different channel to pass configuration parameters to the GDB server.
7.3. Parameters needed for parsing traces
Some configuration parameters such as the ETM version, trace ID, etc.
(the content of registers ETMCR, ETMIDR, ETMCCER, ETMTRACEIDR) are needed
for extracting and parsing the ETM trace. These parameters need to be
exchanged between the GDB server and client. The following alternatives
are possible:
- extend the remote protocol to get those params with explicit queries
- add them to the content of the response to qXfer:btrace-conf:read
- add them to the content of the response to qXfer:btrace:read
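As an illustration of the second alternative, the sketch below carries the four register values in a branch trace configuration document. The `coresight` element and its attribute names are hypothetical and are not part of the current btrace-conf format; the register values used in any call are placeholders, not values from real hardware.

```python
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class EtmParams:
    """Register contents the decoder needs (names from the RFC text)."""
    etmcr: int
    etmidr: int
    etmccer: int
    etmtraceidr: int


def to_btrace_conf_xml(params: EtmParams) -> str:
    """Serialize the parameters into a hypothetical <coresight> element."""
    root = ET.Element("btrace-conf", {"version": "1.0"})
    attrs = {name: f"0x{value:x}" for name, value in asdict(params).items()}
    ET.SubElement(root, "coresight", attrs)
    return ET.tostring(root, encoding="unicode")


def from_btrace_conf_xml(text: str) -> EtmParams:
    """Parse the document back into EtmParams on the client side."""
    node = ET.fromstring(text).find("coresight")
    return EtmParams(**{k: int(v, 16) for k, v in node.attrib.items()})
```

A round trip such as `from_btrace_conf_xml(to_btrace_conf_xml(p))` returns an object equal to `p`, which is the property the server and client would rely on.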
The arm and arm64 architectures reserve some memory regions prior to the
symbol '_stext'; these regions are later used by device modules and the
BPF JIT. The current code fails to consider these regions, so any
address in them is treated as user space mode. perf then cannot find
the corresponding dso with the wrong CPU mode, so it fails to generate
samples for device module and BPF related trace data.
This patch parses the linker scripts to get the size of the memory prior
to the start address and subtracts this size from
'etmq->etm->kernel_start', giving a fixed-up kernel start address that
covers the memory regions for device modules and BPF. As a result,
cs_etm__cpu_mode() can return the right mode for these memory regions
and perf can successfully generate samples.
The reason for parsing the linker scripts is that the Arm architecture
changes the text offset depending on the platform; multiple text offsets
are defined in $kernel/arch/arm/Makefile. The offset is decided when
building the kernel and the final value is expanded in the linker script,
so we can extract the used value from the linker script. We parse the
arm64 linker script in the same way. If the linker script cannot be
found, the pre-start memory size is assumed to be zero, in which case
this patch changes nothing.
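The effect of the fix-up can be checked with a small model of the comparison. This is a sketch of the arithmetic only: the constants are taken from the arm64 linker script expression quoted in the Makefile comment of this patch, the assumption is that 'kernel_start' equals the location that expression evaluates to, and guest handling is left out.

```python
# Terms of the arm64 linker script expression quoted in the patch:
# . = (0xffffffffffffffff - (1 << 48) + 1) + 0 + 0x08000000 + 0x08000000 + 0x00080000
ARM64_PRE_START_SIZE = 0x08000000 + 0x08000000 + 0x00080000
ARM64_KERNEL_START = (0xFFFFFFFFFFFFFFFF - (1 << 48) + 1) + ARM64_PRE_START_SIZE


def cpu_mode(address, kernel_start, pre_start_size=0):
    """Classify an address as 'kernel' or 'user' (host only).

    pre_start_size=0 models the behaviour before this patch.
    """
    fixup_kernel_start = kernel_start - pre_start_size
    return "kernel" if address >= fixup_kernel_start else "user"


# Address of a JITed BPF program, as seen in /proc/kallsyms.
BPF_PROG = 0xFFFF000000086A84
before = cpu_mode(BPF_PROG, ARM64_KERNEL_START)
after = cpu_mode(BPF_PROG, ARM64_KERNEL_START, ARM64_PRE_START_SIZE)
```

With these values `before` is "user" and `after` is "kernel": the BPF program lies below the kernel text start but above the fixed-up start address.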
Below is detailed info for testing this patch:
- Build LLVM/Clang 8.0 or later version;
- Configure perf with ~/.perfconfig:
root@debian:~# cat ~/.perfconfig
# this file is auto-generated.
[llvm]
clang-path = /mnt/build/llvm-build/build/install/bin/clang
kbuild-dir = /mnt/linux-kernel/linux-cs-dev/
clang-opt = "-g"
dump-obj = true
[trace]
show_zeros = yes
show_duration = no
no_inherit = yes
show_timestamp = no
show_arg_names = no
args_alignment = 40
show_prefix = yes
- Run 'perf trace' command with eBPF event:
root@debian:~# perf trace -e string \
-e $kernel/tools/perf/examples/bpf/augmented_raw_syscalls.c
- Read eBPF program memory mapping in kernel:
root@debian:~# echo 1 > /proc/sys/net/core/bpf_jit_kallsyms
root@debian:~# cat /proc/kallsyms | grep -E "bpf_prog_.+_sys_(enter|exit)"
ffff000000086a84 t bpf_prog_f173133dc38ccf87_sys_enter [bpf]
ffff000000088618 t bpf_prog_c1bd85c092d6e4aa_sys_exit [bpf]
- Launch any program that accesses the file system frequently, so that it
hits the system call trace flow with the eBPF event;
- Capture CoreSight trace data with filtering eBPF program:
root@debian:~# perf record -e cs_etm/@20070000.etr/ \
--filter 'filter 0xffff000000086a84/0x800' -a sleep 5s
- Annotate for symbol 'bpf_prog_f173133dc38ccf87_sys_enter':
root@debian:~# perf report
Then select 'branches' samples and press 'a' to annotate symbol
'bpf_prog_f173133dc38ccf87_sys_enter', press 'P' to print to the
bpf_prog_f173133dc38ccf87_sys_enter.annotation file:
root@debian:~# cat bpf_prog_f173133dc38ccf87_sys_enter.annotation
bpf_prog_f173133dc38ccf87_sys_enter() bpf_prog_f173133dc38ccf87_sys_enter
Event: branches
Percent int sys_enter(struct syscall_enter_args *args)
stp x29, x30, [sp, #-16]!
int key = 0;
mov x29, sp
augmented_args = bpf_map_lookup_elem(&augmented_filename_map, &key);
stp x19, x20, [sp, #-16]!
augmented_args = bpf_map_lookup_elem(&augmented_filename_map, &key);
stp x21, x22, [sp, #-16]!
stp x25, x26, [sp, #-16]!
return bpf_get_current_pid_tgid();
mov x25, sp
return bpf_get_current_pid_tgid();
mov x26, #0x0 // #0
sub sp, sp, #0x10
return bpf_map_lookup_elem(pids, &pid) != NULL;
add x19, x0, #0x0
mov x0, #0x0 // #0
mov x10, #0xfffffffffffffff8 // #-8
if (pid_filter__has(&pids_filtered, getpid()))
str w0, [x25, x10]
probe_read(&augmented_args->args, sizeof(augmented_args->args), args);
add x1, x25, #0x0
probe_read(&augmented_args->args, sizeof(augmented_args->args), args);
mov x10, #0xfffffffffffffff8 // #-8
syscall = bpf_map_lookup_elem(&syscalls, &augmented_args->args.syscall_nr);
add x1, x1, x10
syscall = bpf_map_lookup_elem(&syscalls, &augmented_args->args.syscall_nr);
mov x0, #0xffff8009ffffffff // #-140694538682369
movk x0, #0x6698, lsl #16
movk x0, #0x3e00
mov x10, #0xffffffffffff1040 // #-61376
if (syscall == NULL || !syscall->enabled)
movk x10, #0x1023, lsl #16
if (syscall == NULL || !syscall->enabled)
movk x10, #0x0, lsl #32
loop_iter_first()
3.69 → blr bpf_prog_f173133dc38ccf87_sys_enter
loop_iter_first()
add x7, x0, #0x0
loop_iter_first()
add x20, x7, #0x0
int size = probe_read_str(&augmented_filename->value, filename_len, filename_arg);
mov x0, #0x1 // #1
[...]
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose(a)arm.com>
Cc: coresight(a)lists.linaro.org
Cc: linux-arm-kernel(a)lists.infradead.org
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
tools/perf/Makefile.config | 22 ++++++++++++++++++++++
tools/perf/util/cs-etm.c | 19 ++++++++++++++++++-
2 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 51dd00f65709..a58cd5a43a98 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -418,6 +418,28 @@ ifdef CORESIGHT
endif
LDFLAGS += $(LIBOPENCSD_LDFLAGS)
EXTLIBS += $(OPENCSDLIBS)
+ PRE_START_SIZE := 0
+ ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds),)
+ ifeq ($(SRCARCH),arm64)
+ # Extract info from lds:
+ # . = ((((((((0xffffffffffffffff)) - (((1)) << (48)) + 1) + (0)) + (0x08000000))) + (0x08000000))) + 0x00080000;
+ # PRE_START_SIZE := (0x08000000 + 0x08000000 + 0x00080000) = 0x10080000
+ PRE_START_SIZE := $(shell egrep ' \. \= \({8}0x[0-9a-fA-F]+\){2}' \
+ $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds | \
+ sed -e 's/[(|)|.|=|+|<|;|-]//g' -e 's/ \+/ /g' -e 's/^[ \t]*//' | \
+ awk -F' ' '{printf "0x%x", $$6+$$7+$$8}' 2>/dev/null)
+ endif
+ ifeq ($(SRCARCH),arm)
+ # Extract info from lds:
+ # . = ((0xC0000000)) + 0x00208000;
+ # PRE_START_SIZE := 0x00208000
+ PRE_START_SIZE := $(shell egrep ' \. \= \({2}0x[0-9a-fA-F]+\){2}' \
+ $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds | \
+ sed -e 's/[(|)|.|=|+|<|;|-]//g' -e 's/ \+/ /g' -e 's/^[ \t]*//' | \
+ awk -F' ' '{printf "0x%x", $$2}' 2>/dev/null)
+ endif
+ endif
+ CFLAGS += -DARM_PRE_START_SIZE=$(PRE_START_SIZE)
$(call detected,CONFIG_LIBOPENCSD)
ifdef CSTRACE_RAW
CFLAGS += -DCS_DEBUG_RAW
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 0c7776b51045..5fa0be3a3904 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -613,10 +613,27 @@ static void cs_etm__free(struct perf_session *session)
static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address)
{
struct machine *machine;
+ u64 fixup_kernel_start = 0;
machine = etmq->etm->machine;
- if (address >= etmq->etm->kernel_start) {
+ /*
+ * Since arm and arm64 specify some memory regions prior to
+ * 'kernel_start', kernel addresses can be less than 'kernel_start'.
+ *
+ * For arm architecture, the 16MB virtual memory space prior to
+ * 'kernel_start' is allocated to device modules, a PMD table if
+ * CONFIG_HIGHMEM is enabled and a PGD table.
+ *
+ * For arm64 architecture, the root PGD table, device module memory
+ * region and BPF jit region are prior to 'kernel_start'.
+ *
+ * To reflect the complete kernel address space, compensate these
+ * pre-defined regions for kernel start address.
+ */
+ fixup_kernel_start = etmq->etm->kernel_start - ARM_PRE_START_SIZE;
+
+ if (address >= fixup_kernel_start) {
if (machine__is_host(machine))
return PERF_RECORD_MISC_KERNEL;
else
--
2.17.1
Some hardware will ignore bit TRCPDCR.PU which is used to signal
to hardware that power should not be removed from the trace unit.
Let's mitigate against this by conditionally saving and restoring
the trace unit state when the CPU enters low power states.
This patchset introduces a firmware property named
'arm,coresight-needs-save-restore' - when this is present the
hardware state will be conditionally saved and restored.
A module parameter 'pm_save_enable' is also introduced which can
be configured to override the firmware property.
The hardware state is only ever saved and restored when the claim
tags indicate that self-hosted mode is in use.
Changes since v1:
- Rebased onto coresight/next
- Correctly pass bit number rather than BIT macro to coresight_timeout
- Abort saving state if a timeout occurs
- Fix completely broken pm_notify handling and unregister handler on error
- Use state_needs_restore to ensure state is restored only once
- Add module parameter description to existing boot_enable parameter
and use module_param instead of module_param_named
- Add firmware bindings for coresight-needs-save-restore
- Rename 'disable_pm_save' to 'pm_save_enable' which allows for
disabled, enabled or firmware
- Update comment on etm4_os_lock, it incorrectly indicated that
the code unlocks the trace registers
- Add comments to explain use of OS lock during save/restore
- Fix incorrect error description whilst waiting for PM stable
- Add WARN_ON_ONCE when cpu isn't as expected during save/restore
- Various updates to commit messages
Andrew Murray (5):
coresight: etm4x: remove superfluous setting of os_unlock
coresight: etm4x: use explicit barriers on enable/disable
coresight: etm4x: use module_param instead of module_param_named
coresight: etm4x: improve clarity of etm4_os_unlock comment
coresight: etm4x: save/restore state across CPU low power states
.../devicetree/bindings/arm/coresight.txt | 3 +
drivers/hwtracing/coresight/coresight-etm4x.c | 315 +++++++++++++++++-
drivers/hwtracing/coresight/coresight-etm4x.h | 66 ++++
drivers/hwtracing/coresight/coresight.c | 2 +-
include/linux/coresight.h | 8 +
5 files changed, 387 insertions(+), 7 deletions(-)
--
2.21.0
Good day Zied,
Apologies for the delay in responding to you - my mail client sent
your email to my spam folder. Moreover, I suggest CC'ing the coresight
mailing list when seeking guidance on this topic. There are a lot of
knowledgeable people on there who can help you as much as I can.
On Wed, 26 Jun 2019 at 06:47, zied guermazi <guermazi_zied(a)yahoo.com> wrote:
>
> hi Mathieu,
>
> I was tracking the progress of coresight group within Linaro in providing coresight tracing in linux kernel as well as tools around it (e.g. perf, opencsd). I noticed that it reached a good maturity level to enable other use cases for this feature, one of them is providing non intrusive instruction tracing in GDB using ETM. This can bring a huge benefit to ARM community using open source tools (reduce debugging time, record and replay buggy execution, test coverage, performance analysis etc ..). in addition it can open the doors for business targeting introducing debuggers with ETM tracing support in the market with more affordable price.
>
> I would like to present this opportunity to Linaro, and I am seeking getting their feedback and involvement. I have seen that you were active since the beginning in the coresight project at Linaro, so you probably went through such a process. I want to get your advise on how to proceed to reach this target.
> I am attaching an RFC where technical aspects are discussed, this can give you a better insight on the use case and its realizability.
I commend you for taking the time to put this RFC together.
Integration of coresight with GDB is something that has been on the
radar for a long time. Several people have looked at the feature but
it was never pursued further for various reasons. Your RFC has a lot
of details and you definitely took time to think about this. Other
than that it is not possible for me to cast further judgement on the
viability of the project without a small prototype to evaluate and
code to look at.
I suggest you come up with a proof of concept that covers a basic
scenario. That will make it easier for us to review your work and
assess the feasibility of the feature. I also advise taking time to
understand how coresight has been integrated with perf, how
interactions with the openCSD library are made and the complexity
inherent to coresight trace decoding.
Thanks,
Mathieu
>
> Looking forward to your feedback
> Best Regards
> Zied Guermazi
>
Hello,
I am trying to trace a statically linked application on my ZedBoard (Zynq-7000 SoC).
Please find attached the ELF file and the assembler file of the application.
For the context:
- The CPU0 runs at 333 MHz
- The TPIU runs at 200 MHz
- The PL (FPGA) runs at 250 MHz
To trace the application, I use the attached script.sh file, which mainly:
- Disables CPU1
- Sets up the address comparator
- Activates the Branch Broadcast mode
- Specifies the address range to trace
- Activates the PTM and the TPIU
- Executes the application
- Retrieves the trace from a dedicated BRAM memory in the PL.
When I trace the whole .text section (from 0x100e0 to 0x15140)
I receive incomplete and inconsistent traces, for example:
00 9c 148d4 15d4c 10db4 13ecc b6fa3f44 14568 b6f85720 148d4 b6fa5d4c 13ecc b6fa3f44
14568 b6f85720 148d4 b6fa5d4c 147c0 b6f78030 14784 b6f6f3c8 13ecc b6f9df44 14704
b6f6f3c8 14784 b6f6f3c8 13d08 b6f9ea70 13d88 b6f9ea70 13c68 b6f99484 14628 b6f994b4
1485c b6fa0174 1452c b6faa2c4 14094 b6fabcb0 142b0 b6fab754 14784 b6f6f3c8 13ecc
b6f9df44 14784 b6f6f3c8 14784 b6f6f3c8 13d88 b6f9ea70 13d88 b6f9ea70 13c68 b6f99484
14628 b6f994b4 1485c b6fa0174 1452c b6faa2c4 14094 b6fabcb0 142b0
b6fab754 10104 10128 105d0 10390 136a0 00 00 00 00 00 00 00 00
Note that the final addresses before the zero padding in the trace above (10104 10128 105d0 10390 136a0) correspond to:
10104 _start (entry point)
10128 _start_c
105d0 __libc_start_main
10390 __init_libc
136a0 memset
1) How can the entry point end up at the end of the trace?
However, when I trace a smaller interval of the .text section, for example from 0x100e0 to 0x136ac, I receive correct traces:
00 10104 10128 105d0 10390 136a0 10338 103c4 103c4 103c4 103c4 103c4 103c4 103c4 103c4 103c4
103c4 103c4 103c4 103e8 103e8 103e8 103e8 103e8 103e8 103e8 103e8 103e8 103e8 103e8 103e8
103e8 103e8 103e8 103e8 103e8 10420 10468 10480 10464 10464 10464 10464 10464 10480 10464
10464 10464 10464 10464 10480 10464 10464 10464 10464 10464 10464 10464 10464 10464 10464
10464 10464 10464 10464 10464 10464 10464 10464 10464 10464 10464 10464 10464 10464 10464
10464 10464 10464 10464 10464 10464 10464 10464 10464 10464 10464 10464 10464 10464 10488
138b4 10490 10388 10498 1048c 10404 1049c 10590 105f8 10598 100d4 100a4 10008 1fff0010 1fff009c
105bc 10214 1023c 10198 105c4 105fc 102c4 102a0 10300 1361c 13658 10310 10270 102b8 10330 10650
10698 10810 13318 108c8 108e4 108e0 108e0 108e0 108e0 108e0 108e0 108e0 108e0 108e0 108e0 10820
10810 10868 13318 108c8 108e4 10920 10968 10980 109c0 10a0c 10a68 10b20 10b34 10ba4 10ba0 10bec
10d20 13468 134e4 106ec 1070c 10810 13318 108c8 108e4 108e0 108e0 108e0 108e0 108e0 108e0 108e0
108e0 108e0 108e0 ………………
These addresses correspond to:
10104 _start
10128 _start_c
105d0 __libc_start_main
10390 __init_libc
136a0 memset
10338
103c4 Loop in __init_libc
…..
103e8 Loop in __init_libc
….
102c4 main
…
2) Is there a limitation on the traced address range? The only difference between those traces is the addr_range value:
For tracing the whole .text section:
echo -n 0x100e0 0x15140 > $PTM_CPU_0/addr_range
For tracing a smaller interval of the .text section:
echo -n 0x100e0 0x136ac > $PTM_CPU_0/addr_range
If there is no limitation, what could explain this behaviour? A wrong configuration of the CoreSight components?
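One way to narrow this down is to check each decoded word against the configured range. The following is only a sketch: `parse_addresses` and `split_by_range` are hypothetical helper names, not part of any CoreSight tool, and it assumes the dump is whitespace-separated hex words as in the listings above and that the range is inclusive at both ends.

```python
# Hypothetical helpers: split decoded trace words by membership in addr_range.
def parse_addresses(dump):
    """Parse whitespace-separated hex words (no 0x prefix) into integers."""
    return [int(word, 16) for word in dump.split()]


def split_by_range(addresses, start, end):
    """Return (inside, outside) relative to the range [start, end].

    Assumption: both ends are treated as inclusive here.
    """
    inside = [a for a in addresses if start <= a <= end]
    outside = [a for a in addresses if not (start <= a <= end)]
    return inside, outside


# A few words from the start of the inconsistent trace quoted above.
sample = "148d4 15d4c 10db4 13ecc b6fa3f44 14568 b6f85720"
inside, outside = split_by_range(parse_addresses(sample), 0x100e0, 0x15140)
print("outside the range:", [hex(a) for a in outside])
```

Words reported as outside the range (for example 15d4c and the b6f..... values in the sample) are probably the first ones worth examining when comparing the two captures.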
Kind Regards,
Mounir NASR ALLAH
Dear Mathieu,
I would like some clarification on the `mode` file found in the mgmt of one of the ETM drivers.
The following is from the documentation in Documentation/ABI/testing/sysfs-bus-coresight-devices-etm4x:
What: /sys/bus/coresight/devices/<memory_map>.etm/mode
Date: April 2015
KernelVersion: 4.01
Contact: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Description: (RW) Controls various modes supported by this ETM, for example
P0 instruction tracing, branch broadcast, cycle counting and
context ID tracing.
I am asking about the functionality of this file because I want to see how my traces differ when I change the mode.
I have tried different values in mode and found that if the value != 0, my program's instruction addresses are not traced. I only retrieve instruction addresses from kernel symbols (CPU_do_idle is the specific symbol the traced addresses resolve to).
This is a snippet of the decoded trace using ptm2human (I am also trying to figure out whether the decoder is indeed decoding the correct instructions from the trace dumps):
Context - Context ID = 0x0,
VMID = 0x0,
Address - Instruction address 0xffffff800895af78, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff8008161594, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800898b3c0, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800898b3c4, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff80081615c8, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800895af80, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800895b024, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff80081260e0, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800855cbd0, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800855cbf4, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800855cbec, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff8008126104, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800855cbd0, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800855cbf4, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800855cbec, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800812611c, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800895b030, Instruction set Aarch32 (Thumb)
Address - Instruction address 0xffffff800895b054, Instruction set Aarch32 (Thumb)
Exception - exception type IRQ, address 0xffffff800895b054
I will also upload the decoded trace dump of the above (the mode-manipulated dump) as ed002000_mode.log.
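To check at a glance whether a decoded dump contains any user-space addresses, the decoder output can be filtered with a few lines of Python. This is a sketch under two assumptions: the line format matches the ptm2human excerpt above, and on arm64 an address with bit 63 set belongs to the kernel half of the address space. The names `extract_addresses` and `is_kernel_address` are made up for illustration. The second sample line is taken from the user-space snippet later in this message.

```python
import re

# Matches the "Address - Instruction address 0x..." lines shown above.
ADDRESS_LINE = re.compile(r"Instruction address (0x[0-9a-fA-F]+)")


def extract_addresses(decoded_text):
    """Return the instruction addresses found in decoder output, in order."""
    return [int(m.group(1), 16) for m in ADDRESS_LINE.finditer(decoded_text)]


def is_kernel_address(address):
    """Assumption: on arm64, bit 63 set means a kernel (TTBR1) address."""
    return bool(address >> 63)


sample = """\
Address - Instruction address 0xffffff800895af78, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555b9c, Instruction set Aarch32 (Thumb)
Exception - exception type IRQ, address 0xffffff800895b054
"""
addresses = extract_addresses(sample)
kernel = [a for a in addresses if is_kernel_address(a)]
user = [a for a in addresses if not is_kernel_address(a)]
print(len(kernel), "kernel address(es),", len(user), "user address(es)")
```

Only "Address" packets are counted; the exception line is deliberately ignored by the pattern.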
On another topic, I find it extremely amusing that I am able to trace my simple program. However,
according to the ETM options implemented, it should not be tracing any LDR/STR/conditional instructions, yet
I have found the decoded traces to contain only LDR and STR. I have attached the source code of my math program
along with the decoded trace (ec036000.etf.log) and the trace dump (ec036000.etf.bin). To make the
instructions easier to view, please look at ec036000.etf_addr.log. A snippet of the decoded dump is found here:
Address - Instruction address 0x0000ffbefe7fc400, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555b9c, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555ba4, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555bac, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555bb4, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555bbc, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555bc4, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555bcc, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555bd4, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555bdc, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555be4, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555bec, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555bf4, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555bfc, Instruction set Aarch32 (Thumb)
Address - Instruction address 0x0000005555555c04, Instruction set Aarch32 (Thumb)
I am using gdb to compare the program's instruction addresses against the traced instruction addresses.
I have also disabled ASLR to ensure that the values are always comparable. A snippet
of my program's instructions as shown by gdb is below:
|0x5555555b9c <main+500> add x8, x8, x10
|0x5555555ba0 <main+504> ldr x8, [x8]
|0x5555555ba4 <main+508> ldr x10, [sp,#144]
|0x5555555ba8 <main+512> and x10, x10, x13
|0x5555555bac <main+516> lsl x10, x12, x10
|0x5555555bb0 <main+520> and x8, x8, x10
|0x5555555bb4 <main+524> cmp x8, x9
|0x5555555bb8 <main+528> cset w15, ne
|0x5555555bbc <main+532> orr w16, wzr, #0x1
|0x5555555bc0 <main+536> and w15, w15, w16
|0x5555555bc4 <main+540> str w15, [sp,#20]
|0x5555555bc8 <main+544> b 0x5555555bd4 <main+556>
|0x5555555bcc <main+548> mov w8, #0x0 // #0
|0x5555555bd0 <main+552> str w8, [sp,#20]
|0x5555555bd4 <main+556> ldr w8, [sp,#20]
|0x5555555bd8 <main+560> str w8, [sp,#140]
|0x5555555bdc <main+564> ldr w8, [sp,#140]
|0x5555555be0 <main+568> cbnz w8, 0x5555555be8 <main+576>
|0x5555555be4 <main+572> b 0x5555555c80 <main+728>
|0x5555555be8 <main+576> orr x0, xzr, #0x10
|0x5555555bec <main+580> orr w3, wzr, #0x4
|0x5555555bf0 <main+584> ldur w2, [x29,#-40]
|0x5555555bf4 <main+588> ldr x8, [sp,#80]
|0x5555555bf8 <main+592> str x0, [sp,#8]
|0x5555555bfc <main+596> mov x0, x8
|0x5555555c00 <main+600> ldr x1, [sp,#72]
|0x5555555c04 <main+604> bl 0x55555558d0 <fprintf@plt>
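The comparison between traced addresses and the gdb listing can be automated rather than done by eye. The sketch below is a hypothetical helper (`mnemonics_by_address` and `annotate` are illustrative names) and assumes the gdb lines have the `|0x... <symbol+offset> mnemonic operands` shape shown above.

```python
import re

# Matches gdb disassembly lines such as:
#   |0x5555555ba0 <main+504> ldr x8, [x8]
GDB_LINE = re.compile(r"\|?\s*(0x[0-9a-fA-F]+)\s+<[^>]+>\s+(\S+)")


def mnemonics_by_address(gdb_listing):
    """Map each instruction address in a gdb listing to its mnemonic."""
    matches = (GDB_LINE.match(line) for line in gdb_listing.splitlines())
    return {int(m.group(1), 16): m.group(2) for m in matches if m}


def annotate(trace_addresses, listing):
    """Pair each traced address with its mnemonic, or None if not listed."""
    return [(addr, listing.get(addr)) for addr in trace_addresses]


gdb_listing = """\
|0x5555555b9c <main+500> add x8, x8, x10
|0x5555555ba0 <main+504> ldr x8, [x8]
|0x5555555ba4 <main+508> ldr x10, [sp,#144]
|0x5555555ba8 <main+512> and x10, x10, x13
|0x5555555bac <main+516> lsl x10, x12, x10
"""
# First three user-space addresses from the decoded dump above.
traced = [0x5555555b9c, 0x5555555ba4, 0x5555555bac]
for addr, mnemonic in annotate(traced, mnemonics_by_address(gdb_listing)):
    print(hex(addr), mnemonic)
```

On this excerpt the traced addresses advance in steps of 8 bytes and map to add, ldr and lsl, so it may be worth running such a comparison over the full dump before concluding that only LDR and STR are being traced.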
The following are the trcidr values of my ETM:
trcidr0: 0x28000ea1
trcidr1: 0x4100f404
trcidr2: 0x488
trcidr3: 0xd7b00004
trcidr4: 0x11170004
trcidr5: 0x28c7081e
trcidr8: 0x0
trcidr9: 0x0
trcidr10: 0x0
trcidr11: 0x0
trcidr12: 0x0
trcidr13: 0x0
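The TRCIDR0 value can be cross-checked against the statement about which options are implemented. What follows is a minimal sketch, assuming the TRCIDR0 bit positions listed in the table below; they are recalled from the ETMv4 architecture specification and the comments in coresight-etm4x.c, so please verify them before relying on the result. `decode_fields` is a hypothetical helper name.

```python
# Field positions as (lsb, width). Assumption: these follow the ETMv4
# TRCIDR0 layout; confirm against the architecture specification.
TRCIDR0_FIELDS = {
    "INSTP0": (1, 2),     # load/store instructions traced as P0 elements
    "TRCDATA": (3, 2),    # data tracing support
    "TRCBB": (5, 1),      # branch broadcast support
    "TRCCOND": (6, 1),    # conditional instruction tracing support
    "TRCCCI": (7, 1),     # cycle counting support
    "RETSTACK": (9, 1),   # return stack support
    "NUMEVENT": (10, 2),  # number of events field
    "QSUPP": (15, 2),     # Q element support
    "TSSIZE": (24, 5),    # global timestamp size field
}


def decode_fields(value, fields):
    """Extract each named bit field from a 32-bit register value."""
    return {
        name: (value >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in fields.items()
    }


# trcidr0 value quoted above.
for name, field in decode_fields(0x28000ea1, TRCIDR0_FIELDS).items():
    print(f"{name:9s} = {field}")
```

With the value quoted above, this sketch reports INSTP0 = 0 and TRCCOND = 0, which agrees with the statement that load/store and conditional instruction tracing are not implemented, while TRCBB = 1 suggests branch broadcast is available.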
Hope I have provided sufficient information. Please advise!
Yours Sincerely,
Jeremy