This set adds support for CoreSight CPU-wide trace sessions. It borrows
most of its code from the per-thread implementation, with the exception
that range packets are processed and synthesised according to the time at
which the trace they contain was executed.
This is done using the timestamp and contextID features available on ETM4x
tracers (ETM3x/PTM aren't addressed yet). Decoding between processors is
done in chronological order using a min heap.
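As an illustration of the chronological ordering only, here is a minimal
min heap keyed on timestamp. It is a sketch under assumptions, not the code
in this set: the names (sketch_heap, sketch_heap_push, sketch_heap_pop) and
the fixed capacity are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_QUEUES 8	/* hypothetical capacity, one slot per CPU queue */

struct sketch_entry {
	uint64_t timestamp;	/* time of the next packet pending on this CPU */
	int cpu;
};

struct sketch_heap {
	struct sketch_entry e[SKETCH_MAX_QUEUES];
	size_t n;
};

static void sketch_swap(struct sketch_entry *a, struct sketch_entry *b)
{
	struct sketch_entry tmp = *a;

	*a = *b;
	*b = tmp;
}

/* Insert an entry and sift it up until its parent is not newer. */
static int sketch_heap_push(struct sketch_heap *h, uint64_t ts, int cpu)
{
	size_t i;

	assert(h != NULL);
	if (h->n == SKETCH_MAX_QUEUES)
		return -1;

	i = h->n++;
	h->e[i].timestamp = ts;
	h->e[i].cpu = cpu;

	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (h->e[parent].timestamp <= h->e[i].timestamp)
			break;
		sketch_swap(&h->e[parent], &h->e[i]);
		i = parent;
	}
	return 0;
}

/* Remove the oldest entry, i.e. the CPU that must be decoded next. */
static int sketch_heap_pop(struct sketch_heap *h, struct sketch_entry *out)
{
	size_t i = 0;

	assert(h != NULL);
	if (h->n == 0)
		return -1;

	*out = h->e[0];
	h->e[0] = h->e[--h->n];

	for (;;) {
		size_t l = 2 * i + 1, r = 2 * i + 2, m = i;

		if (l < h->n && h->e[l].timestamp < h->e[m].timestamp)
			m = l;
		if (r < h->n && h->e[r].timestamp < h->e[m].timestamp)
			m = r;
		if (m == i)
			break;
		sketch_swap(&h->e[m], &h->e[i]);
		i = m;
	}
	return 0;
}
```

In this model each CPU queue pushes the timestamp of its next pending
packet; the decoder pops the oldest entry, processes that CPU, and pushes
the queue back with its new timestamp.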
Of special interest is the way timestamp packets are used to account for
the temporal execution of traced instructions. Since a timestamp typically
arrives after the range packets have been recorded, the timestamp from the
previous range is used as the start time of the current range. When a
timestamp for the previous range doesn't exist (i.e. at the start of trace
or after a discontinuity) the start time is estimated.
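The rule above can be sketched as follows. This is only a model: the names
(sketch_clock, sketch_range_start) are hypothetical, and the estimate used
when no previous timestamp exists (current timestamp minus the number of
instructions in the range, clamped at zero) is an assumption made for the
example, not necessarily what the patches do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_clock {
	uint64_t prev_ts;	/* timestamp that closed the previous range */
	bool have_prev;		/* false at start of trace or after a discontinuity */
};

/* Forget the previous timestamp, e.g. on a trace discontinuity. */
static void sketch_clock_reset(struct sketch_clock *c)
{
	assert(c != NULL);
	c->prev_ts = 0;
	c->have_prev = false;
}

/*
 * Return the start time of the range closed by timestamp 'ts'.
 * 'instr_count' is the number of instructions in that range.
 */
static uint64_t sketch_range_start(struct sketch_clock *c, uint64_t ts,
				   uint64_t instr_count)
{
	uint64_t start;

	assert(c != NULL);
	if (c->have_prev)
		start = c->prev_ts;
	else
		/* ASSUMPTION: estimate by stepping back one tick per instruction. */
		start = ts > instr_count ? ts - instr_count : 0;

	c->prev_ts = ts;
	c->have_prev = true;
	return start;
}
```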
Open question:
--------------
At this time the implementation supports tracing a single CPU since the
only HW we have exhibits an N:1 source/sink topology. The HW itself does
support collecting traces from more than one source, but using the feature
in this way could be very confusing and mislead users.
For example the following:
# perf record -e cs_etm/20070000.etr/ -C 2,3 application1
would end up tracing everything that is happening on CPU 2 and 3 for as long
as application1 is executing. Because the HW doesn't give us an interrupt
when buffers are full, traces from one CPU could easily clobber traces from
the other, giving the impression that nothing was executed on the latter.
So this would work:
# perf record -e cs_etm/20070000.etr/ -C 3 application1
I am open to discussion on the topic should someone think of something.
As with the cleanup set this code has been uploaded here [1].
Thanks,
Mathieu
[1] https://git.linaro.org/people/mathieu.poirier/coresight.git perf-opencsd-master-cpu-wide-support
Mathieu Poirier (12):
perf tools: Add defines for CONTEXTID configuration
perf tools: Configure contextID tracing in CPU-wide mode
perf tools: Configure timestamp generation in CPU-wide mode
perf tools: Configure SWITCH_EVENTS in CPU-wide mode
perf tools: Add handling of itrace start events
perf tools: Add handling of switch-CPU-wide events
perf tools: Linking PE contextID with perf thread mechanic
perf tools: Allocate decoder tree as needed
perf tools: Make cs_etm__dump_event() work with CPU-wide scenarios
perf tools: Add notion of time to the decoding code
perf tools: Make function cs_etm_decoder__clear_buffer() public
perf tools: Add support for CPU-wide trace scenarios
include/linux/coresight-pmu.h | 2 +
tools/include/linux/coresight-pmu.h | 2 +
tools/perf/arch/arm/util/cs-etm.c | 174 ++++++++++--
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 140 +++++++++-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 4 +-
tools/perf/util/cs-etm.c | 334 ++++++++++++++++++++++--
tools/perf/util/cs-etm.h | 17 ++
7 files changed, 623 insertions(+), 50 deletions(-)
--
2.7.4
An exception packet appears as one element with 'elem_type' ==
OCSD_GEN_TRC_ELEM_EXCEPTION or OCSD_GEN_TRC_ELEM_EXCEPTION_RET,
which are present for exception entry and exit respectively. The decoder
sets the packet fields 'packet->exc' and 'packet->exc_ret' to indicate
exception packets, but exception packets don't have a dedicated sample
type and share the sample type CS_ETM_RANGE with normal instruction
packets.
As a result, exception packets are treated as normal instruction packets,
which mixes up the two packet types.
Furthermore, these instruction range packets are processed for branch
samples only when 'packet->last_instr_taken_branch' is true; otherwise
they are omitted. This can produce misleading results for exception entry
and exception return, because we don't have complete address range info
for the context switch.
To process exception packets properly, this patch introduces two new
sample types: CS_ETM_EXCEPTION and CS_ETM_EXCEPTION_RET. Packets of these
two types are handled by cs_etm__exception(). This function forces the
flag 'prev_packet->last_instr_taken_branch' of the previous CS_ETM_RANGE
packet to true. This matches the program flow when an exception traps
from user space to kernel space, whether or not the most recent flow took
a branch; it is also safe when returning to user space after exception
handling.
Now that exception packets have their own sample types, the packet fields
'packet->exc' and 'packet->exc_ret' aren't needed anymore, so remove
them.
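For reviewers, the effect described above can be modelled in isolation.
This is a simplified sketch, not the perf code: the sketch_* types and
helpers are hypothetical stand-ins for struct cs_etm_packet and the branch
sample check, and keep only the two fields that matter here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of enum cs_etm_sample_type after this patch. */
enum sketch_sample_type {
	SKETCH_EMPTY		= 0,
	SKETCH_RANGE		= 1 << 0,
	SKETCH_TRACE_ON		= 1 << 1,
	SKETCH_EXCEPTION	= 1 << 2,
	SKETCH_EXCEPTION_RET	= 1 << 3,
};

struct sketch_packet {
	enum sketch_sample_type sample_type;
	bool last_instr_taken_branch;
};

/* Model of the condition under which a branch sample is generated. */
static bool sketch_emits_branch_sample(const struct sketch_packet *prev)
{
	assert(prev != NULL);
	return prev->sample_type == SKETCH_RANGE &&
	       prev->last_instr_taken_branch;
}

/*
 * Model of cs_etm__exception(): an exception entry or return always
 * diverts the flow, so the preceding range is marked as ending in a
 * taken branch. Non-range previous packets are left alone.
 */
static void sketch_exception(struct sketch_packet *prev)
{
	assert(prev != NULL);
	if (prev->sample_type == SKETCH_RANGE)
		prev->last_instr_taken_branch = true;
}
```

A range whose last instruction was not a taken branch produces no branch
sample on its own in this model, but does once an exception packet follows
it.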
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 26 +++++++++++++++++------
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++-----
tools/perf/util/cs-etm.c | 28 +++++++++++++++++++++++++
3 files changed, 53 insertions(+), 11 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 24aabf0..c1715ff 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -264,8 +264,6 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
decoder->packet_buffer[i].last_instr_taken_branch = false;
- decoder->packet_buffer[i].exc = false;
- decoder->packet_buffer[i].exc_ret = false;
decoder->packet_buffer[i].cpu = INT_MIN;
}
}
@@ -292,8 +290,6 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
decoder->packet_count++;
decoder->packet_buffer[et].sample_type = sample_type;
- decoder->packet_buffer[et].exc = false;
- decoder->packet_buffer[et].exc_ret = false;
decoder->packet_buffer[et].cpu = *((int *)inode->priv);
decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
@@ -345,6 +341,22 @@ cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
CS_ETM_TRACE_ON);
}
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_exception(struct cs_etm_decoder *decoder,
+ const uint8_t trace_chan_id)
+{
+ return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+ CS_ETM_EXCEPTION);
+}
+
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_exception_ret(struct cs_etm_decoder *decoder,
+ const uint8_t trace_chan_id)
+{
+ return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+ CS_ETM_EXCEPTION_RET);
+}
+
static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
const void *context,
const ocsd_trc_index_t indx __maybe_unused,
@@ -370,10 +382,12 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_EXCEPTION:
- decoder->packet_buffer[decoder->tail].exc = true;
+ resp = cs_etm_decoder__buffer_exception(decoder,
+ trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_EXCEPTION_RET:
- decoder->packet_buffer[decoder->tail].exc_ret = true;
+ resp = cs_etm_decoder__buffer_exception_ret(decoder,
+ trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
case OCSD_GEN_TRC_ELEM_EO_TRACE:
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 108dc9d..cb57756 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -25,9 +25,11 @@ struct cs_etm_buffer {
};
enum cs_etm_sample_type {
- CS_ETM_EMPTY = 0,
- CS_ETM_RANGE = 1 << 0,
- CS_ETM_TRACE_ON = 1 << 1,
+ CS_ETM_EMPTY = 0,
+ CS_ETM_RANGE = 1 << 0,
+ CS_ETM_TRACE_ON = 1 << 1,
+ CS_ETM_EXCEPTION = 1 << 2,
+ CS_ETM_EXCEPTION_RET = 1 << 3,
};
struct cs_etm_packet {
@@ -35,8 +37,6 @@ struct cs_etm_packet {
u64 start_addr;
u64 end_addr;
u8 last_instr_taken_branch;
- u8 exc;
- u8 exc_ret;
int cpu;
};
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 2ae6402..b85100b 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -942,6 +942,25 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
return 0;
}
+static int cs_etm__exception(struct cs_etm_queue *etmq)
+{
+ /*
+ * When the exception packet is inserted, whether the last instruction
+ * in previous range packet is taken branch or not, we need to force
+ * to set 'prev_packet->last_instr_taken_branch' to true. This ensures
+ * to generate branch sample for the instruction range before the
+ * exception is trapped to kernel or before the exception returning.
+ *
+ * The exception packet includes the dummy address values, so don't
+ * swap PACKET with PREV_PACKET. This keeps PREV_PACKET to be useful
+ * for generating instruction and branch samples.
+ */
+ if (etmq->prev_packet->sample_type == CS_ETM_RANGE)
+ etmq->prev_packet->last_instr_taken_branch = true;
+
+ return 0;
+}
+
static int cs_etm__flush(struct cs_etm_queue *etmq)
{
int err = 0;
@@ -1057,6 +1076,15 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
*/
cs_etm__sample(etmq);
break;
+ case CS_ETM_EXCEPTION:
+ case CS_ETM_EXCEPTION_RET:
+ /*
+ * If the exception packet is coming,
+ * make sure the previous instruction
+ * range packet to be handled properly.
+ */
+ cs_etm__exception(etmq);
+ break;
case CS_ETM_TRACE_ON:
/*
* Discontinuity in trace, flush
--
2.7.4
Usually the first packet of a trace is a CS_ETM_TRACE_ON packet, which is
passed to cs_etm__flush(). cs_etm__flush() checks the condition
'prev_packet->sample_type == CS_ETM_RANGE', but 'prev_packet' is
allocated by zalloc(), so 'prev_packet->sample_type' is initially zero
and the condition is false. cs_etm__flush() therefore bails out without
handling the start tracing packet.
This patch introduces a new sample type, CS_ETM_EMPTY, which indicates
that a packet is empty. cs_etm__flush() swaps packets when it finds that
the previous packet is empty, which records the start tracing packet in
'etmq->prev_packet'.
Another minor change in cs_etm__flush() is to check the condition
'etmq->prev_packet->sample_type == CS_ETM_TRACE_ON': if the previous
packet is also a CS_ETM_TRACE_ON packet, the function returns early so
that contiguous CS_ETM_TRACE_ON packets are skipped.
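The resulting decision order in cs_etm__flush() can be summarised with a
small model. It is a sketch only: the sketch_* names are hypothetical, and
it leaves out the synth_opts.last_branch option, which in the patch further
gates both the sample generation and the swap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of enum cs_etm_sample_type after this patch. */
enum sketch_sample_type {
	SKETCH_EMPTY	= 0,
	SKETCH_RANGE	= 1 << 0,
	SKETCH_TRACE_ON	= 1 << 1,
};

enum sketch_flush_action {
	SKETCH_FLUSH_SKIP,		/* return without touching anything */
	SKETCH_FLUSH_SWAP_ONLY,		/* record the packet as prev_packet */
	SKETCH_FLUSH_SAMPLE_THEN_SWAP,	/* flush pending samples, then swap */
};

struct sketch_packet {
	enum sketch_sample_type sample_type;
};

/* What the flush does for a given previous packet (NULL if none). */
static enum sketch_flush_action
sketch_flush_decide(const struct sketch_packet *prev)
{
	if (!prev)
		return SKETCH_FLUSH_SKIP;

	/* Contiguous TRACE_ON packets: nothing new to record. */
	if (prev->sample_type == SKETCH_TRACE_ON)
		return SKETCH_FLUSH_SKIP;

	/* Start of trace: zero-initialised prev_packet is EMPTY. */
	if (prev->sample_type == SKETCH_EMPTY)
		return SKETCH_FLUSH_SWAP_ONLY;

	/* Only range packets remain with the three types modelled here. */
	assert(prev->sample_type == SKETCH_RANGE);
	return SKETCH_FLUSH_SAMPLE_THEN_SWAP;
}
```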
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 1 +
tools/perf/util/cs-etm.c | 26 ++++++++++++++++++++++---
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 743f5f4..612b575 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -23,6 +23,7 @@ struct cs_etm_buffer {
};
enum cs_etm_sample_type {
+ CS_ETM_EMPTY = 0,
CS_ETM_RANGE = 1 << 0,
CS_ETM_TRACE_ON = 1 << 1,
};
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 822ba91..67564c1 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -924,9 +924,18 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
int err = 0;
struct cs_etm_packet *tmp;
- if (etmq->etm->synth_opts.last_branch &&
- etmq->prev_packet &&
- etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+ if (!etmq->prev_packet)
+ return 0;
+
+ /* Skip for contiguous CS_ETM_TRACE_ON packet */
+ if (etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
+ return 0;
+
+ /* Handle start tracing packet */
+ if (etmq->prev_packet->sample_type == CS_ETM_EMPTY)
+ goto swap_packet;
+
+ if (etmq->etm->synth_opts.last_branch) {
/*
* Generate a last branch event for the branches left in the
* circular buffer at the end of the trace.
@@ -941,6 +950,10 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
etmq->period_instructions);
etmq->period_instructions = 0;
+ }
+
+swap_packet:
+ if (etmq->etm->synth_opts.last_branch) {
/*
* Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
* the next incoming packet.
@@ -1020,6 +1033,13 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
*/
cs_etm__flush(etmq);
break;
+ case CS_ETM_EMPTY:
+ /*
+ * Should not receive empty packet,
+ * report error.
+ */
+ pr_err("CS ETM Trace: empty packet\n");
+ return -EINVAL;
default:
break;
}
--
2.7.4
This patch series adds support for using 'perf script' as a CoreSight
trace disassembler. To that end it adds a new python script that parses
CoreSight tracing events and uses 'objdump' to obtain the disassembled
lines, producing a readable program execution flow for reviewing tracing
data.
Patches 0001 ~ 0003 generate samples for the start packet,
CS_ETM_TRACE_ON packet and exception packets.
Patch 0004 introduces an invalid address macro.
Patch 0005 adds the python script for the trace disassembler.
Patch 0006 adds documentation that explains the python script usage and
gives an example.
This patch series has been rebased on the acme git tree [1] with the latest
commit e9175538c04f ("perf script python: Add addr into perf sample dict")
and tested on Hikey (ARM64 octa CA53 cores).
In this version the script has no dependency on the ARM64 platform and is
expected to support ARM32 platforms, but I lack an ARM32 platform to test
on, so ARM64 support is upstreamed first.
This patch series initially supports 'per-thread' recording of tracing
data, and it has been verified with kernel panic kdump tracing data.
Please note, this patch series (v4) is ONLY for discussion of packet
handling; once we have a solid result I will send it to LKML for review
and merging into the mainline kernel.
Changes from v3:
* Split packet handling into three patches: one for the start tracing
packet, one for the CS_ETM_TRACE_ON packet and the last one for
exception packets;
* Introduce invalid address macro.
Changes from v2:
* Synced with Rob on handling the CS_ETM_TRACE_ON packet, and refined the
0001 patch according to the discussion;
* Minor cleanup and fixes in the 0003 patch for the python script: remove
the 'svc' checking.
Changes from v1:
* Following Mike's and Rob's suggestions, add fixes to generate samples
for the start packet and exception packets.
* Simplify the python script by removing the exception prediction
algorithm; we can rely on sane exception packets for the disassembler.
Leo Yan (6):
perf cs-etm: Fix start tracing packet handling
perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet
perf cs-etm: Generate branch sample for exception packet
perf cs-etm: Introduce invalid address macro
perf script python: Add script for CoreSight trace disassembler
coresight: Document for CoreSight trace disassembler
Documentation/trace/coresight.txt | 52 +++++
tools/perf/scripts/python/arm-cs-trace-disasm.py | 235 +++++++++++++++++++++++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 19 +-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 11 +-
tools/perf/util/cs-etm.c | 101 ++++++++--
5 files changed, 390 insertions(+), 28 deletions(-)
create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py
--
2.7.4