Hi,
This patch set adds support for user-space decoding of ARM CoreSight traces [1]. Kernel support for configuring CoreSight tracers and collecting the hardware trace data in the auxtrace section of the perf.data file is already integrated [2]. The user-space implementation largely mirrors that of Intel Processor Trace (PT) [3], except that the decoder library is not part of the perf tool sources; it is built and maintained as a separate open source project [4]. This patch set therefore includes the code and build settings needed to interface with the decoder library, as well as a static inline fallback in cs-etm.h for the case when the perf tool is built without the library.
The decoder library interface code in this patch set only supports ETMv4 trace decoding, though the library itself supports a broader range. Future patches will add support for more versions of the ARM ETM trace encoding.
Changes from v2:
Reordered the patches to enable packet dump (the -D or --dump-raw-trace option) first, then build up to full packet decode. Functions in the trace decoder library interface are now added in the same patches as the functions in the main CoreSight processing file that reference them.
Changes from v3:
Introduced functions in cs-etm-decoder.c at the same time they are referenced in cs-etm.c, rather than waiting until the very end to change the build script to compile the full decoder library interface.
Changes from v4:
Removed function to directly read vmlinux file text section, instead relying on perf dso access functions.
Changes from v5:
Refactored cs-etm-decoder-stub.c so that the different parts are introduced when the corresponding parts in cs-etm-decoder.c are added.
Changed error returns in cs-etm-decoder.c to use enums as opposed to literals.
Changed how the memory access function is registered with the decoder library. Instead of adding one entry for each MMAP2 record and one for the kernel, a single entry now covers the entire address space. The functionality is equivalent, but the code is simpler.
Changes from v6:
Removed the stub library. When the decoder library is not available, a static inline function in cs-etm.h is used instead.
Tor Jeremiassen (22):
  perf tools: Add initial hooks for decoding coresight traces
  perf tools: Add processing of coresight metadata
  perf tools: Add coresight trace decoder library interface
  perf tools: Add data block processing function
  perf tools: Add etmv4i packet printing capability
  perf tools: Add decoder new and free
  perf tools: Add trace packet print for dump_trace option
  perf tools: Add code to process the auxtrace perf event
  perf tools: Add function to read data from dsos
  perf tools: Add mapping from cpu to cs_etm_queue
  perf tools: Add functions to allocate and free queues
  perf tools: Add functions to setup and initialize queues
  perf tools: Add functions to allocate and free queues
  perf tools: Add function to get trace data from aux buffer
  perf tools: Add function to run the trace decoder and process samples
  perf tools: Add functions to process queues and run the trace decoder
  perf tools: Add perf event processing
  perf tools: Add processing of queues when events are flushed
  perf tools: Add synth_events and supporting functions
  perf tools: Add function to clear the decoder packet buffer
  perf tools: Add functions for full etmv4i packet decode
  MAINTAINERS: Adding entry for CoreSight trace decoding
 MAINTAINERS                                     |    3 +-
 tools/perf/Makefile.config                      |   26 +
 tools/perf/util/Build                           |    6 +
 tools/perf/util/auxtrace.c                      |    2 +
 tools/perf/util/cs-etm-decoder/Build            |    2 +
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  526 ++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  133 +++
 tools/perf/util/cs-etm.c                        | 1179 +++++++++++++++++++++++
 tools/perf/util/cs-etm.h                        |   50 +
 9 files changed, 1926 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/cs-etm-decoder/Build
 create mode 100644 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
 create mode 100644 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
 create mode 100644 tools/perf/util/cs-etm.c
Add a new file, cs-etm.c, containing the auxtrace info event handler for cs-etm (CoreSight ETM) traces, the event handling data structures, and the callback functions that must be provided in the auxtrace struct.
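The callback arrangement described here can be illustrated with a minimal, standalone sketch. It assumes nothing about perf internals: every `demo_*` name is hypothetical, and `demo_auxtrace` is only a simplified stand-in for perf's `struct auxtrace`. The point it tries to show is the embedding pattern the patch relies on, where decoder-private state wraps the generic struct and each callback recovers the outer struct from the pointer it is handed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for perf's struct auxtrace. */
struct demo_auxtrace {
	int (*process_event)(struct demo_auxtrace *at, int event);
	void (*free)(struct demo_auxtrace *at);
};

/* Decoder-private state embeds the generic struct as a member. */
struct demo_etm_auxtrace {
	struct demo_auxtrace auxtrace;
	unsigned int events_seen;
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int demo_process_event(struct demo_auxtrace *at, int event)
{
	struct demo_etm_auxtrace *etm =
		demo_container_of(at, struct demo_etm_auxtrace, auxtrace);

	(void)event;
	etm->events_seen++;
	return 0;
}

static void demo_free(struct demo_auxtrace *at)
{
	free(demo_container_of(at, struct demo_etm_auxtrace, auxtrace));
}

/* Allocate the private state and hand back only the generic handle. */
static struct demo_auxtrace *demo_etm_new(void)
{
	struct demo_etm_auxtrace *etm = calloc(1, sizeof(*etm));

	if (!etm)
		return NULL;
	etm->auxtrace.process_event = demo_process_event;
	etm->auxtrace.free = demo_free;
	return &etm->auxtrace;
}

/* Number of events the decoder behind this handle has processed. */
static unsigned int demo_events_seen(struct demo_auxtrace *at)
{
	assert(at);
	return demo_container_of(at, struct demo_etm_auxtrace,
				 auxtrace)->events_seen;
}
```

A caller would obtain a handle from `demo_etm_new()`, invoke `at->process_event(at, ...)` for each event, and finish with `at->free(at)`; the generic layer never needs to know the layout of the private struct.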
Signed-off-by: Tor Jeremiassen <tor@ti.com>
---
 tools/perf/Makefile.config |   5 +
 tools/perf/util/Build      |   5 +
 tools/perf/util/auxtrace.c |   2 +
 tools/perf/util/cs-etm.c   | 230 +++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cs-etm.h   |  16 ++++
 5 files changed, 258 insertions(+)
 create mode 100644 tools/perf/util/cs-etm.c
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 8354d04..d2c3f47 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -330,6 +330,11 @@ ifeq ($(feature-sched_getcpu), 1)
   CFLAGS += -DHAVE_SCHED_GETCPU_SUPPORT
 endif
 
+ifdef CSTRACE_PATH
+  CFLAGS-$(CONFIG_AUXTRACE) += -DHAVE_CSTRACE_SUPPORT
+endif
+
+
 ifndef NO_LIBELF
   CFLAGS += -DHAVE_LIBELF_SUPPORT
   EXTLIBS += -lelf
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 79dea95..2377b9b 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -82,6 +82,11 @@ libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
 libperf-$(CONFIG_AUXTRACE) += intel-bts.o
+
+ifdef HAVE_CSTRACE_SUPPORT
+libperf-$(CONFIG_AUXTRACE) += cs-etm.o
+endif
+
 libperf-y += parse-branch-options.o
 libperf-y += dump-insn.o
 libperf-y += parse-regs-options.o
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 0daf63b..e1960ef 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -55,6 +55,7 @@
 #include "debug.h"
 #include <subcmd/parse-options.h>
 
+#include "cs-etm.h"
 #include "intel-pt.h"
 #include "intel-bts.h"
 
@@ -904,6 +905,7 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused, case PERF_AUXTRACE_INTEL_BTS: return intel_bts_process_auxtrace_info(event, session); case PERF_AUXTRACE_CS_ETM: + return cs_etm__process_auxtrace_info(event, session); case PERF_AUXTRACE_UNKNOWN: default: return -EINVAL; diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c new file mode 100644 index 0000000..e3f4e86 --- /dev/null +++ b/tools/perf/util/cs-etm.c @@ -0,0 +1,230 @@ +/* + * Copyright(C) 2015-2017 Linaro Limited. All rights reserved. + * Author: Tor Jeremiassen tor@ti.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#include <linux/bitops.h> +#include <linux/err.h> +#include <linux/kernel.h> +#include <linux/log2.h> +#include <linux/types.h> + +#include <stdlib.h> + +#include "auxtrace.h" +#include "color.h" +#include "cs-etm.h" +#include "debug.h" +#include "evlist.h" +#include "intlist.h" +#include "machine.h" +#include "perf.h" +#include "thread.h" +#include "thread_map.h" +#include "thread-stack.h" +#include "util.h" + +struct cs_etm_auxtrace { + struct auxtrace auxtrace; + struct auxtrace_queues queues; + struct auxtrace_heap heap; + u64 **metadata; + u32 auxtrace_type; + struct perf_session *session; + struct machine *machine; + struct perf_evsel *switch_evsel; + struct thread *unknown_thread; + uint32_t num_cpu; + bool timeless_decoding; + bool sampling_mode; + bool snapshot_mode; + bool data_queued; + bool synth_needs_swap; + bool sample_instructions; + u64 instructions_sample_type; + u64 instructions_sample_period; + u64 instructions_id; + struct itrace_synth_opts synth_opts; + unsigned int pmu_type; + u64 kernel_start; +}; + +struct cs_etm_queue { + struct cs_etm_auxtrace *etm; + unsigned int queue_nr; + struct auxtrace_buffer *buffer; + const struct cs_etm_state *state; + union perf_event *event_buf; + bool on_heap; + bool step_through_buffers; + bool use_buffer_pid_tid; + pid_t pid, tid; + int cpu; + struct thread *thread; + u64 time; + u64 timestamp; + bool stop; + struct cs_etm_decoder *decoder; + u64 offset; + bool eot; +}; + +static int cs_etm__flush_events(struct perf_session *session, + struct perf_tool *tool) +{ + (void) session; + (void) tool; + return 0; +} + +static void cs_etm__free_queue(void *priv) +{ + struct cs_etm_queue *etmq = priv; + + if (!etmq) + return; + + thread__zput(etmq->thread); + free(etmq); +} + +static void cs_etm__free_events(struct perf_session *session) +{ + struct cs_etm_auxtrace *aux = container_of(session->auxtrace, + struct cs_etm_auxtrace, + auxtrace); + struct auxtrace_queues *queues = &aux->queues; + unsigned int i; + 
+ for (i = 0; i < queues->nr_queues; i++) { + cs_etm__free_queue(queues->queue_array[i].priv); + queues->queue_array[i].priv = NULL; + } + + auxtrace_queues__free(queues); +} + +static void cs_etm__free(struct perf_session *session) +{ + struct cs_etm_auxtrace *aux = container_of(session->auxtrace, + struct cs_etm_auxtrace, + auxtrace); + auxtrace_heap__free(&aux->heap); + cs_etm__free_events(session); + session->auxtrace = NULL; + + zfree(&aux); +} + +static int cs_etm__process_event(struct perf_session *session, + union perf_event *event, + struct perf_sample *sample, + struct perf_tool *tool) +{ + (void) session; + (void) event; + (void) sample; + (void) tool; + return 0; +} + +static int cs_etm__process_auxtrace_event(struct perf_session *session, + union perf_event *event, + struct perf_tool *tool) +{ + (void) session; + (void) event; + (void) tool; + return 0; +} + +int cs_etm__process_auxtrace_info(union perf_event *event, + struct perf_session *session) +{ + struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info; + size_t event_header_size = sizeof(struct perf_event_header); + size_t info_header_size; + size_t total_size = auxtrace_info->header.size; + struct cs_etm_auxtrace *etm = NULL; + int err = 0; + + /* + * sizeof(auxtrace_info_event::type) + + * sizeof(auxtrace_info_event::reserved) == 8 + */ + info_header_size = 8; + + if (total_size < (event_header_size + info_header_size)) + return -EINVAL; + + etm = zalloc(sizeof(*etm)); + + if (!etm) + return -ENOMEM; + + err = auxtrace_queues__init(&etm->queues); + if (err) + goto err_free_etm; + + etm->unknown_thread = thread__new(999999999, 999999999); + if (!etm->unknown_thread) { + err = -ENOMEM; + goto err_free_queues; + } + + err = thread__set_comm(etm->unknown_thread, "unknown", 0); + if (err) + goto err_delete_thread; + + etm->session = session; + etm->machine = &session->machines.host; + etm->kernel_start = machine__kernel_start(etm->machine); + + if 
(thread__init_map_groups(etm->unknown_thread, + etm->machine)) { + err = -ENOMEM; + goto err_delete_thread; + } + + etm->auxtrace_type = auxtrace_info->type; + + etm->auxtrace.process_event = cs_etm__process_event; + etm->auxtrace.process_auxtrace_event = cs_etm__process_auxtrace_event; + etm->auxtrace.flush_events = cs_etm__flush_events; + etm->auxtrace.free_events = cs_etm__free_events; + etm->auxtrace.free = cs_etm__free; + session->auxtrace = &etm->auxtrace; + + if (dump_trace) + return 0; + + err = auxtrace_queues__process_index(&etm->queues, session); + if (err) + goto err_delete_thread; + + etm->data_queued = etm->queues.populated; + + return 0; + +err_delete_thread: + thread__delete(etm->unknown_thread); +err_free_queues: + auxtrace_queues__free(&etm->queues); + session->auxtrace = NULL; +err_free_etm: + zfree(&etm); + + return err; +} diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 3cc6bc3..9e43ede 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -18,6 +18,9 @@ #ifndef INCLUDE__UTIL_PERF_CS_ETM_H__ #define INCLUDE__UTIL_PERF_CS_ETM_H__
+#include "util/event.h" +#include "util/session.h" + /* Versionning header in case things need tro change in the future. That way * decoding of old snapshot is still possible. */ @@ -71,4 +74,17 @@ static const u64 __perf_cs_etmv4_magic = 0x4040404040404040ULL; #define CS_ETMV3_PRIV_SIZE (CS_ETM_PRIV_MAX * sizeof(u64)) #define CS_ETMV4_PRIV_SIZE (CS_ETMV4_PRIV_MAX * sizeof(u64))
+#ifdef HAVE_CSTRACE_SUPPORT +int cs_etm__process_auxtrace_info(union perf_event *event, + struct perf_session *session); +#else +static inline int cs_etm__process_auxtrace_info(union perf_event *event, + struct perf_session *session) +{ + (void) event; + (void) session; + return -1; +} +#endif + #endif
The auxtrace_info section contains metadata that describes the number of trace-capable CPUs, their ETM version and trace configuration, including trace ID values. The trace decoder requires this information to decode the compressed trace packets properly. This patch adds code to read and parse this metadata, and to store it for use in configuring instances of the cs-etm trace decoder.
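As a rough illustration of the metadata layout, the following standalone sketch walks a blob of 64-bit words: a three-word global header, then one block per CPU whose first word is a magic number selecting the ETM version. The `demo_*` and uppercase names are hypothetical; the block sizes are taken from the field tables in this patch, the ETMv4 magic value appears in cs-etm.h, and the ETMv3 magic value is an assumption because it is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Global header: version, PMU type/CPU count, snapshot flag. */
enum { HDR_VERSION, HDR_PMU_TYPE_CPUS, HDR_SNAPSHOT, HDR_MAX };

/* Per-CPU block sizes in 64-bit words, per the patch's field tables. */
#define ETMV3_WORDS 6	/* magic, cpu, ETMCR, ETMTRACEIDR, ETMCCER, ETMIDR */
#define ETMV4_WORDS 9	/* magic, cpu, TRCCONFIGR, TRCTRACEIDR, TRCIDR0/1/2/8, TRCAUTHSTATUS */

#define ETMV4_MAGIC 0x4040404040404040ULL
#define ETMV3_MAGIC 0x3030303030303030ULL	/* assumed; not shown in this patch */

struct demo_info {
	uint32_t pmu_type;
	uint32_t num_cpu;
	int snapshot;
};

/*
 * Walk a metadata blob of nr_words 64-bit values. Returns 0 on success,
 * or -1 if the header version, a magic number, or the total length is
 * not what the layout requires.
 */
static int demo_parse_metadata(const uint64_t *w, size_t nr_words,
			       struct demo_info *out)
{
	size_t i = HDR_MAX, cpu;

	if (nr_words < HDR_MAX || w[HDR_VERSION] != 0)
		return -1;

	/* Low 32 bits hold the CPU count, high 32 bits the PMU type. */
	out->num_cpu = (uint32_t)(w[HDR_PMU_TYPE_CPUS] & 0xffffffff);
	out->pmu_type = (uint32_t)(w[HDR_PMU_TYPE_CPUS] >> 32);
	out->snapshot = w[HDR_SNAPSHOT] != 0;

	for (cpu = 0; cpu < out->num_cpu; cpu++) {
		size_t block;

		if (i >= nr_words)
			return -1;
		if (w[i] == ETMV4_MAGIC)
			block = ETMV4_WORDS;
		else if (w[i] == ETMV3_MAGIC)
			block = ETMV3_WORDS;
		else
			return -1;
		if (block > nr_words - i)
			return -1;
		i += block;
	}

	/* Every word must be accounted for, as in the patch's size check. */
	return i == nr_words ? 0 : -1;
}
```

Unlike the patch, this sketch bounds-checks each block before stepping over it and rejects an unknown magic number immediately; the final length comparison plays the same role as the `i * 8 != priv_size` test.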
Signed-off-by: Tor Jeremiassen <tor@ti.com>
---
 tools/perf/util/cs-etm.c | 196 ++++++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/cs-etm.h |   3 +
 2 files changed, 196 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index e3f4e86..85de61e 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -118,6 +118,8 @@ static void cs_etm__free_events(struct perf_session *session)
static void cs_etm__free(struct perf_session *session) { + size_t i; + struct int_node *inode, *tmp; struct cs_etm_auxtrace *aux = container_of(session->auxtrace, struct cs_etm_auxtrace, auxtrace); @@ -125,6 +127,16 @@ static void cs_etm__free(struct perf_session *session) cs_etm__free_events(session); session->auxtrace = NULL;
+ /* First remove all traceID/CPU# nodes for the RB tree */ + intlist__for_each_entry_safe(inode, tmp, traceid_list) + intlist__remove(traceid_list, inode); + /* Then the RB tree itself */ + intlist__delete(traceid_list); + + for (i = 0; i < aux->num_cpu; i++) + zfree(&aux->metadata[i]); + + zfree(&aux->metadata); zfree(&aux); }
@@ -150,6 +162,53 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session, return 0; }
+static const char * const cs_etm_global_header_fmts[] = { + [CS_HEADER_VERSION_0] = " Header version %llx\n", + [CS_PMU_TYPE_CPUS] = " PMU type/num cpus %llx\n", + [CS_ETM_SNAPSHOT] = " Snapshot %llx\n", +}; + +static const char * const cs_etm_priv_fmts[] = { + [CS_ETM_MAGIC] = " Magic number %llx\n", + [CS_ETM_CPU] = " CPU %lld\n", + [CS_ETM_ETMCR] = " ETMCR %llx\n", + [CS_ETM_ETMTRACEIDR] = " ETMTRACEIDR %llx\n", + [CS_ETM_ETMCCER] = " ETMCCER %llx\n", + [CS_ETM_ETMIDR] = " ETMIDR %llx\n", +}; + +static const char * const cs_etmv4_priv_fmts[] = { + [CS_ETM_MAGIC] = " Magic number %llx\n", + [CS_ETM_CPU] = " CPU %lld\n", + [CS_ETMV4_TRCCONFIGR] = " TRCCONFIGR %llx\n", + [CS_ETMV4_TRCTRACEIDR] = " TRCTRACEIDR %llx\n", + [CS_ETMV4_TRCIDR0] = " TRCIDR0 %llx\n", + [CS_ETMV4_TRCIDR1] = " TRCIDR1 %llx\n", + [CS_ETMV4_TRCIDR2] = " TRCIDR2 %llx\n", + [CS_ETMV4_TRCIDR8] = " TRCIDR8 %llx\n", + [CS_ETMV4_TRCAUTHSTATUS] = " TRCAUTHSTATUS %llx\n", +}; + +static void cs_etm__print_auxtrace_info(u64 *val, size_t num) +{ + unsigned int i, j, cpu; + + for (i = 0; i < CS_HEADER_VERSION_0_MAX; i++) + fprintf(stdout, cs_etm_global_header_fmts[i], val[i]); + + for (i = CS_HEADER_VERSION_0_MAX, cpu = 0; cpu < num; cpu++) { + if (val[i] == __perf_cs_etmv3_magic) + for (j = 0; j < CS_ETM_PRIV_MAX; j++, i++) + fprintf(stdout, cs_etm_priv_fmts[j], val[i]); + else if (val[i] == __perf_cs_etmv4_magic) + for (j = 0; j < CS_ETMV4_PRIV_MAX; j++, i++) + fprintf(stdout, cs_etmv4_priv_fmts[j], val[i]); + else + /* failure.. 
return */ + return; + } +} + int cs_etm__process_auxtrace_info(union perf_event *event, struct perf_session *session) { @@ -157,8 +216,16 @@ int cs_etm__process_auxtrace_info(union perf_event *event, size_t event_header_size = sizeof(struct perf_event_header); size_t info_header_size; size_t total_size = auxtrace_info->header.size; + size_t priv_size = 0; + size_t num_cpu; struct cs_etm_auxtrace *etm = NULL; - int err = 0; + int err = 0, idx = -1; + u64 *ptr; + u64 *hdr = NULL; + u64 **metadata = NULL; + size_t i, j, k; + unsigned int pmu_type; + struct int_node *inode;
/* * sizeof(auxtrace_info_event::type) + @@ -169,11 +236,119 @@ int cs_etm__process_auxtrace_info(union perf_event *event, if (total_size < (event_header_size + info_header_size)) return -EINVAL;
- etm = zalloc(sizeof(*etm)); + priv_size = total_size - event_header_size - info_header_size; + + /* First the global part */ + ptr = (u64 *) auxtrace_info->priv; + + if (ptr[0] != 0) + return -EINVAL;
- if (!etm) + hdr = zalloc(sizeof(*hdr) * CS_HEADER_VERSION_0_MAX); + if (!hdr) return -ENOMEM;
+ for (i = 0; i < CS_HEADER_VERSION_0_MAX; i++) + hdr[i] = ptr[i]; + num_cpu = hdr[CS_PMU_TYPE_CPUS] & 0xffffffff; + pmu_type = (unsigned int) ((hdr[CS_PMU_TYPE_CPUS] >> 32) & + 0xffffffff); + + /* + * Create an RB tree for traceID-CPU# tuple. Since the conversion has + * to be made for each packet that gets decoded, optimizing access in + * anything other than a sequential array is worth doing. + */ + traceid_list = intlist__new(NULL); + if (!traceid_list) { + err = -ENOMEM; + goto err_free_hdr; + } + + metadata = zalloc(sizeof(*metadata) * num_cpu); + if (!metadata) { + err = -ENOMEM; + goto err_free_traceid_list; + } + + /* + * The metadata is stored in the auxtrace_info section and encodes + * the configuration of the ARM embedded trace macrocell which is + * required by the trace decoder to properly decode the trace due + * to its highly compressed nature. + */ + for (j = 0; j < num_cpu; j++) { + if (ptr[i] == __perf_cs_etmv3_magic) { + metadata[j] = zalloc(sizeof(*metadata[j]) * + CS_ETM_PRIV_MAX); + if (!metadata[j]) { + err = -ENOMEM; + goto err_free_metadata; + } + for (k = 0; k < CS_ETM_PRIV_MAX; k++) + metadata[j][k] = ptr[i + k]; + + /* The traceID is our handle */ + idx = metadata[j][CS_ETM_ETMTRACEIDR]; + i += CS_ETM_PRIV_MAX; + } else if (ptr[i] == __perf_cs_etmv4_magic) { + metadata[j] = zalloc(sizeof(*metadata[j]) * + CS_ETMV4_PRIV_MAX); + if (!metadata[j]) { + err = -ENOMEM; + goto err_free_metadata; + } + for (k = 0; k < CS_ETMV4_PRIV_MAX; k++) + metadata[j][k] = ptr[i + k]; + + /* The traceID is our handle */ + idx = metadata[j][CS_ETMV4_TRCTRACEIDR]; + i += CS_ETMV4_PRIV_MAX; + } + + /* Get an RB node for this CPU */ + inode = intlist__findnew(traceid_list, idx); + + /* Something went wrong, no need to continue */ + if (!inode) { + err = PTR_ERR(inode); + goto err_free_metadata; + } + + /* + * The node for that CPU should not be taken. + * Back out if that's the case. 
+ */ + if (inode->priv) { + err = -EINVAL; + goto err_free_metadata; + } + /* All good, associate the traceID with the CPU# */ + inode->priv = &metadata[j][CS_ETM_CPU]; + } + + /* + * Each of CS_HEADER_VERSION_0_MAX, CS_ETM_PRIV_MAX and + * CS_ETMV4_PRIV_MAX mark how many double words are in the + * global metadata, and each cpu's metadata respectively. + * The following tests if the correct number of double words was + * present in the auxtrace info section. + */ + if (i * 8 != priv_size) { + err = -EINVAL; + goto err_free_metadata; + } + + if (dump_trace) + cs_etm__print_auxtrace_info(auxtrace_info->priv, num_cpu); + + etm = zalloc(sizeof(*etm)); + + if (!etm) { + err = -ENOMEM; + goto err_free_metadata; + } + err = auxtrace_queues__init(&etm->queues); if (err) goto err_free_etm; @@ -198,6 +373,12 @@ int cs_etm__process_auxtrace_info(union perf_event *event, goto err_delete_thread; }
+ etm->num_cpu = num_cpu; + etm->pmu_type = pmu_type; + etm->snapshot_mode = (hdr[CS_ETM_SNAPSHOT] != 0); + etm->timeless_decoding = true; + etm->sampling_mode = false; + etm->metadata = metadata; etm->auxtrace_type = auxtrace_info->type;
etm->auxtrace.process_event = cs_etm__process_event; @@ -225,6 +406,15 @@ int cs_etm__process_auxtrace_info(union perf_event *event, session->auxtrace = NULL; err_free_etm: zfree(&etm); +err_free_metadata: + /* No need to check @metadata[j], free(NULL) is supported */ + for (j = 0; j < num_cpu; j++) + free(metadata[j]); + zfree(&metadata); +err_free_traceid_list: + intlist__delete(traceid_list); +err_free_hdr: + zfree(&hdr);
return err; } diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 9e43ede..fb5a2de 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -64,6 +64,9 @@ enum { CS_ETMV4_PRIV_MAX, };
+/* RB tree for quick conversion between traceID and CPUs */ +struct intlist *traceid_list; + #define KiB(x) ((x) * 1024) #define MiB(x) ((x) * 1024 * 1024)
The actual decode of the binary trace is performed by a separate open source library that is linked in when enabled in the build process. The code that interacts with the decoder library is factored out into the cs-etm-decoder subdirectory and has its own API, which is called from the auxtrace handling performed in cs-etm.c.
This patch defines the initial basic data structures for the library interface.
Signed-off-by: Tor Jeremiassen <tor@ti.com>
---
 tools/perf/Makefile.config                      |  21 +++++
 tools/perf/util/Build                           |   1 +
 tools/perf/util/cs-etm-decoder/Build            |   2 +
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  45 ++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 107 ++++++++++++++++++++++++
 5 files changed, 176 insertions(+)
 create mode 100644 tools/perf/util/cs-etm-decoder/Build
 create mode 100644 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
 create mode 100644 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index d2c3f47..397ffbd 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -332,6 +332,27 @@ endif
 
 ifdef CSTRACE_PATH
   CFLAGS-$(CONFIG_AUXTRACE) += -DHAVE_CSTRACE_SUPPORT
+  ifeq (${IS_64_BIT}, 1)
+    CSTRACE_LNX = linux64
+    ifeq (${ARCH}, arm64)
+      CSTRACE_LNX = linux-arm64
+    endif
+  else
+    CSTRACE_LNX = linux
+    ifeq (${ARCH}, arm)
+      CSTRACE_LNX = linux-arm
+    endif
+  endif
+  ifeq (${DEBUG}, 1)
+    LIBCSTRACE = -lcstraced_c_api -lcstraced
+    CSTRACE_LIB_PATH = $(CSTRACE_PATH)/lib/$(CSTRACE_LNX)/dbg
+  else
+    LIBCSTRACE = -lcstraced_c_api -lcstraced
+    CSTRACE_LIB_PATH = $(CSTRACE_PATH)/lib/$(CSTRACE_LNX)/rel
+  endif
+  $(call detected,CSTRACE)
+  $(call detected_var,CSTRACE_PATH)
+  EXTLIBS += -L$(CSTRACE_LIB_PATH) $(LIBCSTRACE) -lstdc++
 endif
 
 
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 2377b9b..1b39769 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -85,6 +85,7 @@ libperf-$(CONFIG_AUXTRACE) += intel-bts.o
 
 ifdef HAVE_CSTRACE_SUPPORT
 libperf-$(CONFIG_AUXTRACE) += cs-etm.o
+libperf-$(CONFIG_AUXTRACE) += cs-etm-decoder/
 endif
 
libperf-y += parse-branch-options.o diff --git a/tools/perf/util/cs-etm-decoder/Build b/tools/perf/util/cs-etm-decoder/Build new file mode 100644 index 0000000..d926514 --- /dev/null +++ b/tools/perf/util/cs-etm-decoder/Build @@ -0,0 +1,2 @@ +CFLAGS_cs-etm-decoder.o += -I$(CSTRACE_PATH)/include +libperf-$(CONFIG_AUXTRACE) += cs-etm-decoder.o diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c new file mode 100644 index 0000000..ee213a1 --- /dev/null +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -0,0 +1,45 @@ +/* + * Copyright(C) 2015-2017 Linaro Limited. All rights reserved. + * Author: Tor Jeremiassen tor@ti.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General + * Public License for more details. + * + * You should have received a copy of the GNU GEneral Public License along + * with this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#include <linux/err.h> +#include <stdlib.h> + +#include "cs-etm.h" +#include "cs-etm-decoder.h" +#include "c_api/opencsd_c_api.h" +#include "etmv4/trc_pkt_types_etmv4.h" +#include "ocsd_if_types.h" +#include "util.h" + + +#define MAX_BUFFER 1024 + +struct cs_etm_decoder { + struct cs_etm_state state; + dcd_tree_handle_t dcd_tree; + void (*packet_printer)(const char *); + cs_etm_mem_cb_type mem_access; + ocsd_datapath_resp_t prev_return; + size_t prev_processed; + bool trace_on; + bool discontinuity; + struct cs_etm_packet packet_buffer[MAX_BUFFER]; + uint32_t packet_count; + uint32_t head; + uint32_t tail; + uint32_t end_tail; +}; diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h new file mode 100644 index 0000000..ff4a2c6 --- /dev/null +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -0,0 +1,107 @@ +/* + * Copyright(C) 2015-2017 Linaro Limited. All rights reserved. + * Author: Tor Jeremiassen tor@ti.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General + * Public License for more details. + * + * You should have received a copy of the GNU GEneral Public License along + * with this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#ifndef INCLUDE__CS_ETM_DECODER_H__ +#define INCLUDE__CS_ETM_DECODER_H__ + +#include <linux/types.h> +#include <stdio.h> + +struct cs_etm_decoder; + +struct cs_etm_buffer { + const unsigned char *buf; + size_t len; + uint64_t offset; + uint64_t ref_timestamp; +}; + +enum cs_etm_sample_type { + CS_ETM_RANGE = 1 << 0, +}; + +struct cs_etm_state { + int err; + void *data; + unsigned int isa; + uint64_t start; + uint64_t end; + uint64_t timestamp; +}; + +struct cs_etm_packet { + enum cs_etm_sample_type sample_type; + uint64_t start_addr; + uint64_t end_addr; + bool exc; + bool exc_ret; + int cpu; +}; + +struct cs_etm_queue; + +typedef uint32_t (*cs_etm_mem_cb_type)(struct cs_etm_queue *, uint64_t, + size_t, uint8_t *); + +struct cs_etm_trace_params { + void *etmv4i_packet_handler; + uint32_t reg_idr0; + uint32_t reg_idr1; + uint32_t reg_idr2; + uint32_t reg_idr8; + uint32_t reg_configr; + uint32_t reg_traceidr; + int protocol; +}; + +struct cs_etm_decoder_params { + int operation; + void (*packet_printer)(const char *); + cs_etm_mem_cb_type mem_acc_cb; + bool formatted; + bool fsyncs; + bool hsyncs; + bool frame_aligned; + void *data; +}; + + +/* Error return codes */ +enum { + CS_ETM_ERR_NOMEM = 1, + CS_ETM_ERR_NODATA, + CS_ETM_ERR_PARAM, + CS_ETM_ERR_OVERFLOW, + CS_ETM_ERR_DECODER, +}; + +/* + * The following enums are indexed starting with 1 to align with the + * open source coresight trace decoder library. + */ + +enum { + CS_ETM_PROTO_ETMV3 = 1, + CS_ETM_PROTO_ETMV4i, + CS_ETM_PROTO_ETMV4d, +}; + +enum { + CS_ETM_OPERATION_PRINT = 1, + CS_ETM_OPERATION_DECODE, +}; +#endif /* INCLUDE__CS_ETM_DECODER_H__ */
Once initialized, the trace decoder library is driven by calling it repeatedly with a pointer into a data buffer. Each call consumes zero or more bytes of the buffer and may result in one or more callbacks to the trace packet handler registered with the decoder. A single call can produce several callbacks because one trace stream atom can encode multiple branch events and thus generate multiple trace packets in the decoded trace. The function returns a result code that indicates whether the decoder needs to be flushed or decoding can continue (until the data buffer has been exhausted).
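The calling convention can be sketched with a mock decoder. Everything below is hypothetical (`demo_*` names, the 4-byte chunk size, the "wait after every 8 bytes" rule) and only imitates the shape of the interaction; it is not the OpenCSD API. It is also simplified relative to the patch: where the patch flushes once and returns to its caller, this sketch flushes and carries on until the buffer is drained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_resp { DEMO_CONT, DEMO_WAIT, DEMO_FATAL };
enum demo_op { DEMO_OP_DATA, DEMO_OP_FLUSH };

/*
 * Hypothetical mock decoder: consumes at most 4 bytes per data call and
 * asks the caller to wait each time its running total hits a multiple
 * of 8 bytes.
 */
struct demo_decoder {
	size_t total;		/* bytes consumed so far */
	unsigned int packets;	/* data calls that produced a packet */
	unsigned int flushes;	/* flush operations performed */
};

static enum demo_resp demo_process(struct demo_decoder *d, enum demo_op op,
				   const uint8_t *buf, size_t len,
				   size_t *used)
{
	(void)buf;
	if (op == DEMO_OP_FLUSH) {
		d->flushes++;
		return DEMO_CONT;
	}
	*used = len < 4 ? len : 4;
	d->total += *used;
	d->packets++;
	return (d->total % 8 == 0) ? DEMO_WAIT : DEMO_CONT;
}

/*
 * Feed the whole buffer to the decoder, issuing a flush whenever the
 * previous call asked us to wait. Returns the number of bytes consumed,
 * or -1 if the decoder reports a fatal response.
 */
static long demo_drain(struct demo_decoder *d, const uint8_t *buf, size_t len)
{
	enum demo_resp resp = DEMO_CONT;
	size_t done = 0;

	while (done < len) {
		size_t used = 0;

		if (resp == DEMO_WAIT) {
			resp = demo_process(d, DEMO_OP_FLUSH, NULL, 0, &used);
		} else if (resp == DEMO_CONT) {
			resp = demo_process(d, DEMO_OP_DATA, buf + done,
					    len - done, &used);
			done += used;
		} else {
			return -1;
		}
	}
	return (long)done;
}
```

The key design point is that the response from the previous call decides the next operation, which is why the patch stores `prev_return` in the decoder between invocations.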
Signed-off-by: Tor Jeremiassen <tor@ti.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 60 +++++++++++++++++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  5 +++
 2 files changed, 65 insertions(+)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index ee213a1..b5ee127 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -43,3 +43,63 @@ struct cs_etm_decoder {
 	uint32_t tail;
 	uint32_t end_tail;
 };
+
+const struct cs_etm_state *
+cs_etm_decoder__process_data_block(struct cs_etm_decoder *decoder,
+				   uint64_t indx, const uint8_t *buf,
+				   size_t len, size_t *consumed)
+{
+	int ret = 0;
+	ocsd_datapath_resp_t dp_ret;
+	size_t processed = 0;
+
+	if (!decoder)
+		return NULL;
+
+	dp_ret = decoder->prev_return;
+	if (decoder->packet_count > 0) {
+		decoder->state.err = ret;
+		*consumed = processed;
+		return &decoder->state;
+	}
+
+	while ((processed < len) && (ret == 0)) {
+		if (OCSD_DATA_RESP_IS_WAIT(dp_ret)) {
+			dp_ret = ocsd_dt_process_data(decoder->dcd_tree,
+						      OCSD_OP_FLUSH,
+						      0,
+						      0,
+						      NULL,
+						      NULL);
+			break;
+		} else if (OCSD_DATA_RESP_IS_CONT(dp_ret)) {
+			uint32_t count;
+
+			dp_ret = ocsd_dt_process_data(decoder->dcd_tree,
+						      OCSD_OP_DATA,
+						      indx + processed,
+						      len - processed,
+						      &buf[processed],
+						      &count);
+			processed += count;
+		} else {
+			ret = -CS_ETM_ERR_DECODER;
+		}
+	}
+	/*
+	 * Adjust the counts of processed and previously processed
+	 * data based on the return code and previous return code.
+	 */
+	if (OCSD_DATA_RESP_IS_WAIT(dp_ret)) {
+		if (OCSD_DATA_RESP_IS_CONT(decoder->prev_return))
+			decoder->prev_processed = processed;
+		processed = 0;
+	} else if (OCSD_DATA_RESP_IS_WAIT(decoder->prev_return)) {
+		processed = decoder->prev_processed;
+		decoder->prev_processed = 0;
+	}
+	*consumed = processed;
+	decoder->prev_return = dp_ret;
+	decoder->state.err = ret;
+	return &decoder->state;
+}
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index ff4a2c6..9420f0f 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -104,4 +104,9 @@ enum {
 	CS_ETM_OPERATION_PRINT = 1,
 	CS_ETM_OPERATION_DECODE,
 };
+
+const struct cs_etm_state *
+cs_etm_decoder__process_data_block(struct cs_etm_decoder *decoder,
+				   uint64_t indx, const uint8_t *buf,
+				   size_t len, size_t *consumed);
 #endif /* INCLUDE__CS_ETM_DECODER_H__ */
The decoder library can be configured either to do full trace decoding or just to parse the compressed trace stream and generate a printable string representation of each individual trace packet.
This patch adds code to the decoder library interface that generates basic configuration for ETMv4 trace decoding and configures the trace decoder library to perform basic packet printing for ETMv4, using a callback function registered from the coresight ETM code.
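The split between selecting an operation and registering a printer callback might look like the following standalone sketch. All `demo_*` names are hypothetical and the packet description string is a placeholder; the sketch only mirrors the behaviour described above, in which print mode requires a caller-supplied printer and any other operation is rejected at this stage of the series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { DEMO_OPERATION_PRINT = 1, DEMO_OPERATION_DECODE };
enum { DEMO_ERR_PARAM = 3 };

/* Caller-supplied configuration: what to do and where text output goes. */
struct demo_params {
	int operation;
	void (*packet_printer)(const char *);
};

struct demo_decoder {
	void (*packet_printer)(const char *);
};

/*
 * Configure the decoder. Only print mode is accepted here, and it needs
 * a printer callback; anything else yields a negative error code.
 */
static int demo_create(const struct demo_params *p, struct demo_decoder *d)
{
	if (p->operation != DEMO_OPERATION_PRINT || !p->packet_printer)
		return -DEMO_ERR_PARAM;
	d->packet_printer = p->packet_printer;
	return 0;
}

/*
 * What a packet sink might do for each packet: prefix the trace index,
 * append the packet description, and pass the line to the printer.
 * snprintf bounds the write to the local buffer.
 */
static void demo_emit(const struct demo_decoder *d, unsigned long indx,
		      const char *desc)
{
	char line[128];

	snprintf(line, sizeof(line), "%lu: %s\n", indx, desc);
	d->packet_printer(line);
}

/* Example printer that keeps the most recent line for inspection. */
static char demo_last[128];

static void demo_capture(const char *s)
{
	snprintf(demo_last, sizeof(demo_last), "%s", s);
}
```

In this arrangement the decoder interface never writes to stdout itself; the caller decides where packet text goes by choosing the callback, which keeps the library-facing code independent of perf's output handling.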
Signed-off-by: Tor Jeremiassen <tor@ti.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 106 ++++++++++++++++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |   5 ++
 2 files changed, 111 insertions(+)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index b5ee127..6850afb 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -103,3 +103,109 @@ cs_etm_decoder__process_data_block(struct cs_etm_decoder *decoder, decoder->state.err = ret; return &decoder->state; } + +static void cs_etm_decoder__gen_etmv4_config(struct cs_etm_trace_params *params, + ocsd_etmv4_cfg *config) +{ + config->reg_configr = params->reg_configr; + config->reg_traceidr = params->reg_traceidr; + config->reg_idr0 = params->reg_idr0; + config->reg_idr1 = params->reg_idr1; + config->reg_idr2 = params->reg_idr2; + config->reg_idr8 = params->reg_idr8; + config->reg_idr9 = 0; + config->reg_idr10 = 0; + config->reg_idr11 = 0; + config->reg_idr12 = 0; + config->reg_idr13 = 0; + config->arch_ver = ARCH_V8; + config->core_prof = profile_CortexA; +} + +static ocsd_datapath_resp_t +cs_etm_decoder__etmv4i_packet_printer(const void *context, + const ocsd_datapath_op_t op, + const ocsd_trc_index_t indx, + const ocsd_etmv4_i_pkt *pkt) +{ + const size_t PACKET_STR_LEN = 1024; + ocsd_datapath_resp_t ret = OCSD_RESP_CONT; + char packet_str[PACKET_STR_LEN]; + size_t offset; + struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context; + + sprintf(packet_str, "%ld: ", (long int) indx); + offset = strlen(packet_str); + + switch (op) { + case OCSD_OP_DATA: + if (ocsd_pkt_str(OCSD_PROTOCOL_ETMV4I, + (void *)pkt, packet_str + offset, + PACKET_STR_LEN - offset) != OCSD_OK) + ret = OCSD_RESP_FATAL_INVALID_PARAM; + break; + case OCSD_OP_EOT: + sprintf(packet_str, "**** END OF TRACE ****\n"); + break; + case OCSD_OP_FLUSH: + case OCSD_OP_RESET: + default: + break; + } + + decoder->packet_printer(packet_str); + + return ret; +} + +static int cs_etm_decoder__create_etmv4i_packet_printer( + struct cs_etm_decoder_params *d_params, + struct cs_etm_trace_params *t_params, + struct cs_etm_decoder 
*decoder) +{ + ocsd_etmv4_cfg trace_config; + int ret = 0; + unsigned char CSID; /* CSID extracted from the config data */ + + if (!d_params->packet_printer) + return -CS_ETM_ERR_PARAM; + + cs_etm_decoder__gen_etmv4_config(t_params, &trace_config); + + decoder->packet_printer = d_params->packet_printer; + + ret = ocsd_dt_create_decoder(decoder->dcd_tree, + OCSD_BUILTIN_DCD_ETMV4I, + OCSD_CREATE_FLG_PACKET_PROC, + (void *)&trace_config, &CSID); + + if (ret != 0) + return -CS_ETM_ERR_DECODER; + + ret = ocsd_dt_attach_packet_callback(decoder->dcd_tree, + CSID, OCSD_C_API_CB_PKT_SINK, + cs_etm_decoder__etmv4i_packet_printer, + decoder); + + if (ret != 0) + return -CS_ETM_ERR_DECODER; + + return 0; +} + +int +cs_etm_decoder__create_etmv4i_decoder(struct cs_etm_decoder_params *d_params, + struct cs_etm_trace_params *t_params, + struct cs_etm_decoder *decoder) +{ + int ret; + + if (d_params->operation == CS_ETM_OPERATION_PRINT) + ret = cs_etm_decoder__create_etmv4i_packet_printer(d_params, + t_params, + decoder); + else + ret = -CS_ETM_ERR_PARAM; + + return ret; +} diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h index 9420f0f..88aa2b4 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -109,4 +109,9 @@ const struct cs_etm_state * cs_etm_decoder__process_data_block(struct cs_etm_decoder *decoder, uint64_t indx, const uint8_t *buf, size_t len, size_t *consumed); + +int +cs_etm_decoder__create_etmv4i_decoder(struct cs_etm_decoder_params *d_params, + struct cs_etm_trace_params *t_params, + struct cs_etm_decoder *decoder); #endif /* INCLUDE__CS_ETM_DECODER_H__ */
This patch adds functions to create and destroy instances of the decoder library interface.
Signed-off-by: Tor Jeremiassen tor@ti.com
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 68 +++++++++++++++++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  7 +++
 2 files changed, 75 insertions(+)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 6850afb..814d7ff 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -209,3 +209,71 @@ cs_etm_decoder__create_etmv4i_decoder(struct cs_etm_decoder_params *d_params,
return ret; } + +struct cs_etm_decoder * +cs_etm_decoder__new(uint32_t num_cpu, struct cs_etm_decoder_params *d_params, + struct cs_etm_trace_params t_params[]) +{ + struct cs_etm_decoder *decoder; + ocsd_dcd_tree_src_t format; + uint32_t flags; + int ret; + size_t i; + + if ((!t_params) || (!d_params)) + return NULL; + + decoder = zalloc(sizeof(*decoder)); + + if (!decoder) + return NULL; + + decoder->state.data = d_params->data; + decoder->prev_return = OCSD_RESP_CONT; + format = (d_params->formatted ? OCSD_TRC_SRC_FRAME_FORMATTED : + OCSD_TRC_SRC_SINGLE); + flags = 0; + flags |= (d_params->fsyncs ? OCSD_DFRMTR_HAS_FSYNCS : 0); + flags |= (d_params->hsyncs ? OCSD_DFRMTR_HAS_HSYNCS : 0); + flags |= (d_params->frame_aligned ? OCSD_DFRMTR_FRAME_MEM_ALIGN : 0); + + /* Create decode tree for the data source */ + decoder->dcd_tree = ocsd_create_dcd_tree(format, flags); + + if (decoder->dcd_tree == 0) + goto err_free_decoder; + + for (i = 0; i < num_cpu; i++) { + switch (t_params[i].protocol) { + case CS_ETM_PROTO_ETMV4i: + ret = cs_etm_decoder__create_etmv4i_decoder( + d_params, + &t_params[i], + decoder); + if (ret != 0) + goto err_free_decoder_tree; + break; + default: + goto err_free_decoder_tree; + } + } + + return decoder; + +err_free_decoder_tree: + ocsd_destroy_dcd_tree(decoder->dcd_tree); +err_free_decoder: + free(decoder); + return NULL; +} + +void cs_etm_decoder__free(struct cs_etm_decoder *decoder) +{ + if (!decoder) + return; + + ocsd_destroy_dcd_tree(decoder->dcd_tree); + decoder->dcd_tree = NULL; + + free(decoder); +} diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h index 88aa2b4..f460618 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -110,6 +110,13 @@ cs_etm_decoder__process_data_block(struct cs_etm_decoder *decoder, uint64_t indx, const uint8_t *buf, size_t len, size_t *consumed);
+struct cs_etm_decoder * +cs_etm_decoder__new(uint32_t num_cpu, + struct cs_etm_decoder_params *d_params, + struct cs_etm_trace_params t_params[]); + +void cs_etm_decoder__free(struct cs_etm_decoder *decoder); + int cs_etm_decoder__create_etmv4i_decoder(struct cs_etm_decoder_params *d_params, struct cs_etm_trace_params *t_params,
perf report may be called with the -D option to dump the raw trace in ASCII.
This patch adds two functions. The first prints the string representation of a trace packet. The second takes an auxtrace buffer, instantiates a trace decoder configured only to parse the trace stream into trace packets and print them using the first function, and then pushes the trace data through that decoder to generate the output.
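The way the trace data is pushed through the decoder can be sketched as a loop that keeps calling the decoder until the whole buffer has been consumed. This is only an illustration under stated assumptions, not the patch code: mock_process_block() and drain_buffer() are hypothetical names, and the mock accepts at most 16 bytes per call in place of the real decoder interface. The sketch additionally stops if a call makes no progress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the decoder: accepts at most 16 bytes per call. */
static int mock_process_block(const uint8_t *buf, size_t len, size_t *consumed)
{
    (void)buf;
    *consumed = len < 16 ? len : 16;
    return 0; /* 0 means no error */
}

/*
 * Push a whole buffer through the mock decoder.
 * Returns the number of decoder calls made, or -1 on error.
 */
int drain_buffer(const uint8_t *data, size_t size)
{
    size_t used = 0;
    int calls = 0;

    assert(data != NULL);

    while (used < size) {
        size_t consumed = 0;

        if (mock_process_block(data + used, size - used, &consumed))
            return -1;
        if (consumed == 0)
            return -1; /* no progress: stop instead of spinning */

        used += consumed;
        calls++;
    }

    return calls;
}
```

With the 16-byte limit assumed above, a 40-byte buffer takes three calls (16, 16 and 8 bytes).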
Signed-off-by: Tor Jeremiassen tor@ti.com
---
 tools/perf/util/cs-etm.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cs-etm.h |  6 +++++
 2 files changed, 73 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 85de61e..e5d02cf 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -81,6 +81,73 @@ struct cs_etm_queue { bool eot; };
+static void cs_etm__packet_dump(const char *pkt_string) +{ + const char *color = PERF_COLOR_BLUE; + + color_fprintf(stdout, color, " %s\n", pkt_string); + fflush(stdout); +} + +void cs_etm__dump_event(struct cs_etm_auxtrace *etm, + struct auxtrace_buffer *buffer) +{ + const char *color = PERF_COLOR_BLUE; + struct cs_etm_decoder_params d_params; + struct cs_etm_trace_params *t_params; + struct cs_etm_decoder *decoder; + size_t buffer_used = 0; + size_t i; + + fprintf(stdout, "\n"); + color_fprintf(stdout, color, + ". ... CoreSight ETM Trace data: size %zu bytes\n", + buffer->size); + + /* Use metadata to fill in trace parameters for trace decoder */ + t_params = zalloc(sizeof(*t_params) * etm->num_cpu); + for (i = 0; i < etm->num_cpu; i++) { + t_params[i].protocol = CS_ETM_PROTO_ETMV4i; + t_params[i].reg_idr0 = etm->metadata[i][CS_ETMV4_TRCIDR0]; + t_params[i].reg_idr1 = etm->metadata[i][CS_ETMV4_TRCIDR1]; + t_params[i].reg_idr2 = etm->metadata[i][CS_ETMV4_TRCIDR2]; + t_params[i].reg_idr8 = etm->metadata[i][CS_ETMV4_TRCIDR8]; + t_params[i].reg_configr = etm->metadata[i][CS_ETMV4_TRCCONFIGR]; + t_params[i].reg_traceidr = + etm->metadata[i][CS_ETMV4_TRCTRACEIDR]; + } + + /* Set decoder parameters to simply print the trace packets */ + d_params.packet_printer = cs_etm__packet_dump; + d_params.operation = CS_ETM_OPERATION_PRINT; + d_params.formatted = true; + d_params.fsyncs = false; + d_params.hsyncs = false; + d_params.frame_aligned = true; + + decoder = cs_etm_decoder__new(etm->num_cpu, &d_params, t_params); + + zfree(&t_params); + + if (!decoder) + return; + do { + size_t consumed; + const struct cs_etm_state *state; + + state = cs_etm_decoder__process_data_block( + decoder, buffer->offset, + &((uint8_t *)buffer->data)[buffer_used], + buffer->size - buffer_used, &consumed); + if (state && state->err) + break; + + buffer_used += consumed; + } while (buffer_used < buffer->size); + + cs_etm_decoder__free(decoder); +} + static int cs_etm__flush_events(struct 
perf_session *session, struct perf_tool *tool) { diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index fb5a2de..6139280 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -90,4 +90,10 @@ static inline int cs_etm__process_auxtrace_info(union perf_event *event, } #endif
+struct cs_etm_auxtrace; +struct auxtrace_buffer; + +void cs_etm__dump_event(struct cs_etm_auxtrace *etm, + struct auxtrace_buffer *buffer); + #endif
This patch adds a function called by the perf auxtrace subsystem to process the auxtrace event for CoreSight ETM. In particular, it adds the event and trace data to the auxtrace queues for further processing. If -D has been specified, the raw trace packets are dumped.
Signed-off-by: Tor Jeremiassen tor@ti.com
---
 tools/perf/util/cs-etm.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index e5d02cf..dede50c 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -223,9 +223,38 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session, union perf_event *event, struct perf_tool *tool) { - (void) session; - (void) event; + struct cs_etm_auxtrace *etm = container_of(session->auxtrace, + struct cs_etm_auxtrace, + auxtrace); + (void) tool; + + if (!etm->data_queued) { + struct auxtrace_buffer *buffer; + off_t data_offset; + int fd = perf_data_file__fd(session->file); + bool is_pipe = perf_data_file__is_pipe(session->file); + int err; + + if (is_pipe) + data_offset = 0; + else { + data_offset = lseek(fd, 0, SEEK_CUR); + if (data_offset == -1) + return -errno; + } + + err = auxtrace_queues__add_event(&etm->queues, session, + event, data_offset, &buffer); + if (err) + return err; + + if (dump_trace) + if (auxtrace_buffer__get_data(buffer, fd)) { + cs_etm__dump_event(etm, buffer); + auxtrace_buffer__put_data(buffer); + } + } return 0; }
The trace decoder library requires access to the instruction opcodes of the program that was traced in order to be able to decode the trace stream into a series of executed instruction ranges.
This patch implements a function that will be used by the trace decoder interface to enable the trace decoder library to access instruction opcodes stored in DSOs.
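The contract of such a memory access function can be sketched as follows: given an address and a size, copy as many bytes as are available and return the number of bytes copied. This is a simplified illustration, not the patch code. struct code_image and image_mem_access() are hypothetical names, and a flat in-memory image stands in for the DSO lookup. Because the return type is unsigned, the sketch reports every failure as 0 bytes read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image of traced code mapped at [base, base + len). */
struct code_image {
    uint64_t base;
    const uint8_t *bytes;
    size_t len;
};

/*
 * Copy up to 'size' bytes found at 'address' into 'buffer'.
 * Returns the number of bytes copied, or 0 if the address is not mapped.
 */
uint32_t image_mem_access(const struct code_image *img, uint64_t address,
                          size_t size, uint8_t *buffer)
{
    uint64_t offset;
    size_t avail;

    assert(buffer != NULL);

    if (!img || address < img->base)
        return 0;

    offset = address - img->base;
    if (offset >= img->len)
        return 0;

    avail = img->len - (size_t)offset;
    if (size > avail)
        size = avail; /* short read at the end of the image */

    memcpy(buffer, img->bytes + offset, size);
    return (uint32_t)size;
}
```

A request that runs past the end of the image returns a short count rather than an error, so the caller can tell how many opcode bytes are valid.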
Signed-off-by: Tor Jeremiassen tor@ti.com
---
 tools/perf/util/cs-etm.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cs-etm.h |  4 ++++
 2 files changed, 53 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index dede50c..b81b7f4 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -207,6 +207,55 @@ static void cs_etm__free(struct perf_session *session) zfree(&aux); }
+uint32_t cs_etm__mem_access(struct cs_etm_queue *etmq, + uint64_t address, + size_t size, + uint8_t *buffer) +{ + struct addr_location al; + uint64_t offset; + struct thread *thread; + struct machine *machine; + uint8_t cpumode; + int len; + + if (!etmq) + return -1; + + machine = etmq->etm->machine; + if (address >= etmq->etm->kernel_start) + cpumode = PERF_RECORD_MISC_KERNEL; + else + cpumode = PERF_RECORD_MISC_USER; + + thread = etmq->thread; + if (!thread) { + if (cpumode != PERF_RECORD_MISC_KERNEL) + return -EINVAL; + thread = etmq->etm->unknown_thread; + } + + thread__find_addr_map(thread, cpumode, MAP__FUNCTION, address, &al); + + if (!al.map || !al.map->dso) + return 0; + + if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR && + dso__data_status_seen(al.map->dso, DSO_DATA_STATUS_SEEN_ITRACE)) + return 0; + + offset = al.map->map_ip(al.map, address); + + map__load(al.map); + + len = dso__data_read_offset(al.map->dso, machine, offset, buffer, size); + + if (len <= 0) + return 0; + + return len; +} + static int cs_etm__process_event(struct perf_session *session, union perf_event *event, struct perf_sample *sample, diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 6139280..d0205a3 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -96,4 +96,8 @@ struct auxtrace_buffer; void cs_etm__dump_event(struct cs_etm_auxtrace *etm, struct auxtrace_buffer *buffer);
+struct cs_etm_queue; + +uint32_t cs_etm__mem_access(struct cs_etm_queue *etmq, uint64_t address, + size_t size, uint8_t *buffer); #endif
The CoreSight trace event queues are organized one per CPU. This patch adds a function to map a CPU number to a specific event queue.
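The lookup can be sketched as "try the queue whose index equals the CPU number first, then fall back to a scan". The code below is a simplified illustration with hypothetical names (struct cpu_queue, cpu_to_queue_index()). It returns an index instead of a private pointer and uses a single forward scan for the fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced queue: only the CPU it belongs to. */
struct cpu_queue {
    int cpu;
};

/* Return the index of the queue for 'cpu', or -1 if there is none. */
int cpu_to_queue_index(const struct cpu_queue *queues, unsigned int nr, int cpu)
{
    unsigned int q, j;

    assert(queues != NULL || nr == 0);

    if (nr == 0)
        return -1;

    /* keep the first guess in range even if cpu is not */
    if (cpu < 0)
        q = 0;
    else if ((unsigned int)cpu >= nr)
        q = nr - 1;
    else
        q = (unsigned int)cpu;

    /* the common case: queue index equals CPU number */
    if (queues[q].cpu == cpu)
        return (int)q;

    /* otherwise scan all queues for a match */
    for (j = 0; j < nr; j++)
        if (queues[j].cpu == cpu)
            return (int)j;

    return -1;
}
```

When queues are created in CPU order the first guess hits immediately, so the scan only costs anything in the unusual case.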
Signed-off-by: Tor Jeremiassen tor@ti.com
---
 tools/perf/util/cs-etm.c | 32 ++++++++++++++++++++++++++++++++
 tools/perf/util/cs-etm.h |  2 ++
 2 files changed, 34 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index b81b7f4..f92ceae 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -207,6 +207,38 @@ static void cs_etm__free(struct perf_session *session) zfree(&aux); }
+struct cs_etm_queue *cs_etm__cpu_to_etmq(struct cs_etm_auxtrace *etm, int cpu) +{ + int q, j; + + if (etm->queues.nr_queues == 0) + return NULL; + + /* make sure q is in range even if cpu is not */ + if (cpu < 0) + q = 0; + else if ((unsigned int) cpu >= etm->queues.nr_queues) + q = etm->queues.nr_queues - 1; + else + q = cpu; + + /* try the obvious one first */ + if (etm->queues.queue_array[q].cpu == cpu) + return etm->queues.queue_array[q].priv; + + /* search for a match in queues with index < q */ + for (j = q - 1; j >= 0; j--) + if (etm->queues.queue_array[j].cpu == cpu) + return etm->queues.queue_array[j].priv; + + /* search for a match in the queues with index > q */ + for (j = q + 1; j < (int) etm->queues.nr_queues; j++) + if (etm->queues.queue_array[j].cpu == cpu) + return etm->queues.queue_array[j].priv; + + return NULL; +} + uint32_t cs_etm__mem_access(struct cs_etm_queue *etmq, uint64_t address, size_t size, diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index d0205a3..846cd48 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -100,4 +100,6 @@ struct cs_etm_queue;
uint32_t cs_etm__mem_access(struct cs_etm_queue *etmq, uint64_t address, size_t size, uint8_t *buffer); + +struct cs_etm_queue *cs_etm__cpu_to_etmq(struct cs_etm_auxtrace *etm, int cpu); #endif
Auxtrace events are associated with auxtrace_queues. The auxtrace queues are extended using the "priv" field to point to a cs-etm specific structure associated with each queue.
This patch adds functions to allocate and free these structures, which includes instantiating the trace decoder library interface. It also adds the function used to register a memory access callback with the decoder library.
Signed-off-by: Tor Jeremiassen tor@ti.com
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 32 ++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  4 ++
 tools/perf/util/cs-etm.c                        | 80 +++++++++++++++++++++++++
 tools/perf/util/cs-etm.h                        |  4 ++
 4 files changed, 120 insertions(+)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 814d7ff..57390ae 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -44,6 +44,38 @@ struct cs_etm_decoder { uint32_t end_tail; };
+static uint32_t cs_etm_decoder__mem_access(const void *context, + const ocsd_vaddr_t address, + const ocsd_mem_space_acc_t mem_space, + const uint32_t req_size, + uint8_t *buffer) +{ + struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context; + (void) mem_space; + + return decoder->mem_access(decoder->state.data, + address, + req_size, + buffer); +} + +int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder, + uint64_t start, uint64_t end, + cs_etm_mem_cb_type cb_func) +{ + int err; + + decoder->mem_access = cb_func; + err = ocsd_dt_add_callback_mem_acc(decoder->dcd_tree, start, end, + OCSD_MEM_SPACE_ANY, + cs_etm_decoder__mem_access, + decoder); + if (err) + return -CS_ETM_ERR_DECODER; + + return 0; +} + const struct cs_etm_state * cs_etm_decoder__process_data_block(struct cs_etm_decoder *decoder, uint64_t indx, const uint8_t *buf, diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h index f460618..60ce594 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -117,6 +117,10 @@ cs_etm_decoder__new(uint32_t num_cpu,
void cs_etm_decoder__free(struct cs_etm_decoder *decoder);
+int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder, + uint64_t start, uint64_t end, + cs_etm_mem_cb_type cb_func); + int cs_etm_decoder__create_etmv4i_decoder(struct cs_etm_decoder_params *d_params, struct cs_etm_trace_params *t_params, diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index f92ceae..6e73ce7 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -164,6 +164,8 @@ static void cs_etm__free_queue(void *priv) return;
thread__zput(etmq->thread); + cs_etm_decoder__free(etmq->decoder); + zfree(&etmq->event_buf); free(etmq); }
@@ -207,6 +209,84 @@ static void cs_etm__free(struct perf_session *session) zfree(&aux); }
+struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm, + unsigned int queue_nr) +{ + struct cs_etm_decoder_params d_params; + struct cs_etm_trace_params *t_params; + struct cs_etm_queue *etmq; + size_t i; + + etmq = zalloc(sizeof(*etmq)); + if (!etmq) + return NULL; + + etmq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE); + if (!etmq->event_buf) + goto out_free; + + etmq->etm = etm; + etmq->queue_nr = queue_nr; + etmq->pid = -1; + etmq->tid = -1; + etmq->cpu = -1; + etmq->stop = false; + + /* Use metadata to fill in trace parameters for trace decoder */ + t_params = zalloc(sizeof(*t_params) * etm->num_cpu); + + if (!t_params) + goto out_free; + + for (i = 0; i < etm->num_cpu; i++) { + t_params[i].reg_idr0 = etm->metadata[i][CS_ETMV4_TRCIDR0]; + t_params[i].reg_idr1 = etm->metadata[i][CS_ETMV4_TRCIDR1]; + t_params[i].reg_idr2 = etm->metadata[i][CS_ETMV4_TRCIDR2]; + t_params[i].reg_idr8 = etm->metadata[i][CS_ETMV4_TRCIDR8]; + t_params[i].reg_configr = etm->metadata[i][CS_ETMV4_TRCCONFIGR]; + t_params[i].reg_traceidr = + etm->metadata[i][CS_ETMV4_TRCTRACEIDR]; + t_params[i].protocol = CS_ETM_PROTO_ETMV4i; + } + + /* Set decoder parameters to simply print the trace packets */ + d_params.packet_printer = cs_etm__packet_dump; + d_params.operation = CS_ETM_OPERATION_DECODE; + d_params.formatted = true; + d_params.fsyncs = false; + d_params.hsyncs = false; + d_params.frame_aligned = true; + d_params.data = etmq; + + etmq->decoder = cs_etm_decoder__new(etm->num_cpu, &d_params, t_params); + + zfree(&t_params); + + if (!etmq->decoder) + goto out_free; + + /* + * Register a function to handle all memory accesses required by + * the trace decoder library. 
+ */ + if (cs_etm_decoder__add_mem_access_cb(etmq->decoder, + 0x0L, ((u64) -1L), + cs_etm__mem_access)) + goto out_free_decoder; + + etmq->offset = 0; + etmq->eot = false; + + return etmq; + +out_free_decoder: + cs_etm_decoder__free(etmq->decoder); +out_free: + zfree(&etmq->event_buf); + free(etmq); + return NULL; +} + struct cs_etm_queue *cs_etm__cpu_to_etmq(struct cs_etm_auxtrace *etm, int cpu) { int q, j; diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 846cd48..0bb35d6 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -102,4 +102,8 @@ uint32_t cs_etm__mem_access(struct cs_etm_queue *etmq, uint64_t address, size_t size, uint8_t *buffer);
struct cs_etm_queue *cs_etm__cpu_to_etmq(struct cs_etm_auxtrace *etm, int cpu); + +struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm, + unsigned int queue_nr); + #endif
As events are processed, depending on the time stamp of the event (or on whether timeless decoding is enabled), the CoreSight event queues are updated, or initialized on the first call, before any trace is decoded.
This patch adds two functions to manage the setup and update of the coresight event queues.
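The update step can be sketched as a lazy, flag-driven setup: nothing happens unless new data has been queued, and only queues that hold data and have no per-queue state yet are initialized. This is a reduced illustration with hypothetical types (struct mock_queue, struct mock_queues). A boolean stands in for the allocated per-queue structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified queue: 'ready' stands in for the priv pointer. */
struct mock_queue {
    unsigned int nr_buffers; /* buffers queued so far */
    bool ready;              /* per-queue state already set up? */
};

struct mock_queues {
    struct mock_queue *array;
    unsigned int nr_queues;
    bool new_data;           /* set whenever an event is queued */
};

/* Set up queues that have data but no state yet; returns how many were set up. */
unsigned int mock_update_queues(struct mock_queues *qs)
{
    unsigned int i, done = 0;

    assert(qs != NULL);

    if (!qs->new_data)
        return 0;            /* nothing queued since the last call */
    qs->new_data = false;

    for (i = 0; i < qs->nr_queues; i++) {
        struct mock_queue *q = &qs->array[i];

        if (q->nr_buffers == 0 || q->ready)
            continue;        /* empty, or already initialized */

        q->ready = true;
        done++;
    }

    return done;
}
```

Clearing the flag before the scan keeps repeated calls cheap: a second call with no new events returns immediately.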
Signed-off-by: Tor Jeremiassen tor@ti.com
---
 tools/perf/util/cs-etm.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cs-etm.h |  2 ++
 2 files changed, 57 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 6e73ce7..6fcd985 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -287,6 +287,61 @@ struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm, return NULL; }
+static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, + struct auxtrace_queue *queue, + unsigned int queue_nr) +{ + struct cs_etm_queue *etmq = queue->priv; + + if (list_empty(&queue->head)) + return 0; + + if (!etmq) { + etmq = cs_etm__alloc_queue(etm, queue_nr); + + if (!etmq) + return -ENOMEM; + + queue->priv = etmq; + + if (queue->cpu != -1) + etmq->cpu = queue->cpu; + + etmq->tid = queue->tid; + + if (etm->sampling_mode) { + if (etm->timeless_decoding) + etmq->step_through_buffers = true; + if (etm->timeless_decoding) + etmq->use_buffer_pid_tid = true; + } + } + + return 0; +} + +static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm) +{ + unsigned int i; + int ret; + + for (i = 0; i < etm->queues.nr_queues; i++) { + ret = cs_etm__setup_queue(etm, &etm->queues.queue_array[i], i); + if (ret) + return ret; + } + return 0; +} + +int cs_etm__update_queues(struct cs_etm_auxtrace *etm) +{ + if (etm->queues.new_data) { + etm->queues.new_data = false; + return cs_etm__setup_queues(etm); + } + return 0; +} + struct cs_etm_queue *cs_etm__cpu_to_etmq(struct cs_etm_auxtrace *etm, int cpu) { int q, j; diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 0bb35d6..d1cb049 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -106,4 +106,6 @@ struct cs_etm_queue *cs_etm__cpu_to_etmq(struct cs_etm_auxtrace *etm, int cpu); struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm, unsigned int queue_nr);
+int cs_etm__update_queues(struct cs_etm_auxtrace *etm); + #endif
Each decoded trace sample encodes the execution of a sequence of instructions between two "waypoints", typically a branch target and the next taken branch. This sample has to be converted into a perf sample struct before it can be passed on to the perf subsystem.
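The conversion of one instruction range into sample fields can be sketched as below. The struct and function names (mock_range, mock_sample, range_to_sample()) are hypothetical and far smaller than the real perf structures. The period is approximated as the number of 4-byte words in the range, which is exact only for fixed-width 4-byte instructions such as A64 and an estimate otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded range between two waypoints. */
struct mock_range {
    uint64_t start_addr;
    uint64_t end_addr;
};

/* Hypothetical subset of the fields a perf sample carries. */
struct mock_sample {
    uint64_t ip;     /* address where the range starts */
    uint64_t addr;   /* address where the range ends */
    uint64_t period; /* approximate instruction count */
};

struct mock_sample range_to_sample(struct mock_range r)
{
    struct mock_sample s = {0};

    assert(r.end_addr >= r.start_addr);

    s.ip = r.start_addr;
    s.addr = r.end_addr;
    /* number of 4-byte words covered by the range */
    s.period = (r.end_addr - r.start_addr) >> 2;

    return s;
}
```

For example, a range from 0x400000 to 0x400010 covers 16 bytes and yields a period of 4.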
This patch adds two functions that take a decoded trace packet and populate the perf sample structure and pass it on to perf session event processing. It also adds a function in the decoder library interface to return a decoded trace packet.
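Returning a decoded packet from the decoder interface amounts to reading from a fixed-size ring buffer whose size is a power of two, so the index can wrap with a mask. The sketch below is an assumption-laden illustration: RING_SIZE, struct packet_ring, ring_put() and ring_get() are hypothetical names, and the producer side is included only so the example is complete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* must be a power of two for the mask below */

struct mock_packet {
    uint64_t start_addr;
    uint64_t end_addr;
};

struct packet_ring {
    struct mock_packet slots[RING_SIZE];
    uint32_t head;  /* next slot to read */
    uint32_t tail;  /* next slot to write */
    uint32_t count; /* packets currently stored */
};

/* Store one packet; returns 0 on success, -1 if the ring is full. */
int ring_put(struct packet_ring *r, struct mock_packet p)
{
    assert(r != NULL);

    if (r->count == RING_SIZE)
        return -1;

    r->slots[r->tail] = p;
    r->tail = (r->tail + 1) & (RING_SIZE - 1);
    r->count++;
    return 0;
}

/* Fetch the oldest packet; returns 0 on success, -1 if empty or out is NULL. */
int ring_get(struct packet_ring *r, struct mock_packet *out)
{
    assert(r != NULL);

    if (!out || r->count == 0)
        return -1;

    *out = r->slots[r->head];
    r->head = (r->head + 1) & (RING_SIZE - 1);
    r->count--;
    return 0;
}
```

The mask only wraps correctly when the buffer size is a power of two; with any other size a modulo would be needed instead.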
Signed-off-by: Tor Jeremiassen tor@ti.com
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 21 ++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  3 ++
 tools/perf/util/cs-etm.c                        | 64 +++++++++++++++++++++++++
 tools/perf/util/cs-etm.h                        |  2 +
 4 files changed, 90 insertions(+)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 57390ae..27cf6c2 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -76,6 +76,27 @@ int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder, return 0; }
+int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder, + struct cs_etm_packet *packet) +{ + if (!decoder) + return -CS_ETM_ERR_PARAM; + + if (decoder->packet_count == 0) + return -CS_ETM_ERR_NODATA; + + if (!packet) + return -CS_ETM_ERR_PARAM; + + *packet = decoder->packet_buffer[decoder->head]; + + decoder->head = (decoder->head + 1) & (MAX_BUFFER - 1); + + decoder->packet_count--; + + return 0; +} + const struct cs_etm_state * cs_etm_decoder__process_data_block(struct cs_etm_decoder *decoder, uint64_t indx, const uint8_t *buf, diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h index 60ce594..47d9bcf 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -121,6 +121,9 @@ int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder, uint64_t start, uint64_t end, cs_etm_mem_cb_type cb_func);
+int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder, + struct cs_etm_packet *packet); + int cs_etm_decoder__create_etmv4i_decoder(struct cs_etm_decoder_params *d_params, struct cs_etm_trace_params *t_params, diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 6fcd985..63431b6 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -333,6 +333,70 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm) return 0; }
+/* + * The cs etm packet encodes an instruction range between a branch target + * and the next taken branch. Generate sample accordingly. + */ +static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, + struct cs_etm_packet *packet) +{ + int ret = 0; + struct cs_etm_auxtrace *etm = etmq->etm; + union perf_event *event = etmq->event_buf; + struct perf_sample sample = {.ip = 0,}; + uint64_t start_addr = packet->start_addr; + uint64_t end_addr = packet->end_addr; + + event->sample.header.type = PERF_RECORD_SAMPLE; + event->sample.header.misc = PERF_RECORD_MISC_USER; + event->sample.header.size = sizeof(struct perf_event_header); + + sample.ip = start_addr; + sample.pid = etmq->pid; + sample.tid = etmq->tid; + sample.addr = end_addr; + sample.id = etmq->etm->instructions_id; + sample.stream_id = etmq->etm->instructions_id; + /* approximate the period to be the number of words in the range */ + sample.period = (end_addr - start_addr) >> 2; + sample.cpu = packet->cpu; + sample.flags = 0; + sample.insn_len = 1; + sample.cpumode = event->header.misc; + + ret = perf_session__deliver_synth_event(etm->session, event, &sample); + + if (ret) + pr_err( + "CS ETM Trace: failed to deliver instruction event, error %d\n", + ret); + + return ret; +} + +int cs_etm__sample(struct cs_etm_queue *etmq) +{ + struct cs_etm_packet packet; + int err; + + err = cs_etm_decoder__get_packet(etmq->decoder, &packet); + /* if there is no sample, it returns err = -1, no real error */ + if (err) + return err; + + /* + * if the packet contains an instruction range, generate + * an instruction sequence event + */ + if (packet.sample_type & CS_ETM_RANGE) { + err = cs_etm__synth_instruction_sample(etmq, &packet); + if (err) + return err; + } + + return 0; +} + int cs_etm__update_queues(struct cs_etm_auxtrace *etm) { if (etm->queues.new_data) { diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index d1cb049..aa7e0d4 100644 --- a/tools/perf/util/cs-etm.h +++ 
b/tools/perf/util/cs-etm.h @@ -108,4 +108,6 @@ struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
+int cs_etm__sample(struct cs_etm_queue *etmq); + #endif
The raw trace data is stored in an auxtrace buffer. This patch adds functions to associate the trace data with the buffer in the coresight queue, prior to decoding.
Signed-off-by: Tor Jeremiassen tor@ti.com
---
 tools/perf/util/cs-etm.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cs-etm.h |  4 +++
 2 files changed, 82 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 63431b6..d9d3eeb 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -209,6 +209,84 @@ static void cs_etm__free(struct perf_session *session) zfree(&aux); }
+static void cs_etm__use_buffer_pid_tid(struct cs_etm_queue *etmq, + struct auxtrace_queue *queue, + struct auxtrace_buffer *buffer) +{ + if ((queue->cpu == -1) && (buffer->cpu != -1)) + etmq->cpu = buffer->cpu; + + etmq->pid = buffer->pid; + etmq->tid = buffer->tid; + + thread__zput(etmq->thread); + + if (etmq->tid != -1) { + etmq->thread = machine__findnew_thread(etmq->etm->machine, + etmq->pid, etmq->tid); + } +} + +int cs_etm__get_trace(struct cs_etm_buffer *buff, struct cs_etm_queue *etmq) +{ + struct auxtrace_buffer *aux_buffer = etmq->buffer; + struct auxtrace_buffer *old_buffer = aux_buffer; + struct auxtrace_queue *queue; + + if (etmq->stop) { + buff->len = 0; + return 0; + } + + queue = &etmq->etm->queues.queue_array[etmq->queue_nr]; + + aux_buffer = auxtrace_buffer__next(queue, aux_buffer); + + /* if no more data, drop the previous auxtrace_buffer and return */ + if (!aux_buffer) { + if (old_buffer) + auxtrace_buffer__drop_data(old_buffer); + buff->len = 0; + return 0; + } + + etmq->buffer = aux_buffer; + + /* if the aux_buffer doesn't have data associated, try to load it */ + if (!aux_buffer->data) { + /* get the file desc associated with the perf data file */ + int fd = perf_data_file__fd(etmq->etm->session->file); + + aux_buffer->data = auxtrace_buffer__get_data(aux_buffer, fd); + if (!aux_buffer->data) + return -ENOMEM; + } + + /* if valid, drop the previous buffer */ + if (old_buffer) + auxtrace_buffer__drop_data(old_buffer); + + buff->offset = aux_buffer->offset; + if (aux_buffer->use_data) { + buff->len = aux_buffer->use_size; + buff->buf = aux_buffer->use_data; + } else { + buff->len = aux_buffer->size; + buff->buf = aux_buffer->data; + } + buff->ref_timestamp = aux_buffer->reference; + + if (etmq->use_buffer_pid_tid && + ((etmq->pid != aux_buffer->pid) || + (etmq->tid != aux_buffer->tid))) + cs_etm__use_buffer_pid_tid(etmq, queue, aux_buffer); + + if (etmq->step_through_buffers) + etmq->stop = true; + + return buff->len; +} + struct cs_etm_queue 
*cs_etm__alloc_queue(struct cs_etm_auxtrace *etm, unsigned int queue_nr) { diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index aa7e0d4..54d667c 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -92,6 +92,7 @@ static inline int cs_etm__process_auxtrace_info(union perf_event *event,
struct cs_etm_auxtrace; struct auxtrace_buffer; +struct cs_etm_buffer;
void cs_etm__dump_event(struct cs_etm_auxtrace *etm, struct auxtrace_buffer *buffer); @@ -110,4 +111,7 @@ int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
int cs_etm__sample(struct cs_etm_queue *etmq);
+int cs_etm__get_trace(struct cs_etm_buffer *buff, + struct cs_etm_queue *etmq); + #endif
The trace is decoded by a separate trace decoder library. This patch adds a function that gets a block of trace data and calls the decoder library interface code to decode it. The decoded trace packets are returned through the callback functions that were passed in when the decoder was configured.
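Because consecutive blocks in the data file are not necessarily contiguous trace, the decoder is reset before each block so that it re-synchronizes. The sketch below illustrates that idea with a toy decoder. All names are hypothetical, and MOCK_SYNC is an invented marker byte, not a real ETM packet. After a reset the toy decoder ignores bytes until it sees the marker, then counts every following byte as one packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_SYNC 0x80 /* hypothetical sync marker, not a real ETM packet */

struct mock_decoder {
    bool synced;
    unsigned int packets;
};

struct mock_buffer {
    const uint8_t *buf;
    size_t len;
};

/* Forget synchronization; decoding resumes at the next sync marker. */
void mock_reset(struct mock_decoder *d)
{
    d->synced = false;
}

/* Skip bytes until synced, then count each byte as one decoded packet. */
void mock_process(struct mock_decoder *d, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (!d->synced)
            d->synced = (buf[i] == MOCK_SYNC);
        else
            d->packets++;
    }
}

/* Decode every buffer in turn, resetting the decoder before each one. */
unsigned int mock_run_decoder(struct mock_decoder *d,
                              const struct mock_buffer *bufs, size_t nr)
{
    size_t i;

    assert(d != NULL);

    for (i = 0; i < nr; i++) {
        mock_reset(d);
        mock_process(d, bufs[i].buf, bufs[i].len);
    }

    return d->packets;
}
```

The reset discards the partial data at the start of a block that does not begin on a sync point; without it, those bytes would be misread as a continuation of the previous block.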
Signed-off-by: Tor Jeremiassen tor@ti.com
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 15 +++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  2 ++
 tools/perf/util/cs-etm.c                        | 44 +++++++++++++++++++++++++
 tools/perf/util/cs-etm.h                        |  2 ++
 4 files changed, 63 insertions(+)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 27cf6c2..73218a9 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -76,6 +76,21 @@ int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder, return 0; }
+int cs_etm_decoder__reset(struct cs_etm_decoder *decoder)
+{
+	ocsd_datapath_resp_t dp_ret;
+
+	if (!decoder)
+		return -CS_ETM_ERR_PARAM;
+
+	dp_ret = ocsd_dt_process_data(decoder->dcd_tree, OCSD_OP_RESET,
+				      0, 0, NULL, NULL);
+	if (OCSD_DATA_RESP_IS_FATAL(dp_ret))
+		return -CS_ETM_ERR_DECODER;
+
+	return 0;
+}
+
 int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder,
 			       struct cs_etm_packet *packet)
 {
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 47d9bcf..fb21f6c 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -124,6 +124,8 @@ int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder,
 int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder,
 			       struct cs_etm_packet *packet);

+int cs_etm_decoder__reset(struct cs_etm_decoder *decoder);
+
 int
 cs_etm_decoder__create_etmv4i_decoder(struct cs_etm_decoder_params *d_params,
 				      struct cs_etm_trace_params *t_params,
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index d9d3eeb..4073238 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -475,6 +475,50 @@ int cs_etm__sample(struct cs_etm_queue *etmq)
 	return 0;
 }

+int cs_etm__run_decoder(struct cs_etm_queue *etmq)
+{
+	struct cs_etm_buffer buffer;
+	size_t buffer_used;
+	int err = 0;
+
+	/* Go through each buffer in the queue and decode them one by one */
+more:
+	buffer_used = 0;
+	memset(&buffer, 0, sizeof(buffer));
+	err = cs_etm__get_trace(&buffer, etmq);
+	if (err <= 0)
+		return err;
+	/*
+	 * cannot assume consecutive blocks in the data file are contiguous
+	 * trace as will have start/stopped. Reset the decoder to force re-sync
+	 */
+	err = cs_etm_decoder__reset(etmq->decoder);
+	if (err != 0)
+		return err;
+
+	/* run trace decoder until buffer consumed or end of trace */
+	do {
+		size_t processed = 0;
+
+		etmq->state = cs_etm_decoder__process_data_block(
+						etmq->decoder,
+						etmq->offset,
+						&buffer.buf[buffer_used],
+						buffer.len - buffer_used,
+						&processed);
+		err = (!etmq->state) ? -1 : etmq->state->err;
+		etmq->offset += processed;
+		buffer_used += processed;
+		if (err)
+			return err;
+		cs_etm__sample(etmq);
+
+	} while (!etmq->eot && (buffer.len > buffer_used));
+
+goto more;
+	return err;
+}
+
 int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
 {
 	if (etm->queues.new_data) {
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index 54d667c..9bf0c2b 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -114,4 +114,6 @@ int cs_etm__sample(struct cs_etm_queue *etmq);
 int cs_etm__get_trace(struct cs_etm_buffer *buff,
 		      struct cs_etm_queue *etmq);

+int cs_etm__run_decoder(struct cs_etm_queue *etmq);
+
 #endif
This patch adds functions to process the queues at the end of processing each perf event. For each queue, the functions set the thread, pid, and cpu before running the trace decoder on it.
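As a rough illustration of the ordering idea only, the sketch below decodes whichever queue has the earliest pending timestamp until every queue has caught up with a limit. It is a simplification under stated assumptions: a linear scan stands in for the auxtrace heap used by the patch, and all names (`sk_queue`, `sk_earliest`, `sk_process_queues`, `step`) are hypothetical and do not appear in perf.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue state; not the perf structures. */
struct sk_queue {
	uint64_t next_ts;	/* timestamp of the next undecoded data */
	int tid;		/* context set before decoding */
	unsigned int decoded;	/* number of decoder runs on this queue */
};

/* Return the index of the queue with the smallest next_ts, or -1 if n == 0. */
static int sk_earliest(const struct sk_queue *q, size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++)
		if (best < 0 || q[i].next_ts < q[best].next_ts)
			best = (int)i;
	return best;
}

/*
 * Decode queues in timestamp order until every queue has reached 'limit'.
 * 'step' is how far one (simulated) decoder run advances a queue.
 * Returns the total number of decoder runs.
 */
static unsigned int sk_process_queues(struct sk_queue *q, size_t n,
				      uint64_t limit, uint64_t step,
				      int current_tid)
{
	unsigned int runs = 0;

	if (step == 0)
		return 0;	/* a zero step would never make progress */

	for (;;) {
		int i = sk_earliest(q, n);

		if (i < 0 || q[i].next_ts >= limit)
			return runs;
		q[i].tid = current_tid;	/* set the context first */
		q[i].next_ts += step;	/* stand-in for running the decoder */
		q[i].decoded++;
		runs++;
	}
}
```

With two queues starting at timestamps 10 and 25, a limit of 30 and a step of 10, the first queue is decoded twice and the second once before both have reached the limit.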
Signed-off-by: Tor Jeremiassen <tor@ti.com>
---
 tools/perf/util/cs-etm.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cs-etm.h |  5 +++
 2 files changed, 96 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 4073238..ba60be8 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -156,6 +156,27 @@ static int cs_etm__flush_events(struct perf_session *session,
 	return 0;
 }

+static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm,
+				    struct auxtrace_queue *queue)
+{
+	struct cs_etm_queue *etmq = queue->priv;
+
+	if (queue->tid == -1) {
+		etmq->tid = machine__get_current_tid(etm->machine, etmq->cpu);
+		thread__zput(etmq->thread);
+	}
+
+	if ((!etmq->thread) && (etmq->tid != -1))
+		etmq->thread = machine__find_thread(etm->machine, -1,
+						    etmq->tid);
+
+	if (etmq->thread) {
+		etmq->pid = etmq->thread->pid_;
+		if (queue->cpu == -1)
+			etmq->cpu = etmq->thread->cpu;
+	}
+}
+
 static void cs_etm__free_queue(void *priv)
 {
 	struct cs_etm_queue *etmq = priv;
@@ -528,6 +549,76 @@ int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }

+int cs_etm__process_queues(struct cs_etm_auxtrace *etm, u64 timestamp)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct auxtrace_queue *queue;
+		struct cs_etm_queue *etmq;
+
+		if (!etm->heap.heap_cnt)
+			return 0;
+
+		if (etm->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = etm->heap.heap_array[0].queue_nr;
+		queue = &etm->queues.queue_array[queue_nr];
+		etmq = queue->priv;
+
+		auxtrace_heap__pop(&etm->heap);
+
+		if (etm->heap.heap_cnt) {
+			ts = etm->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		cs_etm__set_pid_tid_cpu(etm, queue);
+
+		ret = cs_etm__run_decoder(etmq);
+
+		if (ret < 0) {
+			auxtrace_heap__add(&etm->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&etm->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			etmq->on_heap = false;
+		}
+	}
+	return 0;
+}
+
+int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
+				    pid_t tid,
+				    u64 time_)
+{
+	struct auxtrace_queues *queues = &etm->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &etm->queues.queue_array[i];
+		struct cs_etm_queue *etmq = queue->priv;
+
+		if (etmq && ((tid == -1) || (etmq->tid == tid))) {
+			etmq->time = time_;
+			cs_etm__set_pid_tid_cpu(etm, queue);
+			cs_etm__run_decoder(etmq);
+		}
+	}
+	return 0;
+}
+
 struct cs_etm_queue *cs_etm__cpu_to_etmq(struct cs_etm_auxtrace *etm, int cpu)
 {
 	int q, j;
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index 9bf0c2b..5e921a0 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -116,4 +116,9 @@ int cs_etm__get_trace(struct cs_etm_buffer *buff,

 int cs_etm__run_decoder(struct cs_etm_queue *etmq);

+int cs_etm__process_queues(struct cs_etm_auxtrace *etm, u64 timestamp);
+
+int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
+				    pid_t tid,
+				    u64 time_);
 #endif
Each perf event is passed to the coresight etm event processing function, so that coresight specific functionality can be invoked.
This patch implements the CoreSight ETM event processing function, which handles the events that need special treatment for trace decoding.
Signed-off-by: Tor Jeremiassen <tor@ti.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  3 --
 tools/perf/util/cs-etm.c                        | 45 ++++++++++++++++++++++---
 2 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 73218a9..f3c6625 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -80,9 +80,6 @@ int cs_etm_decoder__reset(struct cs_etm_decoder *decoder)
 {
 	ocsd_datapath_resp_t dp_ret;

-	if (!decoder)
-		return -CS_ETM_ERR_PARAM;
-
 	dp_ret = ocsd_dt_process_data(decoder->dcd_tree, OCSD_OP_RESET,
 				      0, 0, NULL, NULL);
 	if (OCSD_DATA_RESP_IS_FATAL(dp_ret))
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index ba60be8..c769da9 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -30,6 +30,7 @@
 #include "evlist.h"
 #include "intlist.h"
 #include "machine.h"
+#include "map.h"
 #include "perf.h"
 #include "thread.h"
 #include "thread_map.h"
@@ -705,11 +706,45 @@ static int cs_etm__process_event(struct perf_session *session,
 				 struct perf_sample *sample,
 				 struct perf_tool *tool)
 {
-	(void) session;
-	(void) event;
-	(void) sample;
-	(void) tool;
-	return 0;
+	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
+						   struct cs_etm_auxtrace,
+						   auxtrace);
+	struct cs_etm_queue *etmq;
+	u64 timestamp;
+	int err = 0;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("CoreSight ETM Trace requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time && (sample->time != (u64) -1))
+		timestamp = sample->time;
+	else
+		timestamp = 0;
+
+	if (timestamp || etm->timeless_decoding) {
+		err = cs_etm__update_queues(etm);
+		if (err)
+			return err;
+	}
+
+	etmq = cs_etm__cpu_to_etmq(etm, sample->cpu);
+	if (!etmq)
+		return -1;
+
+	if (etm->timeless_decoding) {
+		if (event->header.type == PERF_RECORD_EXIT)
+			err = cs_etm__process_timeless_queues(etm,
+							      event->fork.tid,
+							      sample->time);
+	} else if (timestamp)
+		err = cs_etm__process_queues(etm, timestamp);
+
+	return err;
 }

 static int cs_etm__process_auxtrace_event(struct perf_session *session,
The flush_events callback may be used by the perf auxtrace subsystem; it signals that any events/data stored in the CoreSight queues should be processed in their entirety.
Signed-off-by: Tor Jeremiassen <tor@ti.com>
---
 tools/perf/util/cs-etm.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index c769da9..836cfb6 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -37,6 +37,8 @@
 #include "thread-stack.h"
 #include "util.h"

+#define MAX_TIMESTAMP (~0ULL)
+
 struct cs_etm_auxtrace {
 	struct auxtrace auxtrace;
 	struct auxtrace_queues queues;
@@ -152,9 +154,29 @@ void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
 static int cs_etm__flush_events(struct perf_session *session,
 				struct perf_tool *tool)
 {
-	(void) session;
-	(void) tool;
-	return 0;
+	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
+						   struct cs_etm_auxtrace,
+						   auxtrace);
+
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = cs_etm__update_queues(etm);
+
+	if (ret < 0)
+		return ret;
+
+	if (etm->timeless_decoding)
+		return cs_etm__process_timeless_queues(etm, -1,
+						       MAX_TIMESTAMP - 1);
+
+	return cs_etm__process_queues(etm, MAX_TIMESTAMP);
+
 }

 static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm,
This patch adds the functions needed to generate configuration for the instruction events that are generated from the trace decoding, including which fields are populated. In particular this is for hardware instruction events.
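As a small illustration of the "which fields are populated" part, the sketch below composes a sample-type bitmask for a synthesized event: a few fields are always requested, and the time field is dropped when decoding without timestamps. The `SK_SAMPLE_*` constants and `sk_synth_sample_type()` are hypothetical stand-ins for this sketch, not the perf definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PERF_SAMPLE_* bits; values chosen for the sketch. */
#define SK_SAMPLE_IP		(1ULL << 0)
#define SK_SAMPLE_TID		(1ULL << 1)
#define SK_SAMPLE_TIME		(1ULL << 2)
#define SK_SAMPLE_PERIOD	(1ULL << 8)

/*
 * Start from the sample type inherited from the recorded event, always
 * request IP, TID and PERIOD, and include TIME only when timestamps are
 * available (i.e. not in timeless mode).
 */
static uint64_t sk_synth_sample_type(uint64_t inherited, bool timeless)
{
	uint64_t type = inherited;

	type |= SK_SAMPLE_IP | SK_SAMPLE_TID | SK_SAMPLE_PERIOD;
	if (timeless)
		type &= ~SK_SAMPLE_TIME;
	else
		type |= SK_SAMPLE_TIME;

	return type;
}
```

Clearing the time bit in timeless mode means the synthesized samples do not advertise a field that the decoder cannot fill in.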
Signed-off-by: Tor Jeremiassen <tor@ti.com>
---
 tools/perf/util/cs-etm.c | 113 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 836cfb6..a7eb1fc9 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -496,6 +496,105 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 	return ret;
 }

+struct cs_etm_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int cs_etm__event_synth(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine)
+{
+	struct cs_etm_synth *cs_etm_synth =
+		      container_of(tool, struct cs_etm_synth, dummy_tool);
+
+	(void) sample;
+	(void) machine;
+
+	return perf_session__deliver_synth_event(cs_etm_synth->session,
+						 event, NULL);
+}
+
+static int cs_etm__synth_event(struct perf_session *session,
+			       struct perf_event_attr *attr, u64 id)
+{
+	struct cs_etm_synth cs_etm_synth;
+
+	memset(&cs_etm_synth, 0, sizeof(struct cs_etm_synth));
+	cs_etm_synth.session = session;
+
+	return perf_event__synthesize_attr(&cs_etm_synth.dummy_tool, attr, 1,
+					   &id, cs_etm__event_synth);
+}
+
+static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
+				struct perf_session *session)
+{
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->attr.type == etm->pmu_type) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("No selected events with CoreSight Trace data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+			    PERF_SAMPLE_PERIOD;
+	if (etm->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+
+	attr.exclude_user = evsel->attr.exclude_user;
+	attr.exclude_kernel = evsel->attr.exclude_kernel;
+	attr.exclude_hv = evsel->attr.exclude_hv;
+	attr.exclude_host = evsel->attr.exclude_host;
+	attr.exclude_guest = evsel->attr.exclude_guest;
+	attr.sample_id_all = evsel->attr.sample_id_all;
+	attr.read_format = evsel->attr.read_format;
+
+	/* create new id val to be a fixed offset from evsel id */
+	id = evsel->id[0] + 1000000000;
+
+	if (!id)
+		id = 1;
+
+	if (etm->synth_opts.instructions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		attr.sample_period = etm->synth_opts.period;
+		etm->instructions_sample_period = attr.sample_period;
+		err = cs_etm__synth_event(session, &attr, id);
+
+		if (err) {
+			pr_err("%s: failed to synthesize 'instructions' event type\n",
+			       __func__);
+			return err;
+		}
+		etm->sample_instructions = true;
+		etm->instructions_sample_type = attr.sample_type;
+		etm->instructions_id = id;
+	}
+
+	etm->synth_needs_swap = evsel->needs_swap;
+	return 0;
+}
+
 int cs_etm__sample(struct cs_etm_queue *etmq)
 {
 	struct cs_etm_packet packet;
@@ -1037,6 +1136,20 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 	if (dump_trace)
 		return 0;

+	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+		etm->synth_opts = *session->itrace_synth_opts;
+	else
+		itrace_synth_opts__set_default(&etm->synth_opts);
+
+	etm->synth_opts.branches = false;
+	etm->synth_opts.callchain = false;
+	etm->synth_opts.calls = false;
+	etm->synth_opts.returns = false;
+
+	err = cs_etm__synth_events(etm, session);
+	if (err)
+		goto err_delete_thread;
+
 	err = auxtrace_queues__process_index(&etm->queues, session);
 	if (err)
 		goto err_delete_thread;
This patch adds a function to the decoder interface code to initialize the content of the packet buffer and set it to empty.
Signed-off-by: Tor Jeremiassen <tor@ti.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index f3c6625..0c36db3 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -109,6 +109,23 @@ int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder,
 	return 0;
 }

+static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
+{
+	int i;
+
+	decoder->head = 0;
+	decoder->tail = 0;
+	decoder->end_tail = 0;
+	decoder->packet_count = 0;
+	for (i = 0; i < MAX_BUFFER; i++) {
+		decoder->packet_buffer[i].start_addr = 0xdeadbeefdeadbeefUL;
+		decoder->packet_buffer[i].end_addr = 0xdeadbeefdeadbeefUL;
+		decoder->packet_buffer[i].exc = false;
+		decoder->packet_buffer[i].exc_ret = false;
+		decoder->packet_buffer[i].cpu = INT_MIN;
+	}
+}
+
 const struct cs_etm_state *
 cs_etm_decoder__process_data_block(struct cs_etm_decoder *decoder,
 				   uint64_t indx, const uint8_t *buf,
@@ -295,6 +312,7 @@ cs_etm_decoder__new(uint32_t num_cpu, struct cs_etm_decoder_params *d_params,

 	decoder->state.data = d_params->data;
 	decoder->prev_return = OCSD_RESP_CONT;
+	cs_etm_decoder__clear_buffer(decoder);
 	format = (d_params->formatted ? OCSD_TRC_SRC_FRAME_FORMATTED :
 					 OCSD_TRC_SRC_SINGLE);
 	flags = 0;
Full decode of a CoreSight trace requires the decoder library to be properly configured with a "generic" packet handler function. Packets are emitted into a buffer with a delay, so that the most recently decoded packet can still be annotated with information about taking or returning from interrupts/exceptions.
This patch adds the code to configure the trace decoder for full packet decode, and provides the functions to handle, buffer, and annotate the trace packets when necessary.
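To make the delayed emission concrete, here is a minimal, self-contained sketch of a ring buffer in which the newest packet stays "pending", and can still be annotated, until the next packet arrives. It only illustrates the idea under stated assumptions: the names `sketch_ring`, `sketch_push`, `sketch_mark_exception`, `sketch_pop`, and `SKETCH_MAX` are hypothetical, and the actual code in cs-etm-decoder.c differs in detail.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX 8	/* hypothetical size; must be a power of two */

struct sketch_packet {
	uint64_t start_addr;
	uint64_t end_addr;
	bool exc;		/* set if an exception followed this range */
};

struct sketch_ring {
	struct sketch_packet buf[SKETCH_MAX];
	unsigned int head;	/* next packet to hand to the consumer */
	unsigned int tail;	/* one past the last committed packet */
	bool pending;		/* buf[tail] holds an uncommitted packet */
};

/* Commit the pending packet, if any, making it visible to the consumer. */
static void sketch_commit(struct sketch_ring *r)
{
	if (r->pending) {
		r->tail = (r->tail + 1) & (SKETCH_MAX - 1);
		r->pending = false;
	}
}

/* Store a new packet as pending; the previous pending one is committed. */
static int sketch_push(struct sketch_ring *r, uint64_t start, uint64_t end)
{
	sketch_commit(r);
	/* Full if committing one more packet would make tail reach head. */
	if (((r->tail + 1) & (SKETCH_MAX - 1)) == r->head)
		return -1;
	r->buf[r->tail] = (struct sketch_packet){ start, end, false };
	r->pending = true;
	return 0;
}

/* Annotate the most recent packet, which is still pending. */
static int sketch_mark_exception(struct sketch_ring *r)
{
	if (!r->pending)
		return -1;
	r->buf[r->tail].exc = true;
	return 0;
}

/* Pop the oldest committed packet; returns -1 if none is available. */
static int sketch_pop(struct sketch_ring *r, struct sketch_packet *out)
{
	if (r->head == r->tail)
		return -1;
	*out = r->buf[r->head];
	r->head = (r->head + 1) & (SKETCH_MAX - 1);
	return 0;
}
```

In this sketch a packet only becomes visible to `sketch_pop()` once a later packet has been pushed, which is what gives the handler a window to attach the exception flag.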
Signed-off-by: Tor Jeremiassen <tor@ti.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 164 ++++++++++++++++++++++++
 1 file changed, 164 insertions(+)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 0c36db3..d21416b 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -22,6 +22,7 @@
 #include "cs-etm-decoder.h"
 #include "c_api/opencsd_c_api.h"
 #include "etmv4/trc_pkt_types_etmv4.h"
+#include "intlist.h"
 #include "ocsd_if_types.h"
 #include "util.h"

@@ -76,6 +77,24 @@ int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder,
 	return 0;
 }

+static int cs_etm_decoder__flush_packet(struct cs_etm_decoder *decoder)
+{
+	int err = 0;
+
+	if (!decoder)
+		return -CS_ETM_ERR_PARAM;
+
+	if (decoder->packet_count >= MAX_BUFFER - 1)
+		return -CS_ETM_ERR_OVERFLOW;
+
+	if (decoder->tail != decoder->end_tail) {
+		decoder->tail = (decoder->tail + 1) & (MAX_BUFFER - 1);
+		decoder->packet_count++;
+	}
+
+	return err;
+}
+
 int cs_etm_decoder__reset(struct cs_etm_decoder *decoder)
 {
 	ocsd_datapath_resp_t dp_ret;
@@ -204,6 +223,120 @@ static void cs_etm_decoder__gen_etmv4_config(struct cs_etm_trace_params *params,
 	config->core_prof = profile_CortexA;
 }

+static int cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
+					 const ocsd_generic_trace_elem *elem,
+					 const uint8_t trace_chan_id,
+					 enum cs_etm_sample_type sample_type)
+{
+	int err = 0;
+	uint32_t et = 0;
+	struct int_node *inode = NULL;
+
+	if (!decoder)
+		return -CS_ETM_ERR_PARAM;
+
+	if (decoder->packet_count >= MAX_BUFFER - 1)
+		return -CS_ETM_ERR_OVERFLOW;
+
+	err = cs_etm_decoder__flush_packet(decoder);
+
+	if (err)
+		return err;
+
+	et = decoder->end_tail;
+	/* Search the RB tree for the cpu associated with this traceID */
+	inode = intlist__find(traceid_list, trace_chan_id);
+	if (!inode)
+		return PTR_ERR(inode);
+
+	decoder->packet_buffer[et].sample_type = sample_type;
+	decoder->packet_buffer[et].start_addr = elem->st_addr;
+	decoder->packet_buffer[et].end_addr = elem->en_addr;
+	decoder->packet_buffer[et].exc = false;
+	decoder->packet_buffer[et].exc_ret = false;
+	decoder->packet_buffer[et].cpu = *((int *)inode->priv);
+
+	et = (et + 1) & (MAX_BUFFER - 1);
+
+	decoder->end_tail = et;
+
+	return err;
+}
+
+static int cs_etm_decoder__mark_exception(struct cs_etm_decoder *decoder)
+{
+	int err = 0;
+
+	if (!decoder)
+		return -CS_ETM_ERR_PARAM;
+
+	decoder->packet_buffer[decoder->end_tail].exc = true;
+
+	return err;
+}
+
+static int cs_etm_decoder__mark_exception_return(struct cs_etm_decoder *decoder)
+{
+	int err = 0;
+
+	if (!decoder)
+		return -CS_ETM_ERR_PARAM;
+
+	decoder->packet_buffer[decoder->end_tail].exc_ret = true;
+
+	return err;
+}
+
+static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
+				const void *context,
+				const ocsd_trc_index_t indx,
+				const uint8_t trace_chan_id,
+				const ocsd_generic_trace_elem *elem)
+{
+	ocsd_datapath_resp_t resp = OCSD_RESP_CONT;
+	struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context;
+
+	(void) indx;
+	(void) trace_chan_id;
+
+	switch (elem->elem_type) {
+	case OCSD_GEN_TRC_ELEM_UNKNOWN:
+		break;
+	case OCSD_GEN_TRC_ELEM_NO_SYNC:
+		decoder->trace_on = false;
+		break;
+	case OCSD_GEN_TRC_ELEM_TRACE_ON:
+		decoder->trace_on = true;
+		break;
+	case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
+		cs_etm_decoder__buffer_packet(decoder, elem,
+					      trace_chan_id, CS_ETM_RANGE);
+		resp = OCSD_RESP_WAIT;
+		break;
+	case OCSD_GEN_TRC_ELEM_EXCEPTION:
+		cs_etm_decoder__mark_exception(decoder);
+		break;
+	case OCSD_GEN_TRC_ELEM_EXCEPTION_RET:
+		cs_etm_decoder__mark_exception_return(decoder);
+		break;
+	case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
+	case OCSD_GEN_TRC_ELEM_EO_TRACE:
+	case OCSD_GEN_TRC_ELEM_ADDR_NACC:
+	case OCSD_GEN_TRC_ELEM_TIMESTAMP:
+	case OCSD_GEN_TRC_ELEM_CYCLE_COUNT:
+	case OCSD_GEN_TRC_ELEM_ADDR_UNKNOWN:
+	case OCSD_GEN_TRC_ELEM_EVENT:
+	case OCSD_GEN_TRC_ELEM_SWTRACE:
+	case OCSD_GEN_TRC_ELEM_CUSTOM:
+	default:
+		break;
+	}
+
+	decoder->state.err = 0;
+
+	return resp;
+}
+
 static ocsd_datapath_resp_t
 cs_etm_decoder__etmv4i_packet_printer(const void *context,
 				      const ocsd_datapath_op_t op,
@@ -275,6 +408,33 @@ static int cs_etm_decoder__create_etmv4i_packet_printer(
 	return 0;
 }

+static int cs_etm_decoder__create_etmv4i_packet_decoder(
+					struct cs_etm_decoder_params *d_params,
+					struct cs_etm_trace_params *t_params,
+					struct cs_etm_decoder *decoder)
+{
+	ocsd_etmv4_cfg trace_config;
+	int ret = 0;
+	unsigned char CSID; /* CSID extracted from the config data */
+
+	decoder->packet_printer = d_params->packet_printer;
+
+	cs_etm_decoder__gen_etmv4_config(t_params, &trace_config);
+
+	ret = ocsd_dt_create_decoder(decoder->dcd_tree,
+				     OCSD_BUILTIN_DCD_ETMV4I,
+				     OCSD_CREATE_FLG_FULL_DECODER,
+				     (void *)&trace_config, &CSID);
+
+	if (ret != 0)
+		return -CS_ETM_ERR_DECODER;
+
+	ret = ocsd_dt_set_gen_elem_outfn(decoder->dcd_tree,
+					 cs_etm_decoder__gen_trace_elem_printer,
+					 decoder);
+	return ret;
+}
+
 int
 cs_etm_decoder__create_etmv4i_decoder(struct cs_etm_decoder_params *d_params,
 				      struct cs_etm_trace_params *t_params,
@@ -286,6 +446,10 @@ cs_etm_decoder__create_etmv4i_decoder(struct cs_etm_decoder_params *d_params,
 		ret = cs_etm_decoder__create_etmv4i_packet_printer(d_params,
 								   t_params,
 								   decoder);
+	else if (d_params->operation == CS_ETM_OPERATION_DECODE)
+		ret = cs_etm_decoder__create_etmv4i_packet_decoder(d_params,
+								   t_params,
+								   decoder);
 	else
 		ret = -CS_ETM_ERR_PARAM;

Add the perf tools files for CoreSight trace decoding to MAINTAINERS.
Signed-off-by: Tor Jeremiassen <tor@ti.com>
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d443258..3a76ee5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1216,7 +1216,8 @@ F:	tools/perf/arch/arm/util/pmu.c
 F:	tools/perf/arch/arm/util/auxtrace.c
 F:	tools/perf/arch/arm/util/cs-etm.c
 F:	tools/perf/arch/arm/util/cs-etm.h
-F:	tools/perf/util/cs-etm.h
+F:	tools/perf/util/cs-etm.*
+F:	tools/perf/util/cs-etm-decoder/*

 ARM/CORGI MACHINE SUPPORT
 M:	Richard Purdie <rpurdie@rpsys.net>
On Tue, 19 Sep 2017 03:33:16 -0500 Tor Jeremiassen <tor@ti.com> wrote:
This patchset adds support for user space decoding of CoreSight traces [1]
These [x] references don't exist at the bottom of this email: did you forget to add them?
of the ARM architecture. Kernel support for configuring CoreSight tracers and collect the hardware trace data in the auxtrace section of the perf.data file is already integrated [2]. The user space implementation mirrors to a large degree that of the Intel Processor Trace (PT) [3] implementation, except that the decoder library itself is separate from the perf tool sources, and is built and maintained as a separate open source project [4]. Instead, this patch set includes the necessary code and build settings to interfaces to the decoder library, as well as a "stub" or "null" library for the case when the perf tool is built without the library.
I seriously doubt this would be acceptable upstream: they prefer to have all code fully-inclusive. Do we have a plan for somehow upstreaming the library, or some other means for working around this restriction?
The decoder library interface code in this patch set only supports ETMv4 trace decoding, though the library itself supports a broader range. Future patches will add support for more versions of the ARM ETM trace encoding.
Changes from v2:
I haven't seen any prior versions submitted to this list; should this per-version change info be stripped, given according to the people on this list, it's the first version we're seeing?
It would be nice to know what branch of what tree this series is supposed to be applied to, or even have a URL for a git repo where they've been already suitably applied? I see one of the OpenCSD forks on github is owned by user Tor: Can they be pushed there for easy access?
Thanks,
Kim
On 19 September 2017 at 17:26, Kim Phillips <kim.phillips@arm.com> wrote:
On Tue, 19 Sep 2017 03:33:16 -0500 Tor Jeremiassen <tor@ti.com> wrote:
This patchset adds support for user space decoding of CoreSight traces [1]
These [x] references don't exist at the bottom of this email: did you forget to add them?
of the ARM architecture. Kernel support for configuring CoreSight tracers and collect the hardware trace data in the auxtrace section of the perf.data file is already integrated [2]. The user space implementation mirrors to a large degree that of the Intel Processor Trace (PT) [3] implementation, except that the decoder library itself is separate from the perf tool sources, and is built and maintained as a separate open source project [4]. Instead, this patch set includes the necessary code and build settings to interfaces to the decoder library, as well as a "stub" or "null" library for the case when the perf tool is built without the library.
I seriously doubt this would be acceptable upstream: they prefer to have all code fully-inclusive.
I have a different opinion. The perf tools can be compiled with or without the library, and it is not mandatory that it be included in the kernel tree.
Do we have a plan for somehow upstreaming the library, or some other means for working around this restriction?
As per our conversation in Los Angeles, we currently do not have a plan to push the openCSD library to the kernel tree.
The decoder library interface code in this patch set only supports ETMv4 trace decoding, though the library itself supports a broader range. Future patches will add support for more versions of the ARM ETM trace encoding.
Changes from v2:
I haven't seen any prior versions submitted to this list; should this per-version change info be stripped, given according to the people on this list, it's the first version we're seeing?
It would be nice to know what branch of what tree this series is supposed to be applied to, or even have a URL for a git repo where they've been already suitably applied? I see one of the OpenCSD forks on github is owned by user Tor: Can they be pushed there for easy access?
Thanks,
Kim
I seriously doubt this would be acceptable upstream: they prefer to have all code fully-inclusive. Do we have a plan for somehow upstreaming the library, or some other means for working around this restriction?
They may prefer it that way but they have already accepted an out-of-tree dependency in the Python interpreter, and OpenCSD could be done that way. I.e. provide a NO_LIBOPENCSD alongside NO_LIBPYTHON. Make it as straightforward as possible to build in the decoder and encourage distributions to do this (which they don't necessarily do with Python).
Not saying upstreaming the library into the kernel is necessarily wrong, just that perf already accepts the idea of an out-of-tree dependency.
Al
On Wed, 20 Sep 2017 02:20:38 -0500 Al Grant <Al.Grant@arm.com> wrote:
I seriously doubt this would be acceptable upstream: they prefer to have all code fully-inclusive. Do we have a plan for somehow upstreaming the library, or some other means for working around this restriction?
They may prefer it that way but they have already accepted an out-of-tree dependency in the Python interpreter, and OpenCSD could be done that way. I.e. provide a NO_LIBOPENCSD alongside NO_LIBPYTHON. Make it as straightforward as possible to build in the decoder and encourage distributions to do this (which they don't necessarily do with Python).
A Python interpreter is a well-recognized requirement for many pieces of software, in this case, some optional advanced scripting features in perf.
OpenCSD is a proprietary h/w trace decoder, necessary for perf to decode what it recorded into perf events that can then get reported, injected, etc. Intel's decoders (and disassemblers even) are fully included into the perf sources.
[Note that perf report should run on any arch and correctly interpret any *other* arch's perf.data files, so an Arm arch perf binary should be able to run an inject on an Intel PT perf.data file, and vice versa.]
OpenCSD is nowhere near as ubiquitous as python, and will take some time to be included in distributions, and that's if it's meaningful to do so: Is there some other purpose than perf ETM aux decode that the distribution keepers will be convinced of? I'm not aware of any, and if it's only for perf, I wouldn't be surprised if the distributions didn't ask why we wouldn't include it directly into perf's sources in the first place.
Not saying upstreaming the library into the kernel is necessarily wrong, just that perf already accepts the idea of an out-of-tree dependency.
I don't see a proprietary h/w trace decoder being among them, but we can talk about this all day.
Instead, there is one way we can find out whether it'll be accepted for sure: by submitting at least the first patch of this series "perf tools: Add initial hooks for decoding coresight" as an RFC or whatever to the upstream perf tool maintainers (acme, etc.). The sooner we do that, the less time we'll spend speculating about it.
Kim
On 20 September 2017 at 08:09, Kim Phillips <kim.phillips@arm.com> wrote:
On Wed, 20 Sep 2017 02:20:38 -0500 Al Grant <Al.Grant@arm.com> wrote:
I seriously doubt this would be acceptable upstream: they prefer to have all code fully-inclusive. Do we have a plan for somehow upstreaming the library, or some other means for working around this restriction?
They may prefer it that way but they have already accepted an out-of-tree dependency in the Python interpreter, and OpenCSD could be done that way. I.e. provide a NO_LIBOPENCSD alongside NO_LIBPYTHON. Make it as straightforward as possible to build in the decoder and encourage distributions to do this (which they don't necessarily do with Python).
A Python interpreter is a well-recognized requirement for many pieces of software, in this case, some optional advanced scripting features in perf.
OpenCSD is a proprietary h/w trace decoder, necessary for perf to decode what it recorded into perf events that can then get reported, injected, etc. Intel's decoders (and disassemblers even) are fully included into the perf sources.
[Note that perf report should run on any arch and correctly interpret any *other* arch's perf.data files, so an Arm arch perf binary should be able to run an inject on an Intel PT perf.data file, and vice versa.]
Patches are always appreciated.
OpenCSD is nowhere near as ubiquitous as python, and will take some time to be included in distributions, and that's if it's meaningful to do so: Is there some other purpose than perf ETM aux decode that the distribution keepers will be convinced of? I'm not aware of any, and if it's only for perf, I wouldn't be surprised if the distributions didn't ask why we wouldn't include it directly into perf's sources in the first place.
Not saying upstreaming the library into the kernel is necessarily wrong, just that perf already accepts the idea of an out-of-tree dependency.
I don't see a proprietary h/w trace decoder being among them, but we can talk about this all day.
Instead, there is one way we can find out whether it'll be accepted for sure: by submitting at least the first patch of this series "perf tools: Add initial hooks for decoding coresight" as an RFC or whatever to the upstream perf tool maintainers (acme, etc.). The sooner we do that, the less time we'll spend speculating about it.
Support for perf record will go out as a patchset so that people can see that a full solution is available.
As for including the library in the kernel tree I am happy to welcome new resources to the team. Otherwise and as stated last week, we do not have the resources.
Kim
_______________________________________________
CoreSight mailing list
CoreSight@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/coresight
On Wed, 20 Sep 2017 08:32:54 -0600 Mathieu Poirier <mathieu.poirier@linaro.org> wrote:
On 20 September 2017 at 08:09, Kim Phillips <kim.phillips@arm.com> wrote:
On Wed, 20 Sep 2017 02:20:38 -0500 Al Grant <Al.Grant@arm.com> wrote:
I seriously doubt this would be acceptable upstream: they prefer to have all code fully-inclusive. Do we have a plan for somehow upstreaming the library, or some other means for working around this restriction?
They may prefer it that way but they have already accepted an out-of-tree dependency in the Python interpreter, and OpenCSD could be done that way. I.e. provide a NO_LIBOPENCSD alongside NO_LIBPYTHON. Make it as straightforward as possible to build in the decoder and encourage distributions to do this (which they don't necessarily do with Python).
A Python interpreter is a well-recognized requirement for many pieces of software, in this case, some optional advanced scripting features in perf.
OpenCSD is a proprietary h/w trace decoder, necessary for perf to decode what it recorded into perf events that can then get reported, injected, etc. Intel's decoders (and disassemblers even) are fully included into the perf sources.
[Note that perf report should run on any arch and correctly interpret any *other* arch's perf.data files, so an Arm arch perf binary should be able to run an inject on an Intel PT perf.data file, and vice versa.]
Patches are always appreciated.
I know they are, but can't tell what you're getting at in this particular context.
OpenCSD is nowhere near as ubiquitous as python, and will take some time to be included in distributions, and that's if it's meaningful to do so: Is there some other purpose than perf ETM aux decode that the distribution keepers will be convinced of? I'm not aware of any, and if it's only for perf, I wouldn't be surprised if the distributions didn't ask why we wouldn't include it directly into perf's sources in the first place.
Not saying upstreaming the library into the kernel is necessarily wrong, just that perf already accepts the idea of an out-of-tree dependency.
I don't see a proprietary h/w trace decoder being among them, but we can talk about this all day.
Instead, there is one way we can find out whether it'll be accepted for sure: by submitting at least the first patch of this series "perf tools: Add initial hooks for decoding coresight" as an RFC or whatever to the upstream perf tool maintainers (acme, etc.). The sooner we do that, the less time we'll spend speculating about it.
Support for perf record will go out as a patchset so that people can see that a full solution is available.
I thought this was about perf report, not record? The degree of the perf report solution ('full' or otherwise) depends on the degree of perf integration with the decoder.
As for including the library in the kernel tree I am happy to welcome new resources to the team. Otherwise and as stated last week, we do not have the resources.
Let's see how the upstream perf tool maintainers respond to this patchseries - or at least patch 1 of 22 - before discussing resources (which probably belongs off-list anyway).
Kim
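For illustration only, the NO_LIBOPENCSD idea quoted above, combined with the "stub" library described in the cover letter, might look roughly like the sketch below. It is written in Python rather than perf's actual Makefile and C sources, and the names StubDecoder, LibraryDecoder and select_decoder, as well as the packet strings, are hypothetical; only NO_LIBOPENCSD and NO_LIBPYTHON come from this thread.

```python
from typing import Dict, List


class StubDecoder:
    """Stand-in used when the tool is built without the decoder library."""

    def decode(self, raw: bytes) -> List[str]:
        # The stub keeps the tool buildable but cannot decode anything.
        raise NotImplementedError("built without trace decoder support")


class LibraryDecoder:
    """Hypothetical wrapper around an external decoder library."""

    def decode(self, raw: bytes) -> List[str]:
        # Assumption: a real wrapper would hand `raw` to the library.
        # Here each byte is simply rendered as a fake "packet" string.
        return [f"packet:{b:#04x}" for b in raw]


def select_decoder(build_flags: Dict[str, str]):
    """Pick the stub when NO_LIBOPENCSD=1 is set, else the library wrapper."""
    if build_flags.get("NO_LIBOPENCSD") == "1":
        return StubDecoder()
    return LibraryDecoder()


if __name__ == "__main__":
    decoder = select_decoder({"NO_LIBOPENCSD": "1"})
    try:
        decoder.decode(b"\x01")
    except NotImplementedError as err:
        print(f"decode unavailable: {err}")
```

The point of the sketch is only that the rest of the tool calls one interface and the build switch decides which implementation sits behind it, which is the same shape as the existing NO_LIBPYTHON opt-out.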
On 20 September 2017 at 14:25, Kim Phillips kim.phillips@arm.com wrote:
On Wed, 20 Sep 2017 08:32:54 -0600 Mathieu Poirier mathieu.poirier@linaro.org wrote:
On 20 September 2017 at 08:09, Kim Phillips kim.phillips@arm.com wrote:
On Wed, 20 Sep 2017 02:20:38 -0500 Al Grant Al.Grant@arm.com wrote:
I seriously doubt this would be acceptable upstream: they prefer to have all code fully-inclusive. Do we have a plan for somehow upstreaming the library, or some other means for working around this restriction?
They may prefer it that way but they have already accepted an out-of-tree dependency in the Python interpreter, and OpenCSD could be done that way. I.e. provide a NO_LIBOPENCSD alongside NO_LIBPYTHON. Make it as straightforward as possible to build in the decoder and encourage distributions to do this (which they don't necessarily do with Python).
A Python interpreter is a well-recognized requirement for many pieces of software, in this case, some optional advanced scripting features in perf.
OpenCSD is a proprietary h/w trace decoder, necessary for perf to decode what it recorded into perf events that can then get reported, injected, etc. Intel's decoders (and disassemblers even) are fully included into the perf sources.
[Note that perf report should run on any arch and correctly interpret any *other* arch's perf.data files, so an Arm arch perf binary should be able to run an inject on an Intel PT perf.data file, and vice versa.]
Patches are always appreciated.
I know they are, but can't tell what you're getting at in this particular context.
OpenCSD is nowhere near as ubiquitous as python, and will take some time to be included in distributions, and that's if it's meaningful to do so: Is there some other purpose than perf ETM aux decode that the distribution keepers will be convinced of? I'm not aware of any, and if it's only for perf, I wouldn't be surprised if the distributions didn't ask why we wouldn't include it directly into perf's sources in the first place.
Not saying upstreaming the library into the kernel is necessarily wrong, just that perf already accepts the idea of an out-of-tree dependency.
I don't see a proprietary h/w trace decoder being among them, but we can talk about this all day.
Instead, there is one way we can find out whether it'll be accepted for sure: by submitting at least the first patch of this series "perf tools: Add initial hooks for decoding coresight" as an RFC or whatever to the upstream perf tool maintainers (acme, etc.). The sooner we do that, the less time we'll spend speculating about it.
Support for perf record will go out as a patchset so that people can see that a full solution is available.
I thought this was about perf report, not record? The degree of the perf report solution ('full' or otherwise) depends on the degree of perf integration with the decoder.
Yes, I was talking about the report.
As for including the library in the kernel tree I am happy to welcome new resources to the team. Otherwise and as stated last week, we do not have the resources.
Let's see how the upstream perf tool maintainers respond to this patchseries - or at least patch 1 of 22 - before discussing resources (which probably belongs off-list anyway).
Kim
Forgot to reply 'all'.
-----Original Message-----
From: Jeremiassen, Tor
Sent: Wednesday, September 20, 2017 10:58 AM
To: 'Kim Phillips'
Subject: RE: [EXTERNAL] Re: [PATCH v7 00/22] Add support for CoreSight trace decoding
I think one reason for not upstreaming the decoder library itself is that it has applications outside of perf and the linux ecosystem. As you know Arm trace has been used across a large range of ARM embedded processors to support embedded software development. As such, the need for a trace decoder isn't new. However, within an embedded toolchain the actual trace decoder really doesn't provide a real value add for any software tools provider, it's more what you do with the trace output. Therefore, having an open sourced generic ARM trace decoder is useful in a much broader range of use cases than processing traces within perf. Additionally, the perf use case primarily centers on collecting traces on the Cortex-A processors, while the open source trace decoder should have as a goal to provide decoding support to both M-class and R-class ARM processors, though priorities may dictate the order in which such support is added.
My concern is that making the trace decoder library part of the source tree would jeopardize the broader use case in favor of a Cortex-A linux focused solution, and fail to provide a single open sourced decoder library across the ARM cores.
This being said, I agree that this would need to be communicated more clearly in the write-up.
Best regards,
Tor Jeremiassen
-----Original Message-----
From: Kim Phillips [mailto:kim.phillips@arm.com]
Sent: Wednesday, September 20, 2017 9:10 AM
To: Al Grant
Cc: Jeremiassen, Tor; coresight@lists.linaro.org
Subject: [EXTERNAL] Re: [PATCH v7 00/22] Add support for CoreSight trace decoding
On Wed, 20 Sep 2017 02:20:38 -0500 Al Grant Al.Grant@arm.com wrote:
I seriously doubt this would be acceptable upstream: they prefer to have all code fully-inclusive. Do we have a plan for somehow upstreaming the library, or some other means for working around this restriction?
They may prefer it that way but they have already accepted an out-of-tree dependency in the Python interpreter, and OpenCSD could be done that way. I.e. provide a NO_LIBOPENCSD alongside NO_LIBPYTHON. Make it as straightforward as possible to build in the decoder and encourage distributions to do this (which they don't necessarily do with Python).
A Python interpreter is a well-recognized requirement for many pieces of software, in this case, some optional advanced scripting features in perf.
OpenCSD is a proprietary h/w trace decoder, necessary for perf to decode what it recorded into perf events that can then get reported, injected, etc. Intel's decoders (and disassemblers even) are fully included into the perf sources.
[Note that perf report should run on any arch and correctly interpret any *other* arch's perf.data files, so an Arm arch perf binary should be able to run an inject on an Intel PT perf.data file, and vice versa.]
OpenCSD is nowhere near as ubiquitous as python, and will take some time to be included in distributions, and that's if it's meaningful to do so: Is there some other purpose than perf ETM aux decode that the distribution keepers will be convinced of? I'm not aware of any, and if it's only for perf, I wouldn't be surprised if the distributions didn't ask why we wouldn't include it directly into perf's sources in the first place.
Not saying upstreaming the library into the kernel is necessarily wrong, just that perf already accepts the idea of an out-of-tree dependency.
I don't see a proprietary h/w trace decoder being among them, but we can talk about this all day.
Instead, there is one way we can find out whether it'll be accepted for sure: by submitting at least the first patch of this series "perf tools: Add initial hooks for decoding coresight" as an RFC or whatever to the upstream perf tool maintainers (acme, etc.). The sooner we do that, the less time we'll spend speculating about it.
Kim
On Wed, 20 Sep 2017 15:59:58 +0000 "Jeremiassen, Tor" tor@ti.com wrote:
From: Jeremiassen, Tor Sent: Wednesday, September 20, 2017 10:58 AM
I think one reason for not upstreaming the decoder library itself is that it has applications outside of perf and the linux ecosystem.
By upstreaming the decoder library, I did not mean to imply that the standalone library would somehow disappear. My comments below (please don't top-post, so we both know what context we're talking about) were wrt what linux distributions and 'apps' would do if the decoder library were somehow packaged along the same lines as a python interpreter.
As you know Arm trace has been used across a large range of ARM embedded processors to support embedded software development. As such, the need for a trace decoder isn't new. However, within an embedded toolchain the actual trace decoder really doesn't provide a real value add for any software tools provider, it's more what you do with the trace output. Therefore, having an open sourced generic ARM trace decoder is useful in a much broader range of use cases than processing traces within perf. Additionally, the perf use case primarily centers on collecting traces on the Cortex-A processors, while the open source trace decoder should have as a goal to provide decoding support to both M-class and R-class ARM processors, though priorities may dictate the order in which such support is added.
Certainly.
My concern is that making the trace decoder library part of the source tree would jeopardize the broader use case in favor of a Cortex-A linux focused solution, and fail to provide a single open sourced decoder library across the ARM cores.
Like I said, I didn't mean to suggest the existing code should disappear, rather that we just enable perf quicker and better by providing what it needs from the decoder up-front, esp. in the case where the maintainers don't accept an outside decoder library.
This being said, I agree that this would need to be communicated more clearly in the write-up.
Yes, IMO since this is a linux submission, it - and my comments - are geared more toward the Cortex-A series, but I don't think the linux guys would be interested in reading that much about what non-linux environments are doing with what projects.
Kim
Best regards,
Tor Jeremiassen
Unfortunately I did forget to add the references. Here they are:
[1] https://lwn.net/Articles/626463
[2] https://github.com/torvalds/linux/tree/master/drivers/hwtracing/coresight
[3] https://lwn.net/Articles/648154
[4] https://github.com/Linaro/OpenCSD
Best regards,
Tor Jeremiassen
-----Original Message-----
From: Kim Phillips [mailto:kim.phillips@arm.com]
Sent: Tuesday, September 19, 2017 6:26 PM
To: Jeremiassen, Tor
Cc: mathieu.poirier@linaro.org; coresight@lists.linaro.org
Subject: [EXTERNAL] Re: [PATCH v7 00/22] Add support for CoreSight trace decoding
On Tue, 19 Sep 2017 03:33:16 -0500 Tor Jeremiassen tor@ti.com wrote:
This patchset adds support for user space decoding of CoreSight traces [1]
These [x] references don't exist at the bottom of this email: did you forget to add them?
of the ARM architecture. Kernel support for configuring CoreSight tracers and collecting the hardware trace data in the auxtrace section of the perf.data file is already integrated [2]. The user space implementation mirrors to a large degree that of the Intel Processor Trace (PT) [3] implementation, except that the decoder library itself is separate from the perf tool sources, and is built and maintained as a separate open source project [4]. Instead, this patch set includes the necessary code and build settings to interface to the decoder library, as well as a "stub" or "null" library for the case when the perf tool is built without the library.
I seriously doubt this would be acceptable upstream: they prefer to have all code fully-inclusive. Do we have a plan for somehow upstreaming the library, or some other means for working around this restriction?
The decoder library interface code in this patch set only supports ETMv4 trace decoding, though the library itself supports a broader range. Future patches will add support for more versions of the ARM ETM trace encoding.
Changes from v2:
I haven't seen any prior versions submitted to this list; should this per-version change info be stripped, given that, as far as the people on this list are concerned, it's the first version we're seeing?
It would be nice to know what branch of what tree this series is supposed to be applied to, or even have a URL for a git repo where they've been already suitably applied? I see one of the OpenCSD forks on github is owned by user Tor: Can they be pushed there for easy access?
Thanks,
Kim
On Wed, Sep 20, 2017 at 04:49:44PM +0100, Kim Phillips wrote:
Tor Jeremiassen tor@ti.com wrote:
project [4]. Instead, this patch set includes the necessary code and build settings to interface to the decoder library, as well as a "stub" or "null" library for the case when the perf tool is built without the library.
I seriously doubt this would be acceptable upstream: they prefer to have all code fully-inclusive. Do we have a plan for somehow upstreaming the library, or some other means for working around this restriction?
I can't see why people would insist on having the library be upstreamed, it's not something that exists solely for the kernel. CoreSight is something that any ARM system can use if it's got appropriate hardware built in, the goal is for people to be able to share the decoding code between all kinds of tools running on many OSs. perf is one such application but far from the only one, and it's still usable without the trace decode.
If there were no reasonable possibility of the library being used outside of the kernel or it were so fundamental to building kernels that it were essential then it'd be possible people would feel a need to duplicate it, but even then we'd probably do as we do with dtc and have a copy of the code rather than first class kernel code. As it is it's an optional feature and seems closer to things like some of the kselftests which happily use external libraries installed on the system.
On Wed, 20 Sep 2017 17:12:58 +0100 Mark Brown broonie@kernel.org wrote:
On Wed, Sep 20, 2017 at 04:49:44PM +0100, Kim Phillips wrote:
Tor Jeremiassen tor@ti.com wrote:
project [4]. Instead, this patch set includes the necessary code and build settings to interface to the decoder library, as well as a "stub" or "null" library for the case when the perf tool is built without the library.
I seriously doubt this would be acceptable upstream: they prefer to have all code fully-inclusive. Do we have a plan for somehow upstreaming the library, or some other means for working around this restriction?
I can't see why people would insist on having the library be upstreamed, it's not something that exists solely for the kernel. CoreSight is something that any ARM system can use if it's got appropriate hardware built in, the goal is for people to be able to share the decoding code between all kinds of tools running on many OSs. perf is one such application but far from the only one, and it's still usable without the trace decode.
For record, sure, but this series is about report AFAICT (I still can't tell where it applies cleanly).
If it helps to clarify my position, I'm not saying the ETM trace decoder / OpenCSD library / master branch should necessarily be converted to move *away* from where it is, and live *solely* in the kernel tree's perf tool sources: I'm asking what if the maintainers didn't want to have to depend on external libraries for Coresight report support.
If there were no reasonable possibility of the library being used outside of the kernel or it were so fundamental to building kernels that it were essential then it'd be possible people would feel a need to duplicate it, but even then we'd probably do as we do with dtc and have a copy of the code rather than first class kernel code. As it is it's an optional feature and seems closer to things like some of the kselftests which happily use external libraries installed on the system.
I don't know how optional or not Coresight report is - I'll leave that up to the upstream maintainers, but, I will say that perf report with Intel PT input currently runs on Arm perf binaries, and there is no option to opt-out of it, so the upstream maintainers, sure, whilst being a little Intel-centric, nevertheless made the decision that any perf binary should be able to decode a perf.data file from another arch.
FWIW, Intel PT also has an independent decoder project:
https://github.com/01org/processor-trace
Upon quick perusal, it bears almost no common code with the Intel PT decoder present in the perf tool upstream source (tools/perf/util/intel-pt-decoder/). I'm guessing the upstream perf decoder was done separately to better interleave with the perf buffer handling and event generation callbacks? Not sure, but it's evidence two versions of a single common h/w trace decoder exist IRL.
That same project is also available as a package on my distro:
libipt-dev/zesty 1.5-1ubuntu1 amd64 Intel Processor Trace Decoder Library -- development files
Yet it still didn't qualify to be an optional perf tool dependency: the string 'libipt' doesn't occur in the kernel source tree.
I still think this conversation will conclude once the upstream perf tool maintainers are consulted, as it's ultimately their decision.
Kim
On Wed, Sep 20, 2017 at 03:54:14PM -0500, Kim Phillips wrote:
If it helps to clarify my position, I'm not saying the ETM trace decoder / OpenCSD library / master branch should necessarily be converted to move *away* from where it is, and live *solely* in the kernel tree's perf tool sources: I'm asking what if the maintainers didn't want to have to depend on external libraries for Coresight report support.
If that is an issue for the perf maintainers I'm sure they will be more than capable of bringing it up themselves and as I said in the message to which you are replying there is an existing approach to this:
duplicate it, but even then we'd probably do as we do with dtc and have a copy of the code rather than first class kernel code. As it is
which has served us well.
I don't know how optional or not Coresight report is - I'll leave that up to the upstream maintainers, but, I will say that perf report with Intel PT input currently runs on Arm perf binaries, and there is no option to opt-out of it, so the upstream maintainers, sure, whilst being a little Intel-centric, nevertheless made the decision that any perf binary should be able to decode a perf.data file from another arch.
I think you may be reading far too much into this, and it may be a much less deliberate or considered decision than you seem to see it as; a parallel implementation (perhaps due to difficulty in reuse because things are tied too closely to the rest of the implementation) or just poor communication is also possible.