This patchset adds support for CoreSight CPU-wide trace scenarios. More specifically, it extends the work that was done for per-thread scenarios to handle more than a single trace ID. It also temporally correlates traces based on the timestamps generated by the tracers so that rendering by the perf mechanic is ordered.
Everything is based on Arnaldo's perf/core branch (46d4c9a05285). I will send another revision when it is rebased to a 5.2 rc candidate.
Before this set:

# root@juno:/home/linaro# perf record -e cs_etm/@20070000.etr/ -C 2,3 sleep 1
failed to mmap with 12 (Cannot allocate memory)

After this set:

# root@juno:/home/linaro# perf record -e cs_etm/@20070000.etr/ -C 2,3 sleep 1
[ perf record: Captured and wrote 1.352 MB perf.data ]
Regards, Mathieu
Changes for V2:
* Fixed error condition in function cs_etm_set_option() (Leo).
* Fixed changelog spelling error (Leo).
* Moved from calloc() to malloc() in cs_etm__etmq_get_traceid_queue().
* Got rid of CS_ETM_PACKET_QUEUE_NR macro.
* Fixed indentation problem in function cs_etm__process_traceid_queue() (Leo).
Mathieu Poirier (17):
  perf tools: Configure contextID tracing in CPU-wide mode
  perf tools: Configure timestamp generation in CPU-wide mode
  perf tools: Configure SWITCH_EVENTS in CPU-wide mode
  perf tools: Add handling of itrace start events
  perf tools: Add handling of switch-CPU-wide events
  perf tools: Refactor error path in cs_etm_decoder__new()
  perf tools: Move packet queue out of decoder structure
  perf tools: Fix indentation in function cs_etm__process_decoder_queue()
  perf tools: Introduce the concept of trace ID queues
  perf tools: Get rid of unused cpu in struct cs_etm_queue
  perf tools: Move thread to traceid_queue
  perf tools: Move tid/pid to traceid_queue
  perf tools: Use traceID aware memory callback API
  perf tools: Add support for multiple traceID queues
  perf tools: Linking PE contextID with perf thread mechanic
  perf tools: Add notion of time to decoding code
  perf tools: Add support for CPU-wide trace scenarios
 tools/perf/Makefile.config                    |    3 +
 tools/perf/arch/arm/util/cs-etm.c             |  186 ++-
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  269 +++--
 .../perf/util/cs-etm-decoder/cs-etm-decoder.h |   39 +-
 tools/perf/util/cs-etm.c                      | 1026 +++++++++++++----
 tools/perf/util/cs-etm.h                      |  103 ++
 6 files changed, 1252 insertions(+), 374 deletions(-)
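
For readers new to the idea, the temporal correlation mentioned in the cover letter can be pictured as repeatedly picking, among several per-trace-ID queues, the one whose next packet carries the oldest timestamp. The following is only a minimal sketch under that assumption; struct traceid_queue, next_queue() and merge_by_timestamp() are hypothetical names for illustration and are not the data structures used by the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-trace-ID queue: timestamps of pending packets, oldest first. */
struct traceid_queue {
	const uint64_t *ts;	/* timestamps, sorted within one queue */
	size_t nr;		/* number of entries in ts[] */
	size_t pos;		/* index of the next entry to consume */
};

/* Return the index of the queue holding the oldest pending timestamp, or -1. */
static int next_queue(const struct traceid_queue *q, size_t nr_queues)
{
	int best = -1;
	size_t i;

	for (i = 0; i < nr_queues; i++) {
		if (q[i].pos == q[i].nr)
			continue;	/* this queue is drained */
		if (best < 0 || q[i].ts[q[i].pos] < q[best].ts[q[best].pos])
			best = (int)i;
	}

	return best;
}

/* Consume all queues in timestamp order; returns the number of entries written. */
static size_t merge_by_timestamp(struct traceid_queue *q, size_t nr_queues,
				 uint64_t *out)
{
	size_t n = 0;
	int i;

	while ((i = next_queue(q, nr_queues)) >= 0)
		out[n++] = q[i].ts[q[i].pos++];

	return n;
}
```

A linear scan keeps the sketch short; with many queues a min-heap keyed on the timestamp would be the more usual choice.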
When operating in CPU-wide mode being notified of contextID changes is required so that the decoding mechanic is aware of the process context switch.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 tools/perf/arch/arm/util/cs-etm.c | 126 +++++++++++++++++++++++++-----
 tools/perf/util/cs-etm.h          |  12 +++
 2 files changed, 119 insertions(+), 19 deletions(-)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 911426721170..3912f0bf04ed 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -35,8 +35,100 @@ struct cs_etm_recording {
 	size_t			snapshot_size;
 };

+static const char *metadata_etmv3_ro[CS_ETM_PRIV_MAX] = {
+	[CS_ETM_ETMCCER]	= "mgmt/etmccer",
+	[CS_ETM_ETMIDR]		= "mgmt/etmidr",
+};
+
+static const char *metadata_etmv4_ro[CS_ETMV4_PRIV_MAX] = {
+	[CS_ETMV4_TRCIDR0]		= "trcidr/trcidr0",
+	[CS_ETMV4_TRCIDR1]		= "trcidr/trcidr1",
+	[CS_ETMV4_TRCIDR2]		= "trcidr/trcidr2",
+	[CS_ETMV4_TRCIDR8]		= "trcidr/trcidr8",
+	[CS_ETMV4_TRCAUTHSTATUS]	= "mgmt/trcauthstatus",
+};
+
 static bool cs_etm_is_etmv4(struct auxtrace_record *itr, int cpu);

+static int cs_etm_set_context_id(struct auxtrace_record *itr,
+				 struct perf_evsel *evsel, int cpu)
+{
+	struct cs_etm_recording *ptr;
+	struct perf_pmu *cs_etm_pmu;
+	char path[PATH_MAX];
+	int err = -EINVAL;
+	u32 val;
+
+	ptr = container_of(itr, struct cs_etm_recording, itr);
+	cs_etm_pmu = ptr->cs_etm_pmu;
+
+	if (!cs_etm_is_etmv4(itr, cpu))
+		goto out;
+
+	/* Get a handle on TRCIRD2 */
+	snprintf(path, PATH_MAX, "cpu%d/%s",
+		 cpu, metadata_etmv4_ro[CS_ETMV4_TRCIDR2]);
+	err = perf_pmu__scan_file(cs_etm_pmu, path, "%x", &val);
+
+	/* There was a problem reading the file, bailing out */
+	if (err != 1) {
+		pr_err("%s: can't read file %s\n",
+		       CORESIGHT_ETM_PMU_NAME, path);
+		goto out;
+	}
+
+	/*
+	 * TRCIDR2.CIDSIZE, bit [9-5], indicates whether contextID tracing
+	 * is supported:
+	 *  0b00000 Context ID tracing is not supported.
+	 *  0b00100 Maximum of 32-bit Context ID size.
+	 *  All other values are reserved.
+	 */
+	val = BMVAL(val, 5, 9);
+	if (!val || val != 0x4) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	/* All good, let the kernel know */
+	evsel->attr.config |= (1 << ETM_OPT_CTXTID);
+	err = 0;
+
+out:
+
+	return err;
+}
+
+static int cs_etm_set_option(struct auxtrace_record *itr,
+			     struct perf_evsel *evsel, u32 option)
+{
+	int i, err = -EINVAL;
+	struct cpu_map *event_cpus = evsel->evlist->cpus;
+	struct cpu_map *online_cpus = cpu_map__new(NULL);
+
+	/* Set option of each CPU we have */
+	for (i = 0; i < cpu__max_cpu(); i++) {
+		if (!cpu_map__has(event_cpus, i) ||
+		    !cpu_map__has(online_cpus, i))
+			continue;
+
+		switch (option) {
+		case ETM_OPT_CTXTID:
+			err = cs_etm_set_context_id(itr, evsel, i);
+			if (err)
+				goto out;
+			break;
+		default:
+			goto out;
+		}
+	}
+
+	err = 0;
+out:
+	cpu_map__put(online_cpus);
+	return err;
+}
+
 static int cs_etm_parse_snapshot_options(struct auxtrace_record *itr,
 					 struct record_opts *opts,
 					 const char *str)
@@ -105,8 +197,9 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 				container_of(itr, struct cs_etm_recording, itr);
 	struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
 	struct perf_evsel *evsel, *cs_etm_evsel = NULL;
-	const struct cpu_map *cpus = evlist->cpus;
+	struct cpu_map *cpus = evlist->cpus;
 	bool privileged = (geteuid() == 0 || perf_event_paranoid() < 0);
+	int err = 0;

 	ptr->evlist = evlist;
 	ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
@@ -241,19 +334,24 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,

 	/*
 	 * In the case of per-cpu mmaps, we need the CPU on the
-	 * AUX event.
+	 * AUX event.  We also need the contextID in order to be notified
+	 * when a context switch happened.
 	 */
-	if (!cpu_map__empty(cpus))
+	if (!cpu_map__empty(cpus)) {
 		perf_evsel__set_sample_bit(cs_etm_evsel, CPU);

+		err = cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_CTXTID);
+		if (err)
+			goto out;
+	}
+
 	/* Add dummy event to keep tracking */
 	if (opts->full_auxtrace) {
 		struct perf_evsel *tracking_evsel;
-		int err;

 		err = parse_events(evlist, "dummy:u", NULL);
 		if (err)
-			return err;
+			goto out;

 		tracking_evsel = perf_evlist__last(evlist);
 		perf_evlist__set_tracking_event(evlist, tracking_evsel);
@@ -266,7 +364,8 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 			perf_evsel__set_sample_bit(tracking_evsel, TIME);
 	}

-	return 0;
+out:
+	return err;
 }

 static u64 cs_etm_get_config(struct auxtrace_record *itr)
@@ -314,6 +413,8 @@ static u64 cs_etmv4_get_config(struct auxtrace_record *itr)
 	config_opts = cs_etm_get_config(itr);
 	if (config_opts & BIT(ETM_OPT_CYCACC))
 		config |= BIT(ETM4_CFG_BIT_CYCACC);
+	if (config_opts & BIT(ETM_OPT_CTXTID))
+		config |= BIT(ETM4_CFG_BIT_CTXTID);
 	if (config_opts & BIT(ETM_OPT_TS))
 		config |= BIT(ETM4_CFG_BIT_TS);
 	if (config_opts & BIT(ETM_OPT_RETSTK))
@@ -363,19 +464,6 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
 	       (etmv3 * CS_ETMV3_PRIV_SIZE));
 }

-static const char *metadata_etmv3_ro[CS_ETM_PRIV_MAX] = {
-	[CS_ETM_ETMCCER]	= "mgmt/etmccer",
-	[CS_ETM_ETMIDR]		= "mgmt/etmidr",
-};
-
-static const char *metadata_etmv4_ro[CS_ETMV4_PRIV_MAX] = {
-	[CS_ETMV4_TRCIDR0]		= "trcidr/trcidr0",
-	[CS_ETMV4_TRCIDR1]		= "trcidr/trcidr1",
-	[CS_ETMV4_TRCIDR2]		= "trcidr/trcidr2",
-	[CS_ETMV4_TRCIDR8]		= "trcidr/trcidr8",
-	[CS_ETMV4_TRCAUTHSTATUS]	= "mgmt/trcauthstatus",
-};
-
 static bool cs_etm_is_etmv4(struct auxtrace_record *itr, int cpu)
 {
 	bool ret = false;
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index 0e97c196147a..826c9eedaf5c 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -103,6 +103,18 @@ struct intlist *traceid_list;
 #define KiB(x) ((x) * 1024)
 #define MiB(x) ((x) * 1024 * 1024)

+/*
+ * Create a contiguous bitmask starting at bit position @l and ending at
+ * position @h. For example
+ * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
+ *
+ * Carbon copy of implementation found in $KERNEL/include/linux/bitops.h
+ */
+#define GENMASK(h, l) \
+	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
+
+#define BMVAL(val, lsb, msb)	((val & GENMASK(msb, lsb)) >> lsb)
+
 #define CS_ETM_HEADER_SIZE (CS_HEADER_VERSION_0_MAX * sizeof(u64))

 #define __perf_cs_etmv3_magic 0x3030303030303030ULL
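
As a worked illustration of what BMVAL(val, 5, 9) does in cs_etm_set_context_id(), the sketch below extracts the TRCIDR2.CIDSIZE field from a raw register value. The macros follow the shape used in the patch, with extra parentheses around the arguments; BITS_PER_LONG is derived locally so the snippet builds outside the perf tree, and trcidr2_has_context_id() is a hypothetical helper that exists only for this example.

```c
#include <limits.h>
#include <stdint.h>

/* Local stand-in so the sketch builds on its own. */
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Contiguous mask covering bits l..h, as in the patch. */
#define GENMASK(h, l) \
	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

/* Extract the field located at bits lsb..msb of val. */
#define BMVAL(val, lsb, msb)	(((val) & GENMASK(msb, lsb)) >> (lsb))

/*
 * Hypothetical helper: non-zero when TRCIDR2.CIDSIZE, bits [9:5],
 * reports a 32-bit context ID (0b00100).
 */
static int trcidr2_has_context_id(uint32_t trcidr2)
{
	return BMVAL(trcidr2, 5, 9) == 0x4;
}
```

With CIDSIZE set to 0b00100 the raw register carries 0x80 in bits [9:5]; masking with GENMASK(9, 5), which is 0x3e0, and shifting right by 5 yields 0x4.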
Hi Mathieu,
On 24/05/2019 18:34, Mathieu Poirier wrote:
When operating in CPU-wide mode being notified of contextID changes is required so that the decoding mechanic is aware of the process context switch.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
I am sorry, but I don't remember reviewing this patch in the previous postings. But here we go.
tools/perf/arch/arm/util/cs-etm.c | 126 +++++++++++++++++++++++++----- tools/perf/util/cs-etm.h | 12 +++ 2 files changed, 119 insertions(+), 19 deletions(-)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c index 911426721170..3912f0bf04ed 100644 --- a/tools/perf/arch/arm/util/cs-etm.c +++ b/tools/perf/arch/arm/util/cs-etm.c @@ -35,8 +35,100 @@ struct cs_etm_recording { size_t snapshot_size; };
static bool cs_etm_is_etmv4(struct auxtrace_record *itr, int cpu); +static int cs_etm_set_context_id(struct auxtrace_record *itr,
struct perf_evsel *evsel, int cpu)
+{
- struct cs_etm_recording *ptr;
- struct perf_pmu *cs_etm_pmu;
- char path[PATH_MAX];
- int err = -EINVAL;
- u32 val;
- ptr = container_of(itr, struct cs_etm_recording, itr);
- cs_etm_pmu = ptr->cs_etm_pmu;
- if (!cs_etm_is_etmv4(itr, cpu))
goto out;
- /* Get a handle on TRCIRD2 */
- snprintf(path, PATH_MAX, "cpu%d/%s",
cpu, metadata_etmv4_ro[CS_ETMV4_TRCIDR2]);
- err = perf_pmu__scan_file(cs_etm_pmu, path, "%x", &val);
- /* There was a problem reading the file, bailing out */
- if (err != 1) {
pr_err("%s: can't read file %s\n",
CORESIGHT_ETM_PMU_NAME, path);
goto out;
- }
- /*
* TRCIDR2.CIDSIZE, bit [9-5], indicates whether contextID tracing
* is supported:
* 0b00000 Context ID tracing is not supported.
* 0b00100 Maximum of 32-bit Context ID size.
* All other values are reserved.
*/
- val = BMVAL(val, 5, 9);
- if (!val || val != 0x4) {
err = -EINVAL;
goto out;
- }
- /* All good, let the kernel know */
- evsel->attr.config |= (1 << ETM_OPT_CTXTID);
- err = 0;
+out:
- return err;
+}
+static int cs_etm_set_option(struct auxtrace_record *itr,
struct perf_evsel *evsel, u32 option)
+{
- int i, err = -EINVAL;
- struct cpu_map *event_cpus = evsel->evlist->cpus;
- struct cpu_map *online_cpus = cpu_map__new(NULL);
- /* Set option of each CPU we have */
- for (i = 0; i < cpu__max_cpu(); i++) {
if (!cpu_map__has(event_cpus, i) ||
!cpu_map__has(online_cpus, i))
continue;
switch (option) {
case ETM_OPT_CTXTID:
err = cs_etm_set_context_id(itr, evsel, i);
if (err)
goto out;
break;
default:
goto out;
}
- }
I am not too familiar with the perf tool code. But isn't there a way to force the config bit right from the beginning, when the events are created and we know that we are doing CPU-wide tracing, along with the other config bits?
- err = 0;
+out:
- cpu_map__put(online_cpus);
- return err;
+}
- static int cs_etm_parse_snapshot_options(struct auxtrace_record *itr, struct record_opts *opts, const char *str)
@@ -105,8 +197,9 @@ static int cs_etm_recording_options(struct auxtrace_record *itr, container_of(itr, struct cs_etm_recording, itr); struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu; struct perf_evsel *evsel, *cs_etm_evsel = NULL;
- const struct cpu_map *cpus = evlist->cpus;
- struct cpu_map *cpus = evlist->cpus; bool privileged = (geteuid() == 0 || perf_event_paranoid() < 0);
- int err = 0;
ptr->evlist = evlist; ptr->snapshot_mode = opts->auxtrace_snapshot_mode; @@ -241,19 +334,24 @@ static int cs_etm_recording_options(struct auxtrace_record *itr, /* * In the case of per-cpu mmaps, we need the CPU on the
* AUX event.
* AUX event. We also need the contextID in order to be notified
* when a context switch happened.
*/
- if (!cpu_map__empty(cpus))
- if (!cpu_map__empty(cpus)) { perf_evsel__set_sample_bit(cs_etm_evsel, CPU);
err = cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_CTXTID);
if (err)
goto out;
- }
- /* Add dummy event to keep tracking */ if (opts->full_auxtrace) { struct perf_evsel *tracking_evsel;
int err;
err = parse_events(evlist, "dummy:u", NULL); if (err)
return err;
goto out;
tracking_evsel = perf_evlist__last(evlist); perf_evlist__set_tracking_event(evlist, tracking_evsel); @@ -266,7 +364,8 @@ static int cs_etm_recording_options(struct auxtrace_record *itr, perf_evsel__set_sample_bit(tracking_evsel, TIME); }
- return 0;
+out:
- return err; }
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 0e97c196147a..826c9eedaf5c 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -103,6 +103,18 @@ struct intlist *traceid_list; #define KiB(x) ((x) * 1024) #define MiB(x) ((x) * 1024 * 1024) +/*
- Create a contiguous bitmask starting at bit position @l and ending at
- position @h. For example
- GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
- Carbon copy of implementation found in $KERNEL/include/linux/bitops.h
- */
+#define GENMASK(h, l) \
- (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
minor nit: Could this be placed in a more generic header file for the other parts of the perf tool to consume?
+#define BMVAL(val, lsb, msb) ((val & GENMASK(msb, lsb)) >> lsb)
Cheers Suzuki
Hey Suzuki,
On Fri, 7 Jun 2019 at 03:21, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mathieu,
On 24/05/2019 18:34, Mathieu Poirier wrote:
When operating in CPU-wide mode being notified of contextID changes is required so that the decoding mechanic is aware of the process context switch.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
I am sorry but, I don't remember reviewing this patch in the previous postings. But here we go.
We definitely misunderstood each other - apologies for that.
tools/perf/arch/arm/util/cs-etm.c | 126 +++++++++++++++++++++++++----- tools/perf/util/cs-etm.h | 12 +++ 2 files changed, 119 insertions(+), 19 deletions(-)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c index 911426721170..3912f0bf04ed 100644 --- a/tools/perf/arch/arm/util/cs-etm.c +++ b/tools/perf/arch/arm/util/cs-etm.c @@ -35,8 +35,100 @@ struct cs_etm_recording { size_t snapshot_size; };
static bool cs_etm_is_etmv4(struct auxtrace_record *itr, int cpu);
+static int cs_etm_set_context_id(struct auxtrace_record *itr,
struct perf_evsel *evsel, int cpu)
+{
struct cs_etm_recording *ptr;
struct perf_pmu *cs_etm_pmu;
char path[PATH_MAX];
int err = -EINVAL;
u32 val;
ptr = container_of(itr, struct cs_etm_recording, itr);
cs_etm_pmu = ptr->cs_etm_pmu;
if (!cs_etm_is_etmv4(itr, cpu))
goto out;
/* Get a handle on TRCIRD2 */
snprintf(path, PATH_MAX, "cpu%d/%s",
cpu, metadata_etmv4_ro[CS_ETMV4_TRCIDR2]);
err = perf_pmu__scan_file(cs_etm_pmu, path, "%x", &val);
/* There was a problem reading the file, bailing out */
if (err != 1) {
pr_err("%s: can't read file %s\n",
CORESIGHT_ETM_PMU_NAME, path);
goto out;
}
/*
* TRCIDR2.CIDSIZE, bit [9-5], indicates whether contextID tracing
* is supported:
* 0b00000 Context ID tracing is not supported.
* 0b00100 Maximum of 32-bit Context ID size.
* All other values are reserved.
*/
val = BMVAL(val, 5, 9);
if (!val || val != 0x4) {
err = -EINVAL;
goto out;
}
/* All good, let the kernel know */
evsel->attr.config |= (1 << ETM_OPT_CTXTID);
err = 0;
+out:
return err;
+}
+static int cs_etm_set_option(struct auxtrace_record *itr,
struct perf_evsel *evsel, u32 option)
+{
int i, err = -EINVAL;
struct cpu_map *event_cpus = evsel->evlist->cpus;
struct cpu_map *online_cpus = cpu_map__new(NULL);
/* Set option of each CPU we have */
for (i = 0; i < cpu__max_cpu(); i++) {
if (!cpu_map__has(event_cpus, i) ||
!cpu_map__has(online_cpus, i))
continue;
switch (option) {
case ETM_OPT_CTXTID:
err = cs_etm_set_context_id(itr, evsel, i);
if (err)
goto out;
break;
default:
goto out;
}
}
I am not too familiar with the perf tool code. But, isn't there a way to force the config bit, right from the beginning when the events are created, when we know that we are doing a CPU wide tracing, along with the other config bits ?
This code is run just after the event list is created. In order to avoid this step and have the config bits set right from the beginning, one would have to explicitly specify the options within the '/' '/' of the cs_etm event on the command line, which would be cumbersome and error-prone. Instead, this code guarantees that all options needed for a CPU-wide session are set properly.
err = 0;
+out:
cpu_map__put(online_cpus);
return err;
+}
- static int cs_etm_parse_snapshot_options(struct auxtrace_record *itr, struct record_opts *opts, const char *str)
@@ -105,8 +197,9 @@ static int cs_etm_recording_options(struct auxtrace_record *itr, container_of(itr, struct cs_etm_recording, itr); struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu; struct perf_evsel *evsel, *cs_etm_evsel = NULL;
const struct cpu_map *cpus = evlist->cpus;
struct cpu_map *cpus = evlist->cpus; bool privileged = (geteuid() == 0 || perf_event_paranoid() < 0);
int err = 0; ptr->evlist = evlist; ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
@@ -241,19 +334,24 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
/* * In the case of per-cpu mmaps, we need the CPU on the
* AUX event.
* AUX event. We also need the contextID in order to be notified
* when a context switch happened. */
if (!cpu_map__empty(cpus))
if (!cpu_map__empty(cpus)) { perf_evsel__set_sample_bit(cs_etm_evsel, CPU);
err = cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_CTXTID);
if (err)
goto out;
}
/* Add dummy event to keep tracking */ if (opts->full_auxtrace) { struct perf_evsel *tracking_evsel;
int err; err = parse_events(evlist, "dummy:u", NULL); if (err)
return err;
goto out; tracking_evsel = perf_evlist__last(evlist); perf_evlist__set_tracking_event(evlist, tracking_evsel);
@@ -266,7 +364,8 @@ static int cs_etm_recording_options(struct auxtrace_record *itr, perf_evsel__set_sample_bit(tracking_evsel, TIME); }
return 0;
+out:
return err;
}
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 0e97c196147a..826c9eedaf5c 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -103,6 +103,18 @@ struct intlist *traceid_list; #define KiB(x) ((x) * 1024) #define MiB(x) ((x) * 1024 * 1024)
+/*
- Create a contiguous bitmask starting at bit position @l and ending at
- position @h. For example
- GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
- Carbon copy of implementation found in $KERNEL/include/linux/bitops.h
- */
+#define GENMASK(h, l) \
(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
minor nit: Could this be placed in a more generic header file for the other parts of the perf tool to consume ?
Back when I wrote this code my thinking was to keep it private since nobody else in the perf tools had a need for it. But now that Arnaldo added the header back in August [1] there is no need for a private version.
Arnaldo, do you want a patch on top of the current patchset or a new set?
[1]. ba4aa02b417f0 (Arnaldo Carvalho de Melo 2018-09-25 10:55:59 -0300 17) * GENMASK_ULL(39, 21)
+#define BMVAL(val, lsb, msb) ((val & GENMASK(msb, lsb)) >> lsb)
Cheers Suzuki
Em Fri, Jun 07, 2019 at 10:21:36AM +0100, Suzuki K Poulose escreveu:
Hi Mathieu,
On 24/05/2019 18:34, Mathieu Poirier wrote:
When operating in CPU-wide mode being notified of contextID changes is required so that the decoding mechanic is aware of the process context switch.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
I am sorry but, I don't remember reviewing this patch in the previous postings. But here we go.
Can I keep it as is? I addressed one of your concerns below, please check.
- Arnaldo
+++ b/tools/perf/util/cs-etm.h @@ -103,6 +103,18 @@ struct intlist *traceid_list; #define KiB(x) ((x) * 1024) #define MiB(x) ((x) * 1024 * 1024) +/*
- Create a contiguous bitmask starting at bit position @l and ending at
- position @h. For example
- GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
- Carbon copy of implementation found in $KERNEL/include/linux/bitops.h
- */
+#define GENMASK(h, l) \
- (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
minor nit: Could this be placed in a more generic header file for the other parts of the perf tool to consume ?
Yeah, since we have:
Good catch, we have it already:
[acme@quaco perf]$ tail tools/include/linux/bits.h
 * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
 */
#define GENMASK(h, l) \
	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

#define GENMASK_ULL(h, l) \
	(((~0ULL) - (1ULL << (l)) + 1) & \
	 (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))

#endif	/* __LINUX_BITS_H */
[acme@quaco perf]$
[acme@quaco perf]$

So I'm adding this to the pile with a Suggested-by: Suzuki, ok?

commit 3217a621248824fbff8563d8447fdafe69c5316d
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Fri Jun 7 15:14:27 2019 -0300

    perf cs-etm: Remove duplicate GENMASK() define, use linux/bits.h instead

    Suzuki noticed that this should be more useful in a generic header, and
    after looking I noticed we have it already in our copy of
    include/linux/bits.h in tools/include, so just use it, test built on
    x86-64 and ubuntu 19.04 with:

      perfbuilder@46646c9e848e:/$ aarch64-linux-gnu-gcc --version |& head -1
      aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0
      perfbuilder@46646c9e848e:/$

    Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
    Link: https://lkml.kernel.org/r/68c1c548-33cd-31e8-100d-7ffad008c7b2@arm.com
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Leo Yan <leo.yan@linaro.org>
    Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: coresight@lists.linaro.org
    Cc: linux-arm-kernel@lists.infradead.org,
    Link: https://lkml.kernel.org/n/tip-69pd3mqvxdlh2shddsc7yhyv@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index 33b57e748c3d..bc848fd095f4 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -9,6 +9,7 @@

 #include "util/event.h"
 #include "util/session.h"
+#include <linux/bits.h>

 /* Versionning header in case things need tro change in the future.  That way
  * decoding of old snapshot is still possible.
@@ -161,16 +162,6 @@ struct cs_etm_packet_queue {

 #define CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL

-/*
- * Create a contiguous bitmask starting at bit position @l and ending at
- * position @h. For example
- * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
- *
- * Carbon copy of implementation found in $KERNEL/include/linux/bitops.h
- */
-#define GENMASK(h, l) \
-	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
-
 #define BMVAL(val, lsb, msb)	((val & GENMASK(msb, lsb)) >> lsb)

 #define CS_ETM_HEADER_SIZE (CS_HEADER_VERSION_0_MAX * sizeof(u64))
On Fri, 7 Jun 2019 at 12:20, Arnaldo Carvalho de Melo arnaldo.melo@gmail.com wrote:
Em Fri, Jun 07, 2019 at 10:21:36AM +0100, Suzuki K Poulose escreveu:
Hi Mathieu,
On 24/05/2019 18:34, Mathieu Poirier wrote:
When operating in CPU-wide mode being notified of contextID changes is required so that the decoding mechanic is aware of the process context switch.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
I am sorry but, I don't remember reviewing this patch in the previous postings. But here we go.
Can I keep it as is? I addressed one of your concerns below, please check.
- Arnaldo
+++ b/tools/perf/util/cs-etm.h @@ -103,6 +103,18 @@ struct intlist *traceid_list; #define KiB(x) ((x) * 1024) #define MiB(x) ((x) * 1024 * 1024) +/*
- Create a contiguous bitmask starting at bit position @l and ending at
- position @h. For example
- GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
- Carbon copy of implementation found in $KERNEL/include/linux/bitops.h
- */
+#define GENMASK(h, l) \
- (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
minor nit: Could this be placed in a more generic header file for the other parts of the perf tool to consume ?
Yeah, since we have:
Good catch, we have it already:
[acme@quaco perf]$ tail tools/include/linux/bits.h
- GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
*/ #define GENMASK(h, l) \ (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
#define GENMASK_ULL(h, l) \ (((~0ULL) - (1ULL << (l)) + 1) & \ (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))
#endif /* __LINUX_BITS_H */ [acme@quaco perf]$ [acme@quaco perf]$
So I'm adding this to the pile with a Suggested-by: Suzuki, ok?
commit 3217a621248824fbff8563d8447fdafe69c5316d Author: Arnaldo Carvalho de Melo acme@redhat.com Date: Fri Jun 7 15:14:27 2019 -0300
perf cs-etm: Remove duplicate GENMASK() define, use linux/bits.h instead Suzuki noticed that this should be more useful in a generic header, and after looking I noticed we have it already in our copy of include/linux/bits.h in tools/include, so just use it, test built on x86-64 and ubuntu 19.04 with: perfbuilder@46646c9e848e:/$ aarch64-linux-gnu-gcc --version |& head -1 aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 perfbuilder@46646c9e848e:/$ Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lkml.kernel.org/r/68c1c548-33cd-31e8-100d-7ffad008c7b2@arm.com Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org, Link: https://lkml.kernel.org/n/tip-69pd3mqvxdlh2shddsc7yhyv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 33b57e748c3d..bc848fd095f4 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -9,6 +9,7 @@
#include "util/event.h" #include "util/session.h" +#include <linux/bits.h>
/* Versionning header in case things need tro change in the future. That way
- decoding of old snapshot is still possible.
@@ -161,16 +162,6 @@ struct cs_etm_packet_queue {
#define CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL
-/*
- Create a contiguous bitmask starting at bit position @l and ending at
- position @h. For example
- GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
- Carbon copy of implementation found in $KERNEL/include/linux/bitops.h
- */
-#define GENMASK(h, l) \
(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
Reviewed-by: Mathieu Poirier mathieu.poirier@linaro.org
#define BMVAL(val, lsb, msb) ((val & GENMASK(msb, lsb)) >> lsb)
#define CS_ETM_HEADER_SIZE (CS_HEADER_VERSION_0_MAX * sizeof(u64))
When operating in CPU-wide mode tracers need to generate timestamps in order to correlate the code being traced on one CPU with what is executed on other CPUs.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/arch/arm/util/cs-etm.c | 57 +++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 3912f0bf04ed..be1e4f20affa 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -99,6 +99,54 @@ static int cs_etm_set_context_id(struct auxtrace_record *itr,
 	return err;
 }

+static int cs_etm_set_timestamp(struct auxtrace_record *itr,
+				struct perf_evsel *evsel, int cpu)
+{
+	struct cs_etm_recording *ptr;
+	struct perf_pmu *cs_etm_pmu;
+	char path[PATH_MAX];
+	int err = -EINVAL;
+	u32 val;
+
+	ptr = container_of(itr, struct cs_etm_recording, itr);
+	cs_etm_pmu = ptr->cs_etm_pmu;
+
+	if (!cs_etm_is_etmv4(itr, cpu))
+		goto out;
+
+	/* Get a handle on TRCIRD0 */
+	snprintf(path, PATH_MAX, "cpu%d/%s",
+		 cpu, metadata_etmv4_ro[CS_ETMV4_TRCIDR0]);
+	err = perf_pmu__scan_file(cs_etm_pmu, path, "%x", &val);
+
+	/* There was a problem reading the file, bailing out */
+	if (err != 1) {
+		pr_err("%s: can't read file %s\n",
+		       CORESIGHT_ETM_PMU_NAME, path);
+		goto out;
+	}
+
+	/*
+	 * TRCIDR0.TSSIZE, bit [28-24], indicates whether global timestamping
+	 * is supported:
+	 *  0b00000 Global timestamping is not implemented
+	 *  0b00110 Implementation supports a maximum timestamp of 48bits.
+	 *  0b01000 Implementation supports a maximum timestamp of 64bits.
+	 */
+	val &= GENMASK(28, 24);
+	if (!val) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	/* All good, let the kernel know */
+	evsel->attr.config |= (1 << ETM_OPT_TS);
+	err = 0;
+
+out:
+	return err;
+}
+
 static int cs_etm_set_option(struct auxtrace_record *itr,
 			     struct perf_evsel *evsel, u32 option)
 {
@@ -118,6 +166,11 @@ static int cs_etm_set_option(struct auxtrace_record *itr,
 			if (err)
 				goto out;
 			break;
+		case ETM_OPT_TS:
+			err = cs_etm_set_timestamp(itr, evsel, i);
+			if (err)
+				goto out;
+			break;
 		default:
 			goto out;
 		}
@@ -343,6 +396,10 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 		err = cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_CTXTID);
 		if (err)
 			goto out;
+
+		err = cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_TS);
+		if (err)
+			goto out;
 	}

 	/* Add dummy event to keep tracking */
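
To make the TRCIDR0.TSSIZE encoding listed in the comment above concrete, here is a small sketch that decodes the field into a timestamp width. trcidr0_timestamp_bits() is a hypothetical helper, not part of the patch. Note that it is stricter than cs_etm_set_timestamp(), which accepts any non-zero TSSIZE value, whereas this sketch recognises only the two encodings named in the comment.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the maximum timestamp width, in bits,
 * advertised by TRCIDR0.TSSIZE (bits [28:24]), or 0 when global
 * timestamping is not implemented or the encoding is not recognised.
 */
static unsigned int trcidr0_timestamp_bits(uint32_t trcidr0)
{
	uint32_t tssize = (trcidr0 >> 24) & 0x1f;

	switch (tssize) {
	case 0x6:	/* 0b00110 */
		return 48;
	case 0x8:	/* 0b01000 */
		return 64;
	default:	/* 0b00000 or an unlisted value */
		return 0;
	}
}
```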
On 24/05/2019 18:34, Mathieu Poirier wrote:
When operating in CPU-wide mode tracers need to generate timestamps in order to correlate the code being traced on one CPU with what is executed on other CPUs.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
tools/perf/arch/arm/util/cs-etm.c | 57 +++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 3912f0bf04ed..be1e4f20affa 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -99,6 +99,54 @@ static int cs_etm_set_context_id(struct auxtrace_record *itr,
 	return err;
 }

+static int cs_etm_set_timestamp(struct auxtrace_record *itr,
+				struct perf_evsel *evsel, int cpu)
+{
+	struct cs_etm_recording *ptr;
+	struct perf_pmu *cs_etm_pmu;
+	char path[PATH_MAX];
+	int err = -EINVAL;
+	u32 val;
+
+	ptr = container_of(itr, struct cs_etm_recording, itr);
+	cs_etm_pmu = ptr->cs_etm_pmu;
+
+	if (!cs_etm_is_etmv4(itr, cpu))
+		goto out;
+
+	/* Get a handle on TRCIDR0 */
+	snprintf(path, PATH_MAX, "cpu%d/%s",
+		 cpu, metadata_etmv4_ro[CS_ETMV4_TRCIDR0]);
+	err = perf_pmu__scan_file(cs_etm_pmu, path, "%x", &val);
+
+	/* There was a problem reading the file, bailing out */
+	if (err != 1) {
+		pr_err("%s: can't read file %s\n",
+		       CORESIGHT_ETM_PMU_NAME, path);
+		goto out;
+	}
+
+	/*
+	 * TRCIDR0.TSSIZE, bit [28-24], indicates whether global timestamping
+	 * is supported:
+	 *  0b00000 Global timestamping is not implemented
+	 *  0b00110 Implementation supports a maximum timestamp of 48bits.
+	 *  0b01000 Implementation supports a maximum timestamp of 64bits.
+	 */
+	val &= GENMASK(28, 24);
+	if (!val) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	/* All good, let the kernel know */
+	evsel->attr.config |= (1 << ETM_OPT_TS);
+	err = 0;
+
+out:
+	return err;
+}
+
 static int cs_etm_set_option(struct auxtrace_record *itr,
 			     struct perf_evsel *evsel, u32 option)
 {
@@ -118,6 +166,11 @@ static int cs_etm_set_option(struct auxtrace_record *itr,
 			if (err)
 				goto out;
 			break;
+		case ETM_OPT_TS:
+			err = cs_etm_set_timestamp(itr, evsel, i);
+			if (err)
+				goto out;
+			break;
 		default:
 			goto out;
 		}
@@ -343,6 +396,10 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 	err = cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_CTXTID);
 	if (err)
 		goto out;
+
+	err = cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_TS);
+	if (err)
+		goto out;
nit: Could we not do this in one shot, say:

	cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_TS | ETM_OPT_CTXTID)

rather than iterating over the per-CPU events twice? cs_etm_set_option() could simply replace the switch() with:

	if (option & ETM_OPT_1)
		do_something_for_1();
	if (option & ETM_OPT_2)
		do_something_for_2();
	if (option & ~(ETM_OPT_1 | ETM_OPT_2 | ...))
		/* do unsupported option */
Cheers Suzuki
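To make the suggestion concrete, here is a rough standalone sketch of a one-pass, mask-driven option handler. Since the patch sets the config with (1 << ETM_OPT_TS), the ETM_OPT_* values look like bit positions rather than masks, so the sketch builds its masks with a shift first. The SKETCH_* names, the bit positions and the two setter stubs are all assumptions made for illustration; none of this is the actual perf code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
enum { SKETCH_OPT_CTXTID = 14, SKETCH_OPT_TS = 28 };

#define SKETCH_BIT(n)	(UINT32_C(1) << (n))

/* Stand-ins for cs_etm_set_context_id() and cs_etm_set_timestamp(). */
static int sketch_set_context_id(uint64_t *config)
{
	*config |= SKETCH_BIT(SKETCH_OPT_CTXTID);
	return 0;
}

static int sketch_set_timestamp(uint64_t *config)
{
	*config |= SKETCH_BIT(SKETCH_OPT_TS);
	return 0;
}

/* Apply every requested option for one CPU in a single pass. */
int sketch_set_options(uint64_t *config, uint32_t options)
{
	const uint32_t supported = SKETCH_BIT(SKETCH_OPT_CTXTID) |
				   SKETCH_BIT(SKETCH_OPT_TS);
	int err;

	/* Reject anything outside the known set before touching config. */
	if (options & ~supported)
		return -EINVAL;

	if (options & SKETCH_BIT(SKETCH_OPT_CTXTID)) {
		err = sketch_set_context_id(config);
		if (err)
			return err;
	}

	if (options & SKETCH_BIT(SKETCH_OPT_TS)) {
		err = sketch_set_timestamp(config);
		if (err)
			return err;
	}

	return 0;
}

/* Usage: both options in one call, then an unsupported bit. */
void sketch_options_demo(void)
{
	uint64_t config = 0;

	assert(sketch_set_options(&config, SKETCH_BIT(SKETCH_OPT_CTXTID) |
					   SKETCH_BIT(SKETCH_OPT_TS)) == 0);
	assert(sketch_set_options(&config, SKETCH_BIT(0)) == -EINVAL);
}
```

Checking for unsupported bits up front keeps the config untouched on a bad request; the pseudocode above does that check last, which would also work.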
On Fri, 7 Jun 2019 at 03:41, Suzuki K Poulose suzuki.poulose@arm.com wrote:
> nit: Could we not do this in one shot, say:
>
>	cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_TS | ETM_OPT_CTXTID)
>
> rather than iterating over the per-CPU events twice? cs_etm_set_option() could simply replace the switch() with:
>
>	if (option & ETM_OPT_1)
>		do_something_for_1();
>	if (option & ETM_OPT_2)
>		do_something_for_2();
>	if (option & ~(ETM_OPT_1 | ETM_OPT_2 | ...))
>		/* do unsupported option */

Yes, that is a good optimization.

Arnaldo, do you prefer a new set or another patch on top of this one?

Thanks,
Mathieu

> Cheers
> Suzuki
On Fri, Jun 07, 2019 at 11:46:32AM -0600, Mathieu Poirier wrote:
On Fri, 7 Jun 2019 at 03:41, Suzuki K Poulose suzuki.poulose@arm.com wrote:
> > nit: Could we not do this in one shot, say:
> >
> >	cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_TS | ETM_OPT_CTXTID)
> >
> > rather than iterating over the per-CPU events twice? cs_etm_set_option() could simply replace the switch() with:
> >
> >	if (option & ETM_OPT_1)
> >		do_something_for_1();
> >	if (option & ETM_OPT_2)
> >		do_something_for_2();
> >	if (option & ~(ETM_OPT_1 | ETM_OPT_2 | ...))
> >		/* do unsupported option */
>
> Yes, that is a good optimization.
>
> Arnaldo, do you prefer a new set or another patch on top of this one?

On top of it, as this isn't a fix, just an optimization, so no need to go back and fix history to avoid bisection, etc.

Put it in your next set, no need to hurry.

- Arnaldo

> Thanks,
> Mathieu

> > Cheers
> > Suzuki
Ask the perf core to generate an event when processes are swapped in/out of context. That way proper action can be taken by the decoding code when faced with such an event.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/arch/arm/util/cs-etm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index be1e4f20affa..cc7f1cd23b14 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -257,6 +257,9 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 	ptr->evlist = evlist;
 	ptr->snapshot_mode = opts->auxtrace_snapshot_mode;

+	if (perf_can_record_switch_events())
+		opts->record_switch_events = true;
+
 	evlist__for_each_entry(evlist, evsel) {
 		if (evsel->attr.type == cs_etm_pmu->type) {
 			if (cs_etm_evsel) {
Add handling of ITRACE events in order to add the tid/pid of the executing process to the perf tools machine infrastructure. This information is later retrieved when a contextID packet is found in the trace stream.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/util/cs-etm.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index de488b43f440..0742c50fce46 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1657,6 +1657,29 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 	return 0;
 }

+static int cs_etm__process_itrace_start(struct cs_etm_auxtrace *etm,
+					union perf_event *event)
+{
+	struct thread *th;
+
+	if (etm->timeless_decoding)
+		return 0;
+
+	/*
+	 * Add the tid/pid to the log so that we can get a match when
+	 * we get a contextID from the decoder.
+	 */
+	th = machine__findnew_thread(etm->machine,
+				     event->itrace_start.pid,
+				     event->itrace_start.tid);
+	if (!th)
+		return -ENOMEM;
+
+	thread__put(th);
+
+	return 0;
+}
+
 static int cs_etm__process_event(struct perf_session *session,
 				 union perf_event *event,
 				 struct perf_sample *sample,
@@ -1694,6 +1717,9 @@ static int cs_etm__process_event(struct perf_session *session,
 		return cs_etm__process_timeless_queues(etm,
 						       event->fork.tid);

+	if (event->header.type == PERF_RECORD_ITRACE_START)
+		return cs_etm__process_itrace_start(etm, event);
+
 	return 0;
 }
Add handling of SWITCH-CPU-WIDE events in order to add the tid/pid of the incoming process to the perf tools machine infrastructure. This information is later retrieved when a contextID packet is found in the trace stream.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/util/cs-etm.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
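As a reading aid, the filtering this patch applies to switch records can be sketched on its own as below. The struct is a hypothetical, trimmed-down view of the record, sketch_incoming_task() is a name made up for this note, and the flag value is assumed to match PERF_RECORD_MISC_SWITCH_OUT; the patch itself goes on to call machine__findnew_thread() rather than handing pid/tid back to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match PERF_RECORD_MISC_SWITCH_OUT from the perf UAPI header. */
#define SKETCH_MISC_SWITCH_OUT	(1 << 13)

/* Hypothetical, trimmed-down view of a SWITCH_CPU_WIDE record. */
struct sketch_switch_event {
	uint16_t misc;
	int32_t next_prev_pid;
	int32_t next_prev_tid;
};

/*
 * Return true and fill pid/tid when the record names a task that should
 * be registered with the machine; return false when it can be ignored.
 */
bool sketch_incoming_task(const struct sketch_switch_event *ev,
			  bool timeless_decoding,
			  int32_t *pid, int32_t *tid)
{
	/* Per-thread (timeless) sessions do not need switch events. */
	if (timeless_decoding)
		return false;

	/* Only SWITCH_OUT records carry the task being switched in. */
	if (!(ev->misc & SKETCH_MISC_SWITCH_OUT))
		return false;

	*pid = ev->next_prev_pid;
	*tid = ev->next_prev_tid;
	return true;
}

/* Usage: a SWITCH_OUT record in a CPU-wide session yields the next task. */
void sketch_switch_demo(void)
{
	struct sketch_switch_event ev = { SKETCH_MISC_SWITCH_OUT, 100, 101 };
	int32_t pid = -1, tid = -1;

	assert(sketch_incoming_task(&ev, false, &pid, &tid));
	assert(pid == 100 && tid == 101);
}
```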
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 0742c50fce46..5322dcaaf654 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1680,6 +1680,42 @@ static int cs_etm__process_itrace_start(struct cs_etm_auxtrace *etm,
 	return 0;
 }

+static int cs_etm__process_switch_cpu_wide(struct cs_etm_auxtrace *etm,
+					   union perf_event *event)
+{
+	struct thread *th;
+	bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
+
+	/*
+	 * Context switches in per-thread mode are irrelevant since perf
+	 * will start/stop tracing as the process is scheduled.
+	 */
+	if (etm->timeless_decoding)
+		return 0;
+
+	/*
+	 * SWITCH_IN events carry the next process to be switched out while
+	 * SWITCH_OUT events carry the process to be switched in. As such
+	 * we don't care about IN events.
+	 */
+	if (!out)
+		return 0;
+
+	/*
+	 * Add the tid/pid to the log so that we can get a match when
+	 * we get a contextID from the decoder.
+	 */
+	th = machine__findnew_thread(etm->machine,
+				     event->context_switch.next_prev_pid,
+				     event->context_switch.next_prev_tid);
+	if (!th)
+		return -ENOMEM;
+
+	thread__put(th);
+
+	return 0;
+}
+
 static int cs_etm__process_event(struct perf_session *session,
 				 union perf_event *event,
 				 struct perf_sample *sample,
@@ -1719,6 +1755,8 @@ static int cs_etm__process_event(struct perf_session *session,

 	if (event->header.type == PERF_RECORD_ITRACE_START)
 		return cs_etm__process_itrace_start(etm, event);
+	else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
+		return cs_etm__process_switch_cpu_wide(etm, event);

 	return 0;
 }
There is no point in having two different error goto statements since the openCSD API to free a decoder handles NULL pointers. As such function cs_etm_decoder__free() can be called to deal with all aspects of freeing decoder memory.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)
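The pattern behind this cleanup, a single error label backed by a destructor that tolerates partially built objects, can be sketched in isolation as follows. The sketch_decoder type, the fail_at knob and the use of plain free() in place of the openCSD tree teardown are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_decoder {
	int *tree;	/* stands in for the openCSD decode tree handle */
};

/* Safe on NULL and on a decoder whose tree was never allocated. */
void sketch_decoder_free(struct sketch_decoder *d)
{
	if (!d)
		return;

	free(d->tree);	/* free(NULL) is a no-op */
	free(d);
}

/*
 * fail_at simulates a failure after step 1 (decoder allocated) or after
 * step 2 (tree allocated); 0 means no simulated failure.
 */
struct sketch_decoder *sketch_decoder_new(int fail_at)
{
	struct sketch_decoder *d = calloc(1, sizeof(*d));

	if (!d || fail_at == 1)
		goto err_free_decoder;

	d->tree = malloc(sizeof(*d->tree));
	if (!d->tree || fail_at == 2)
		goto err_free_decoder;

	return d;

err_free_decoder:
	/* One label covers every stage because the destructor copes. */
	sketch_decoder_free(d);
	return NULL;
}

/* Usage: every failure path funnels through the same cleanup. */
void sketch_decoder_demo(void)
{
	struct sketch_decoder *d = sketch_decoder_new(0);

	assert(d != NULL);
	sketch_decoder_free(d);
	assert(sketch_decoder_new(2) == NULL);
}
```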
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 39fe21e1cf93..5dafec421b0d 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -577,7 +577,7 @@ cs_etm_decoder__new(int num_cpu, struct cs_etm_decoder_params *d_params,
 	/* init library print logging support */
 	ret = cs_etm_decoder__init_def_logger_printing(d_params, decoder);
 	if (ret != 0)
-		goto err_free_decoder_tree;
+		goto err_free_decoder;

 	/* init raw frame logging if required */
 	cs_etm_decoder__init_raw_frame_logging(d_params, decoder);
@@ -587,15 +587,13 @@ cs_etm_decoder__new(int num_cpu, struct cs_etm_decoder_params *d_params,
 							 &t_params[i],
 							 decoder);
 		if (ret != 0)
-			goto err_free_decoder_tree;
+			goto err_free_decoder;
 	}

 	return decoder;

-err_free_decoder_tree:
-	ocsd_destroy_dcd_tree(decoder->dcd_tree);
 err_free_decoder:
-	free(decoder);
+	cs_etm_decoder__free(decoder);
 	return NULL;
 }
The decoder needs to work with more than one traceID queue if we want to support CPU-wide scenarios with N:1 source/sink topologies. As such move the packet buffer and related fields out of the decoder structure and into the cs_etm_queue structure.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 129 +++++++-----------
 .../perf/util/cs-etm-decoder/cs-etm-decoder.h |  36 +----
 tools/perf/util/cs-etm.c                      |  37 ++++-
 tools/perf/util/cs-etm.h                      |  53 +++++++
 4 files changed, 144 insertions(+), 111 deletions(-)
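For reference, the head/tail arithmetic of the packet queue being moved here can be sketched as a small standalone ring buffer. It follows the same conventions as the code in this patch (power-of-two size, index advanced before use, queue treated as full at size - 1), but the sketch_queue type, its int payload and the size of 8 are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QUEUE_SIZE	8	/* must be a power of two for the mask */

struct sketch_queue {
	uint32_t count;
	uint32_t head;
	uint32_t tail;
	int slots[SKETCH_QUEUE_SIZE];
};

/* Returns 1 when a value was stored, 0 when the queue is full. */
int sketch_queue_put(struct sketch_queue *q, int value)
{
	if (q->count >= SKETCH_QUEUE_SIZE - 1)
		return 0;

	/* Advance first, then write: slot 0 stays unused until wrap-around. */
	q->tail = (q->tail + 1) & (SKETCH_QUEUE_SIZE - 1);
	q->slots[q->tail] = value;
	q->count++;

	return 1;
}

/* Returns 1 when a value was fetched, 0 when the queue is empty. */
int sketch_queue_get(struct sketch_queue *q, int *value)
{
	if (q->count == 0)
		return 0;

	/* Mirror the producer: advance head, then read. */
	q->head = (q->head + 1) & (SKETCH_QUEUE_SIZE - 1);
	*value = q->slots[q->head];
	q->count--;

	return 1;
}

/* Usage: values come back in the order they were queued. */
void sketch_queue_demo(void)
{
	struct sketch_queue q = { 0 };
	int v = 0;

	assert(sketch_queue_put(&q, 10) && sketch_queue_put(&q, 20));
	assert(sketch_queue_get(&q, &v) && v == 10);
	assert(sketch_queue_get(&q, &v) && v == 20);
	assert(!sketch_queue_get(&q, &v));
}
```

Because the state lives entirely in the queue structure, the same two helpers work no matter which object owns the queue, which is the property this patch relies on.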
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 5dafec421b0d..3ac238e58901 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -18,8 +18,6 @@ #include "intlist.h" #include "util.h"
-#define MAX_BUFFER 1024 - /* use raw logging */ #ifdef CS_DEBUG_RAW #define CS_LOG_RAW_FRAMES @@ -31,18 +29,12 @@ #endif #endif
-#define CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL - struct cs_etm_decoder { void *data; void (*packet_printer)(const char *msg); dcd_tree_handle_t dcd_tree; cs_etm_mem_cb_type mem_access; ocsd_datapath_resp_t prev_return; - u32 packet_count; - u32 head; - u32 tail; - struct cs_etm_packet packet_buffer[MAX_BUFFER]; };
static u32 @@ -88,14 +80,14 @@ int cs_etm_decoder__reset(struct cs_etm_decoder *decoder) return 0; }
-int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder, +int cs_etm_decoder__get_packet(struct cs_etm_packet_queue *packet_queue, struct cs_etm_packet *packet) { - if (!decoder || !packet) + if (!packet_queue || !packet) return -EINVAL;
/* Nothing to do, might as well just return */ - if (decoder->packet_count == 0) + if (packet_queue->packet_count == 0) return 0; /* * The queueing process in function cs_etm_decoder__buffer_packet() @@ -106,11 +98,12 @@ int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder, * value. Otherwise the first element of the packet queue is not * used. */ - decoder->head = (decoder->head + 1) & (MAX_BUFFER - 1); + packet_queue->head = (packet_queue->head + 1) & + (CS_ETM_PACKET_MAX_BUFFER - 1);
- *packet = decoder->packet_buffer[decoder->head]; + *packet = packet_queue->packet_buffer[packet_queue->head];
- decoder->packet_count--; + packet_queue->packet_count--;
return 1; } @@ -276,84 +269,60 @@ cs_etm_decoder__create_etm_packet_printer(struct cs_etm_trace_params *t_params, trace_config); }
-static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder) -{ - int i; - - decoder->head = 0; - decoder->tail = 0; - decoder->packet_count = 0; - for (i = 0; i < MAX_BUFFER; i++) { - decoder->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN; - decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR; - decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR; - decoder->packet_buffer[i].instr_count = 0; - decoder->packet_buffer[i].last_instr_taken_branch = false; - decoder->packet_buffer[i].last_instr_size = 0; - decoder->packet_buffer[i].last_instr_type = 0; - decoder->packet_buffer[i].last_instr_subtype = 0; - decoder->packet_buffer[i].last_instr_cond = 0; - decoder->packet_buffer[i].flags = 0; - decoder->packet_buffer[i].exception_number = UINT32_MAX; - decoder->packet_buffer[i].trace_chan_id = UINT8_MAX; - decoder->packet_buffer[i].cpu = INT_MIN; - } -} - static ocsd_datapath_resp_t -cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, +cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue, const u8 trace_chan_id, enum cs_etm_sample_type sample_type) { u32 et = 0; int cpu;
- if (decoder->packet_count >= MAX_BUFFER - 1) + if (packet_queue->packet_count >= CS_ETM_PACKET_MAX_BUFFER - 1) return OCSD_RESP_FATAL_SYS_ERR;
if (cs_etm__get_cpu(trace_chan_id, &cpu) < 0) return OCSD_RESP_FATAL_SYS_ERR;
- et = decoder->tail; - et = (et + 1) & (MAX_BUFFER - 1); - decoder->tail = et; - decoder->packet_count++; - - decoder->packet_buffer[et].sample_type = sample_type; - decoder->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN; - decoder->packet_buffer[et].cpu = cpu; - decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR; - decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR; - decoder->packet_buffer[et].instr_count = 0; - decoder->packet_buffer[et].last_instr_taken_branch = false; - decoder->packet_buffer[et].last_instr_size = 0; - decoder->packet_buffer[et].last_instr_type = 0; - decoder->packet_buffer[et].last_instr_subtype = 0; - decoder->packet_buffer[et].last_instr_cond = 0; - decoder->packet_buffer[et].flags = 0; - decoder->packet_buffer[et].exception_number = UINT32_MAX; - decoder->packet_buffer[et].trace_chan_id = trace_chan_id; - - if (decoder->packet_count == MAX_BUFFER - 1) + et = packet_queue->tail; + et = (et + 1) & (CS_ETM_PACKET_MAX_BUFFER - 1); + packet_queue->tail = et; + packet_queue->packet_count++; + + packet_queue->packet_buffer[et].sample_type = sample_type; + packet_queue->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN; + packet_queue->packet_buffer[et].cpu = cpu; + packet_queue->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR; + packet_queue->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR; + packet_queue->packet_buffer[et].instr_count = 0; + packet_queue->packet_buffer[et].last_instr_taken_branch = false; + packet_queue->packet_buffer[et].last_instr_size = 0; + packet_queue->packet_buffer[et].last_instr_type = 0; + packet_queue->packet_buffer[et].last_instr_subtype = 0; + packet_queue->packet_buffer[et].last_instr_cond = 0; + packet_queue->packet_buffer[et].flags = 0; + packet_queue->packet_buffer[et].exception_number = UINT32_MAX; + packet_queue->packet_buffer[et].trace_chan_id = trace_chan_id; + + if (packet_queue->packet_count == CS_ETM_PACKET_MAX_BUFFER - 1) return OCSD_RESP_WAIT;
return OCSD_RESP_CONT; }
static ocsd_datapath_resp_t -cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, +cs_etm_decoder__buffer_range(struct cs_etm_packet_queue *packet_queue, const ocsd_generic_trace_elem *elem, const uint8_t trace_chan_id) { int ret = 0; struct cs_etm_packet *packet;
- ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id, + ret = cs_etm_decoder__buffer_packet(packet_queue, trace_chan_id, CS_ETM_RANGE); if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT) return ret;
- packet = &decoder->packet_buffer[decoder->tail]; + packet = &packet_queue->packet_buffer[packet_queue->tail];
switch (elem->isa) { case ocsd_isa_aarch64: @@ -400,36 +369,36 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, }
static ocsd_datapath_resp_t -cs_etm_decoder__buffer_discontinuity(struct cs_etm_decoder *decoder, - const uint8_t trace_chan_id) +cs_etm_decoder__buffer_discontinuity(struct cs_etm_packet_queue *queue, + const uint8_t trace_chan_id) { - return cs_etm_decoder__buffer_packet(decoder, trace_chan_id, + return cs_etm_decoder__buffer_packet(queue, trace_chan_id, CS_ETM_DISCONTINUITY); }
static ocsd_datapath_resp_t -cs_etm_decoder__buffer_exception(struct cs_etm_decoder *decoder, +cs_etm_decoder__buffer_exception(struct cs_etm_packet_queue *queue, const ocsd_generic_trace_elem *elem, const uint8_t trace_chan_id) { int ret = 0; struct cs_etm_packet *packet;
- ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id, + ret = cs_etm_decoder__buffer_packet(queue, trace_chan_id, CS_ETM_EXCEPTION); if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT) return ret;
- packet = &decoder->packet_buffer[decoder->tail]; + packet = &queue->packet_buffer[queue->tail]; packet->exception_number = elem->exception_number;
return ret; }
static ocsd_datapath_resp_t -cs_etm_decoder__buffer_exception_ret(struct cs_etm_decoder *decoder, +cs_etm_decoder__buffer_exception_ret(struct cs_etm_packet_queue *queue, const uint8_t trace_chan_id) { - return cs_etm_decoder__buffer_packet(decoder, trace_chan_id, + return cs_etm_decoder__buffer_packet(queue, trace_chan_id, CS_ETM_EXCEPTION_RET); }
@@ -441,6 +410,13 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( { ocsd_datapath_resp_t resp = OCSD_RESP_CONT; struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context; + struct cs_etm_queue *etmq = decoder->data; + struct cs_etm_packet_queue *packet_queue; + + /* First get the packet queue */ + packet_queue = cs_etm__etmq_get_packet_queue(etmq); + if (!packet_queue) + return OCSD_RESP_FATAL_SYS_ERR;
switch (elem->elem_type) { case OCSD_GEN_TRC_ELEM_UNKNOWN: @@ -448,19 +424,19 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( case OCSD_GEN_TRC_ELEM_EO_TRACE: case OCSD_GEN_TRC_ELEM_NO_SYNC: case OCSD_GEN_TRC_ELEM_TRACE_ON: - resp = cs_etm_decoder__buffer_discontinuity(decoder, + resp = cs_etm_decoder__buffer_discontinuity(packet_queue, trace_chan_id); break; case OCSD_GEN_TRC_ELEM_INSTR_RANGE: - resp = cs_etm_decoder__buffer_range(decoder, elem, + resp = cs_etm_decoder__buffer_range(packet_queue, elem, trace_chan_id); break; case OCSD_GEN_TRC_ELEM_EXCEPTION: - resp = cs_etm_decoder__buffer_exception(decoder, elem, + resp = cs_etm_decoder__buffer_exception(packet_queue, elem, trace_chan_id); break; case OCSD_GEN_TRC_ELEM_EXCEPTION_RET: - resp = cs_etm_decoder__buffer_exception_ret(decoder, + resp = cs_etm_decoder__buffer_exception_ret(packet_queue, trace_chan_id); break; case OCSD_GEN_TRC_ELEM_PE_CONTEXT: @@ -554,7 +530,6 @@ cs_etm_decoder__new(int num_cpu, struct cs_etm_decoder_params *d_params,
decoder->data = d_params->data; decoder->prev_return = OCSD_RESP_CONT; - cs_etm_decoder__clear_buffer(decoder); format = (d_params->formatted ? OCSD_TRC_SRC_FRAME_FORMATTED : OCSD_TRC_SRC_SINGLE); flags = 0; diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h index 3ab11dfa92ae..6ae7ab4cf5fe 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -14,38 +14,8 @@ #include <stdio.h>
struct cs_etm_decoder; - -enum cs_etm_sample_type { - CS_ETM_EMPTY, - CS_ETM_RANGE, - CS_ETM_DISCONTINUITY, - CS_ETM_EXCEPTION, - CS_ETM_EXCEPTION_RET, -}; - -enum cs_etm_isa { - CS_ETM_ISA_UNKNOWN, - CS_ETM_ISA_A64, - CS_ETM_ISA_A32, - CS_ETM_ISA_T32, -}; - -struct cs_etm_packet { - enum cs_etm_sample_type sample_type; - enum cs_etm_isa isa; - u64 start_addr; - u64 end_addr; - u32 instr_count; - u32 last_instr_type; - u32 last_instr_subtype; - u32 flags; - u32 exception_number; - u8 last_instr_cond; - u8 last_instr_taken_branch; - u8 last_instr_size; - u8 trace_chan_id; - int cpu; -}; +struct cs_etm_packet; +struct cs_etm_packet_queue;
struct cs_etm_queue;
@@ -119,7 +89,7 @@ int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder, u64 start, u64 end, cs_etm_mem_cb_type cb_func);
-int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder, +int cs_etm_decoder__get_packet(struct cs_etm_packet_queue *packet_queue, struct cs_etm_packet *packet);
int cs_etm_decoder__reset(struct cs_etm_decoder *decoder); diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 5322dcaaf654..a74c53a45839 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -78,6 +78,7 @@ struct cs_etm_queue { struct cs_etm_packet *packet; const unsigned char *buf; size_t buf_len, buf_used; + struct cs_etm_packet_queue packet_queue; };
static int cs_etm__update_queues(struct cs_etm_auxtrace *etm); @@ -125,6 +126,36 @@ int cs_etm__get_cpu(u8 trace_chan_id, int *cpu) return 0; }
+static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) +{ + int i; + + queue->head = 0; + queue->tail = 0; + queue->packet_count = 0; + for (i = 0; i < CS_ETM_PACKET_MAX_BUFFER; i++) { + queue->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN; + queue->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR; + queue->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR; + queue->packet_buffer[i].instr_count = 0; + queue->packet_buffer[i].last_instr_taken_branch = false; + queue->packet_buffer[i].last_instr_size = 0; + queue->packet_buffer[i].last_instr_type = 0; + queue->packet_buffer[i].last_instr_subtype = 0; + queue->packet_buffer[i].last_instr_cond = 0; + queue->packet_buffer[i].flags = 0; + queue->packet_buffer[i].exception_number = UINT32_MAX; + queue->packet_buffer[i].trace_chan_id = UINT8_MAX; + queue->packet_buffer[i].cpu = INT_MIN; + } +} + +struct cs_etm_packet_queue +*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq) +{ + return &etmq->packet_queue; +} + static void cs_etm__packet_dump(const char *pkt_string) { const char *color = PERF_COLOR_BLUE; @@ -513,6 +544,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, etmq->pid = -1; etmq->offset = 0; etmq->period_instructions = 0; + cs_etm__clear_packet_queue(&etmq->packet_queue);
out: return ret; @@ -1542,10 +1574,13 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq) static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq) { int ret; + struct cs_etm_packet_queue *packet_queue; + + packet_queue = cs_etm__etmq_get_packet_queue(etmq);
/* Process each packet in this chunk */ while (1) { - ret = cs_etm_decoder__get_packet(etmq->decoder, + ret = cs_etm_decoder__get_packet(packet_queue, etmq->packet); if (ret <= 0) /* diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 826c9eedaf5c..75385e2fd283 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -97,12 +97,57 @@ enum { CS_ETMV4_EXC_END = 31, };
+enum cs_etm_sample_type { + CS_ETM_EMPTY, + CS_ETM_RANGE, + CS_ETM_DISCONTINUITY, + CS_ETM_EXCEPTION, + CS_ETM_EXCEPTION_RET, +}; + +enum cs_etm_isa { + CS_ETM_ISA_UNKNOWN, + CS_ETM_ISA_A64, + CS_ETM_ISA_A32, + CS_ETM_ISA_T32, +}; + /* RB tree for quick conversion between traceID and metadata pointers */ struct intlist *traceid_list;
+struct cs_etm_queue; + +struct cs_etm_packet { + enum cs_etm_sample_type sample_type; + enum cs_etm_isa isa; + u64 start_addr; + u64 end_addr; + u32 instr_count; + u32 last_instr_type; + u32 last_instr_subtype; + u32 flags; + u32 exception_number; + u8 last_instr_cond; + u8 last_instr_taken_branch; + u8 last_instr_size; + u8 trace_chan_id; + int cpu; +}; + +#define CS_ETM_PACKET_MAX_BUFFER 1024 + +struct cs_etm_packet_queue { + u32 packet_count; + u32 head; + u32 tail; + struct cs_etm_packet packet_buffer[CS_ETM_PACKET_MAX_BUFFER]; +}; + #define KiB(x) ((x) * 1024) #define MiB(x) ((x) * 1024 * 1024)
+#define CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL + /* * Create a contiguous bitmask starting at bit position @l and ending at * position @h. For example @@ -126,6 +171,8 @@ struct intlist *traceid_list; int cs_etm__process_auxtrace_info(union perf_event *event, struct perf_session *session); int cs_etm__get_cpu(u8 trace_chan_id, int *cpu); +struct cs_etm_packet_queue +*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq); #else static inline int cs_etm__process_auxtrace_info(union perf_event *event __maybe_unused, @@ -139,6 +186,12 @@ static inline int cs_etm__get_cpu(u8 trace_chan_id __maybe_unused, { return -1; } + +static inline struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue( + struct cs_etm_queue *etmq __maybe_unused) +{ + return NULL; +} #endif
#endif
Fixing wrong indentation of the while() loop - no change of functionality.
Fixes: 3fa0e83e2948 ("perf cs-etm: Modularize main packet processing loop")
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/util/cs-etm.c | 108 +++++++++++++++++++--------------------
 1 file changed, 54 insertions(+), 54 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index a74c53a45839..68fec6f019fe 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1578,64 +1578,64 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq)
packet_queue = cs_etm__etmq_get_packet_queue(etmq);
- /* Process each packet in this chunk */ - while (1) { - ret = cs_etm_decoder__get_packet(packet_queue, - etmq->packet); - if (ret <= 0) - /* - * Stop processing this chunk on - * end of data or error - */ - break; + /* Process each packet in this chunk */ + while (1) { + ret = cs_etm_decoder__get_packet(packet_queue, + etmq->packet); + if (ret <= 0) + /* + * Stop processing this chunk on + * end of data or error + */ + break; + + /* + * Since packet addresses are swapped in packet + * handling within below switch() statements, + * thus setting sample flags must be called + * prior to switch() statement to use address + * information before packets swapping. + */ + ret = cs_etm__set_sample_flags(etmq); + if (ret < 0) + break;
+ switch (etmq->packet->sample_type) { + case CS_ETM_RANGE: /* - * Since packet addresses are swapped in packet - * handling within below switch() statements, - * thus setting sample flags must be called - * prior to switch() statement to use address - * information before packets swapping. + * If the packet contains an instruction + * range, generate instruction sequence + * events. */ - ret = cs_etm__set_sample_flags(etmq); - if (ret < 0) - break; - - switch (etmq->packet->sample_type) { - case CS_ETM_RANGE: - /* - * If the packet contains an instruction - * range, generate instruction sequence - * events. - */ - cs_etm__sample(etmq); - break; - case CS_ETM_EXCEPTION: - case CS_ETM_EXCEPTION_RET: - /* - * If the exception packet is coming, - * make sure the previous instruction - * range packet to be handled properly. - */ - cs_etm__exception(etmq); - break; - case CS_ETM_DISCONTINUITY: - /* - * Discontinuity in trace, flush - * previous branch stack - */ - cs_etm__flush(etmq); - break; - case CS_ETM_EMPTY: - /* - * Should not receive empty packet, - * report error. - */ - pr_err("CS ETM Trace: empty packet\n"); - return -EINVAL; - default: - break; - } + cs_etm__sample(etmq); + break; + case CS_ETM_EXCEPTION: + case CS_ETM_EXCEPTION_RET: + /* + * If the exception packet is coming, + * make sure the previous instruction + * range packet to be handled properly. + */ + cs_etm__exception(etmq); + break; + case CS_ETM_DISCONTINUITY: + /* + * Discontinuity in trace, flush + * previous branch stack + */ + cs_etm__flush(etmq); + break; + case CS_ETM_EMPTY: + /* + * Should not receive empty packet, + * report error. + */ + pr_err("CS ETM Trace: empty packet\n"); + return -EINVAL; + default: + break; } + }
return ret; }
In an ideal world there is one CPU per cs_etm_queue and as such, one trace ID per cs_etm_queue. In the real world CoreSight topologies allow multiple CPUs to use the same sink, which translates to multiple trace IDs per cs_etm_queue.
To deal with this a new cs_etm_traceid_queue structure is introduced to enclose all the information related to a single trace ID, allowing a cs_etm_queue to handle traces generated by any number of CPUs.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |   4 +-
 tools/perf/util/cs-etm.c                      | 360 +++++++++++-------
 tools/perf/util/cs-etm.h                      |  15 +-
 3 files changed, 234 insertions(+), 145 deletions(-)
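A stripped-down sketch of the lookup introduced here may help when reading the diff: the per-trace-ID state is allocated lazily on the first request and handed back on later ones. At this point in the series the lookup appears to return one queue regardless of the ID passed in, and the sketch mirrors that. The sketch_* types and functions are made up for this note and carry only two of the real fields.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cut-down version of the per-trace-ID state. */
struct sketch_tidq {
	uint8_t trace_chan_id;
	uint64_t period_instructions;
};

/* Hypothetical cut-down version of the per-sink queue. */
struct sketch_etmq {
	struct sketch_tidq *traceid_queues;
};

/* Return the trace ID queue, creating it on first use; NULL on failure. */
struct sketch_tidq *sketch_get_tidq(struct sketch_etmq *etmq,
				    uint8_t trace_chan_id)
{
	struct sketch_tidq *tidq = etmq->traceid_queues;

	/* Already set up by an earlier call: reuse it. */
	if (tidq)
		return tidq;

	tidq = calloc(1, sizeof(*tidq));
	if (!tidq)
		return NULL;

	tidq->trace_chan_id = trace_chan_id;
	etmq->traceid_queues = tidq;

	return tidq;
}

/* Release the queue; safe to call when nothing was ever allocated. */
void sketch_put_tidq(struct sketch_etmq *etmq)
{
	free(etmq->traceid_queues);
	etmq->traceid_queues = NULL;
}

/* Usage: the second lookup returns the object created by the first. */
void sketch_tidq_demo(void)
{
	struct sketch_etmq etmq = { NULL };
	struct sketch_tidq *first = sketch_get_tidq(&etmq, 0x10);

	assert(first && first->trace_chan_id == 0x10);
	assert(sketch_get_tidq(&etmq, 0x10) == first);
	sketch_put_tidq(&etmq);
}
```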
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 3ac238e58901..4303d2d00d31 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -413,8 +413,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( struct cs_etm_queue *etmq = decoder->data; struct cs_etm_packet_queue *packet_queue;
- /* First get the packet queue */ - packet_queue = cs_etm__etmq_get_packet_queue(etmq); + /* First get the packet queue for this traceID */ + packet_queue = cs_etm__etmq_get_packet_queue(etmq, trace_chan_id); if (!packet_queue) return OCSD_RESP_FATAL_SYS_ERR;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 68fec6f019fe..9e8212c74055 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -60,25 +60,30 @@ struct cs_etm_auxtrace { unsigned int pmu_type; };
+struct cs_etm_traceid_queue { + u8 trace_chan_id; + u64 period_instructions; + size_t last_branch_pos; + union perf_event *event_buf; + struct branch_stack *last_branch; + struct branch_stack *last_branch_rb; + struct cs_etm_packet *prev_packet; + struct cs_etm_packet *packet; + struct cs_etm_packet_queue packet_queue; +}; + struct cs_etm_queue { struct cs_etm_auxtrace *etm; struct thread *thread; struct cs_etm_decoder *decoder; struct auxtrace_buffer *buffer; - union perf_event *event_buf; unsigned int queue_nr; pid_t pid, tid; int cpu; u64 offset; - u64 period_instructions; - struct branch_stack *last_branch; - struct branch_stack *last_branch_rb; - size_t last_branch_pos; - struct cs_etm_packet *prev_packet; - struct cs_etm_packet *packet; const unsigned char *buf; size_t buf_len, buf_used; - struct cs_etm_packet_queue packet_queue; + struct cs_etm_traceid_queue *traceid_queues; };
static int cs_etm__update_queues(struct cs_etm_auxtrace *etm); @@ -150,10 +155,96 @@ static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) } }
+static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq, + u8 trace_chan_id) +{ + int rc = -ENOMEM; + struct cs_etm_auxtrace *etm = etmq->etm; + + cs_etm__clear_packet_queue(&tidq->packet_queue); + + tidq->trace_chan_id = trace_chan_id; + + tidq->packet = zalloc(sizeof(struct cs_etm_packet)); + if (!tidq->packet) + goto out; + + tidq->prev_packet = zalloc(sizeof(struct cs_etm_packet)); + if (!tidq->prev_packet) + goto out_free; + + if (etm->synth_opts.last_branch) { + size_t sz = sizeof(struct branch_stack); + + sz += etm->synth_opts.last_branch_sz * + sizeof(struct branch_entry); + tidq->last_branch = zalloc(sz); + if (!tidq->last_branch) + goto out_free; + tidq->last_branch_rb = zalloc(sz); + if (!tidq->last_branch_rb) + goto out_free; + } + + tidq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE); + if (!tidq->event_buf) + goto out_free; + + return 0; + +out_free: + zfree(&tidq->last_branch_rb); + zfree(&tidq->last_branch); + zfree(&tidq->prev_packet); + zfree(&tidq->packet); +out: + return rc; +} + +static struct cs_etm_traceid_queue +*cs_etm__etmq_get_traceid_queue(struct cs_etm_queue *etmq, u8 trace_chan_id) +{ + struct cs_etm_traceid_queue *tidq; + struct cs_etm_auxtrace *etm = etmq->etm; + + if (!etm->timeless_decoding) + return NULL; + + tidq = etmq->traceid_queues; + + if (tidq) + return tidq; + + tidq = malloc(sizeof(*tidq)); + if (!tidq) + return NULL; + + memset(tidq, 0, sizeof(*tidq)); + + if (cs_etm__init_traceid_queue(etmq, tidq, trace_chan_id)) + goto out_free; + + etmq->traceid_queues = tidq; + + return etmq->traceid_queues; + +out_free: + free(tidq); + + return NULL; +} + struct cs_etm_packet_queue -*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq) +*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id) { - return &etmq->packet_queue; + struct cs_etm_traceid_queue *tidq; + + tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id); + if (tidq) + return &tidq->packet_queue; + + 
return NULL; }
static void cs_etm__packet_dump(const char *pkt_string) @@ -327,11 +418,12 @@ static void cs_etm__free_queue(void *priv)
thread__zput(etmq->thread); cs_etm_decoder__free(etmq->decoder); - zfree(&etmq->event_buf); - zfree(&etmq->last_branch); - zfree(&etmq->last_branch_rb); - zfree(&etmq->prev_packet); - zfree(&etmq->packet); + zfree(&etmq->traceid_queues->event_buf); + zfree(&etmq->traceid_queues->last_branch); + zfree(&etmq->traceid_queues->last_branch_rb); + zfree(&etmq->traceid_queues->prev_packet); + zfree(&etmq->traceid_queues->packet); + zfree(&etmq->traceid_queues); free(etmq); }
@@ -443,37 +535,11 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm) struct cs_etm_decoder_params d_params; struct cs_etm_trace_params *t_params = NULL; struct cs_etm_queue *etmq; - size_t szp = sizeof(struct cs_etm_packet);
etmq = zalloc(sizeof(*etmq)); if (!etmq) return NULL;
- etmq->packet = zalloc(szp); - if (!etmq->packet) - goto out_free; - - etmq->prev_packet = zalloc(szp); - if (!etmq->prev_packet) - goto out_free; - - if (etm->synth_opts.last_branch) { - size_t sz = sizeof(struct branch_stack); - - sz += etm->synth_opts.last_branch_sz * - sizeof(struct branch_entry); - etmq->last_branch = zalloc(sz); - if (!etmq->last_branch) - goto out_free; - etmq->last_branch_rb = zalloc(sz); - if (!etmq->last_branch_rb) - goto out_free; - } - - etmq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE); - if (!etmq->event_buf) - goto out_free; - /* Use metadata to fill in trace parameters for trace decoder */ t_params = zalloc(sizeof(*t_params) * etm->num_cpu);
@@ -508,12 +574,6 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm) out_free_decoder: cs_etm_decoder__free(etmq->decoder); out_free: - zfree(&t_params); - zfree(&etmq->event_buf); - zfree(&etmq->last_branch); - zfree(&etmq->last_branch_rb); - zfree(&etmq->prev_packet); - zfree(&etmq->packet); free(etmq);
return NULL; @@ -543,8 +603,6 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, etmq->tid = queue->tid; etmq->pid = -1; etmq->offset = 0; - etmq->period_instructions = 0; - cs_etm__clear_packet_queue(&etmq->packet_queue);
out: return ret; @@ -577,10 +635,12 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm) return 0; }
-static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq) +static inline +void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) { - struct branch_stack *bs_src = etmq->last_branch_rb; - struct branch_stack *bs_dst = etmq->last_branch; + struct branch_stack *bs_src = tidq->last_branch_rb; + struct branch_stack *bs_dst = tidq->last_branch; size_t nr = 0;
/* @@ -600,9 +660,9 @@ static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq) * two steps. First, copy the branches from the most recently inserted * branch ->last_branch_pos until the end of bs_src->entries buffer. */ - nr = etmq->etm->synth_opts.last_branch_sz - etmq->last_branch_pos; + nr = etmq->etm->synth_opts.last_branch_sz - tidq->last_branch_pos; memcpy(&bs_dst->entries[0], - &bs_src->entries[etmq->last_branch_pos], + &bs_src->entries[tidq->last_branch_pos], sizeof(struct branch_entry) * nr);
/* @@ -615,14 +675,15 @@ static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq) if (bs_src->nr >= etmq->etm->synth_opts.last_branch_sz) { memcpy(&bs_dst->entries[nr], &bs_src->entries[0], - sizeof(struct branch_entry) * etmq->last_branch_pos); + sizeof(struct branch_entry) * tidq->last_branch_pos); } }
-static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq) +static inline +void cs_etm__reset_last_branch_rb(struct cs_etm_traceid_queue *tidq) { - etmq->last_branch_pos = 0; - etmq->last_branch_rb->nr = 0; + tidq->last_branch_pos = 0; + tidq->last_branch_rb->nr = 0; }
static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq, @@ -675,9 +736,10 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq, return packet->start_addr + offset * 4; }
-static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq) +static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) { - struct branch_stack *bs = etmq->last_branch_rb; + struct branch_stack *bs = tidq->last_branch_rb; struct branch_entry *be;
/* @@ -686,14 +748,14 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq) * buffer down. After writing the first element of the stack, move the * insert position back to the end of the buffer. */ - if (!etmq->last_branch_pos) - etmq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz; + if (!tidq->last_branch_pos) + tidq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
- etmq->last_branch_pos -= 1; + tidq->last_branch_pos -= 1;
- be = &bs->entries[etmq->last_branch_pos]; - be->from = cs_etm__last_executed_instr(etmq->prev_packet); - be->to = cs_etm__first_executed_instr(etmq->packet); + be = &bs->entries[tidq->last_branch_pos]; + be->from = cs_etm__last_executed_instr(tidq->prev_packet); + be->to = cs_etm__first_executed_instr(tidq->packet); /* No support for mispredict */ be->flags.mispred = 0; be->flags.predicted = 1; @@ -777,11 +839,12 @@ static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, }
static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq, u64 addr, u64 period) { int ret = 0; struct cs_etm_auxtrace *etm = etmq->etm; - union perf_event *event = etmq->event_buf; + union perf_event *event = tidq->event_buf; struct perf_sample sample = {.ip = 0,};
event->sample.header.type = PERF_RECORD_SAMPLE; @@ -794,14 +857,14 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, sample.id = etmq->etm->instructions_id; sample.stream_id = etmq->etm->instructions_id; sample.period = period; - sample.cpu = etmq->packet->cpu; - sample.flags = etmq->prev_packet->flags; + sample.cpu = tidq->packet->cpu; + sample.flags = tidq->prev_packet->flags; sample.insn_len = 1; sample.cpumode = event->sample.header.misc;
if (etm->synth_opts.last_branch) { - cs_etm__copy_last_branch_rb(etmq); - sample.branch_stack = etmq->last_branch; + cs_etm__copy_last_branch_rb(etmq, tidq); + sample.branch_stack = tidq->last_branch; }
if (etm->synth_opts.inject) { @@ -819,7 +882,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, ret);
if (etm->synth_opts.last_branch) - cs_etm__reset_last_branch_rb(etmq); + cs_etm__reset_last_branch_rb(tidq);
return ret; } @@ -828,19 +891,20 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, * The cs etm packet encodes an instruction range between a branch target * and the next taken branch. Generate sample accordingly. */ -static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) +static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) { int ret = 0; struct cs_etm_auxtrace *etm = etmq->etm; struct perf_sample sample = {.ip = 0,}; - union perf_event *event = etmq->event_buf; + union perf_event *event = tidq->event_buf; struct dummy_branch_stack { u64 nr; struct branch_entry entries; } dummy_bs; u64 ip;
- ip = cs_etm__last_executed_instr(etmq->prev_packet); + ip = cs_etm__last_executed_instr(tidq->prev_packet);
event->sample.header.type = PERF_RECORD_SAMPLE; event->sample.header.misc = cs_etm__cpu_mode(etmq, ip); @@ -849,12 +913,12 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) sample.ip = ip; sample.pid = etmq->pid; sample.tid = etmq->tid; - sample.addr = cs_etm__first_executed_instr(etmq->packet); + sample.addr = cs_etm__first_executed_instr(tidq->packet); sample.id = etmq->etm->branches_id; sample.stream_id = etmq->etm->branches_id; sample.period = 1; - sample.cpu = etmq->packet->cpu; - sample.flags = etmq->prev_packet->flags; + sample.cpu = tidq->packet->cpu; + sample.flags = tidq->prev_packet->flags; sample.cpumode = event->sample.header.misc;
/* @@ -997,33 +1061,34 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm, return 0; }
-static int cs_etm__sample(struct cs_etm_queue *etmq) +static int cs_etm__sample(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) { struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet *tmp; int ret; - u64 instrs_executed = etmq->packet->instr_count; + u64 instrs_executed = tidq->packet->instr_count;
- etmq->period_instructions += instrs_executed; + tidq->period_instructions += instrs_executed;
/* * Record a branch when the last instruction in * PREV_PACKET is a branch. */ if (etm->synth_opts.last_branch && - etmq->prev_packet->sample_type == CS_ETM_RANGE && - etmq->prev_packet->last_instr_taken_branch) - cs_etm__update_last_branch_rb(etmq); + tidq->prev_packet->sample_type == CS_ETM_RANGE && + tidq->prev_packet->last_instr_taken_branch) + cs_etm__update_last_branch_rb(etmq, tidq);
if (etm->sample_instructions && - etmq->period_instructions >= etm->instructions_sample_period) { + tidq->period_instructions >= etm->instructions_sample_period) { /* * Emit instruction sample periodically * TODO: allow period to be defined in cycles and clock time */
/* Get number of instructions executed after the sample point */ - u64 instrs_over = etmq->period_instructions - + u64 instrs_over = tidq->period_instructions - etm->instructions_sample_period;
/* @@ -1032,31 +1097,31 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) * executed, but PC has not advanced to next instruction) */ u64 offset = (instrs_executed - instrs_over - 1); - u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset); + u64 addr = cs_etm__instr_addr(etmq, tidq->packet, offset);
ret = cs_etm__synth_instruction_sample( - etmq, addr, etm->instructions_sample_period); + etmq, tidq, addr, etm->instructions_sample_period); if (ret) return ret;
/* Carry remaining instructions into next sample period */ - etmq->period_instructions = instrs_over; + tidq->period_instructions = instrs_over; }
if (etm->sample_branches) { bool generate_sample = false;
/* Generate sample for tracing on packet */ - if (etmq->prev_packet->sample_type == CS_ETM_DISCONTINUITY) + if (tidq->prev_packet->sample_type == CS_ETM_DISCONTINUITY) generate_sample = true;
/* Generate sample for branch taken packet */ - if (etmq->prev_packet->sample_type == CS_ETM_RANGE && - etmq->prev_packet->last_instr_taken_branch) + if (tidq->prev_packet->sample_type == CS_ETM_RANGE && + tidq->prev_packet->last_instr_taken_branch) generate_sample = true;
if (generate_sample) { - ret = cs_etm__synth_branch_sample(etmq); + ret = cs_etm__synth_branch_sample(etmq, tidq); if (ret) return ret; } @@ -1067,15 +1132,15 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for * the next incoming packet. */ - tmp = etmq->packet; - etmq->packet = etmq->prev_packet; - etmq->prev_packet = tmp; + tmp = tidq->packet; + tidq->packet = tidq->prev_packet; + tidq->prev_packet = tmp; }
return 0; }
-static int cs_etm__exception(struct cs_etm_queue *etmq) +static int cs_etm__exception(struct cs_etm_traceid_queue *tidq) { /* * When the exception packet is inserted, whether the last instruction @@ -1088,24 +1153,25 @@ static int cs_etm__exception(struct cs_etm_queue *etmq) * swap PACKET with PREV_PACKET. This keeps PREV_PACKET to be useful * for generating instruction and branch samples. */ - if (etmq->prev_packet->sample_type == CS_ETM_RANGE) - etmq->prev_packet->last_instr_taken_branch = true; + if (tidq->prev_packet->sample_type == CS_ETM_RANGE) + tidq->prev_packet->last_instr_taken_branch = true;
return 0; }
-static int cs_etm__flush(struct cs_etm_queue *etmq) +static int cs_etm__flush(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) { int err = 0; struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet *tmp;
/* Handle start tracing packet */ - if (etmq->prev_packet->sample_type == CS_ETM_EMPTY) + if (tidq->prev_packet->sample_type == CS_ETM_EMPTY) goto swap_packet;
if (etmq->etm->synth_opts.last_branch && - etmq->prev_packet->sample_type == CS_ETM_RANGE) { + tidq->prev_packet->sample_type == CS_ETM_RANGE) { /* * Generate a last branch event for the branches left in the * circular buffer at the end of the trace. @@ -1113,21 +1179,21 @@ static int cs_etm__flush(struct cs_etm_queue *etmq) * Use the address of the end of the last reported execution * range */ - u64 addr = cs_etm__last_executed_instr(etmq->prev_packet); + u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
err = cs_etm__synth_instruction_sample( - etmq, addr, - etmq->period_instructions); + etmq, tidq, addr, + tidq->period_instructions); if (err) return err;
- etmq->period_instructions = 0; + tidq->period_instructions = 0;
}
if (etm->sample_branches && - etmq->prev_packet->sample_type == CS_ETM_RANGE) { - err = cs_etm__synth_branch_sample(etmq); + tidq->prev_packet->sample_type == CS_ETM_RANGE) { + err = cs_etm__synth_branch_sample(etmq, tidq); if (err) return err; } @@ -1138,15 +1204,16 @@ static int cs_etm__flush(struct cs_etm_queue *etmq) * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for * the next incoming packet. */ - tmp = etmq->packet; - etmq->packet = etmq->prev_packet; - etmq->prev_packet = tmp; + tmp = tidq->packet; + tidq->packet = tidq->prev_packet; + tidq->prev_packet = tmp; }
return err; }
-static int cs_etm__end_block(struct cs_etm_queue *etmq) +static int cs_etm__end_block(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) { int err;
@@ -1160,20 +1227,20 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq) * the trace. */ if (etmq->etm->synth_opts.last_branch && - etmq->prev_packet->sample_type == CS_ETM_RANGE) { + tidq->prev_packet->sample_type == CS_ETM_RANGE) { /* * Use the address of the end of the last reported execution * range. */ - u64 addr = cs_etm__last_executed_instr(etmq->prev_packet); + u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
err = cs_etm__synth_instruction_sample( - etmq, addr, - etmq->period_instructions); + etmq, tidq, addr, + tidq->period_instructions); if (err) return err;
- etmq->period_instructions = 0; + tidq->period_instructions = 0; }
return 0; @@ -1272,10 +1339,11 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, return false; }
-static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, u64 magic) +static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq, u64 magic) { - struct cs_etm_packet *packet = etmq->packet; - struct cs_etm_packet *prev_packet = etmq->prev_packet; + struct cs_etm_packet *packet = tidq->packet; + struct cs_etm_packet *prev_packet = tidq->prev_packet;
if (magic == __perf_cs_etmv3_magic) if (packet->exception_number == CS_ETMV3_EXC_SVC) @@ -1296,9 +1364,10 @@ static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, u64 magic) return false; }
-static bool cs_etm__is_async_exception(struct cs_etm_queue *etmq, u64 magic) +static bool cs_etm__is_async_exception(struct cs_etm_traceid_queue *tidq, + u64 magic) { - struct cs_etm_packet *packet = etmq->packet; + struct cs_etm_packet *packet = tidq->packet;
if (magic == __perf_cs_etmv3_magic) if (packet->exception_number == CS_ETMV3_EXC_DEBUG_HALT || @@ -1321,10 +1390,12 @@ static bool cs_etm__is_async_exception(struct cs_etm_queue *etmq, u64 magic) return false; }
-static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, u64 magic) +static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq, + u64 magic) { - struct cs_etm_packet *packet = etmq->packet; - struct cs_etm_packet *prev_packet = etmq->prev_packet; + struct cs_etm_packet *packet = tidq->packet; + struct cs_etm_packet *prev_packet = tidq->prev_packet;
if (magic == __perf_cs_etmv3_magic) if (packet->exception_number == CS_ETMV3_EXC_SMC || @@ -1367,10 +1438,11 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, u64 magic) return false; }
-static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq) +static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) { - struct cs_etm_packet *packet = etmq->packet; - struct cs_etm_packet *prev_packet = etmq->prev_packet; + struct cs_etm_packet *packet = tidq->packet; + struct cs_etm_packet *prev_packet = tidq->prev_packet; u64 magic; int ret;
@@ -1472,7 +1544,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq) return ret;
/* The exception is for system call. */ - if (cs_etm__is_syscall(etmq, magic)) + if (cs_etm__is_syscall(etmq, tidq, magic)) packet->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_SYSCALLRET; @@ -1480,7 +1552,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq) * The exceptions are triggered by external signals from bus, * interrupt controller, debug module, PE reset or halt. */ - else if (cs_etm__is_async_exception(etmq, magic)) + else if (cs_etm__is_async_exception(tidq, magic)) packet->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC | @@ -1489,7 +1561,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq) * Otherwise, exception is caused by trap, instruction & * data fault, or alignment errors. */ - else if (cs_etm__is_sync_exception(etmq, magic)) + else if (cs_etm__is_sync_exception(etmq, tidq, magic)) packet->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_INTERRUPT; @@ -1571,17 +1643,18 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq) return ret; }
-static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq) +static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) { int ret; struct cs_etm_packet_queue *packet_queue;
- packet_queue = cs_etm__etmq_get_packet_queue(etmq); + packet_queue = &tidq->packet_queue;
/* Process each packet in this chunk */ while (1) { ret = cs_etm_decoder__get_packet(packet_queue, - etmq->packet); + tidq->packet); if (ret <= 0) /* * Stop processing this chunk on @@ -1596,18 +1669,18 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq) * prior to switch() statement to use address * information before packets swapping. */ - ret = cs_etm__set_sample_flags(etmq); + ret = cs_etm__set_sample_flags(etmq, tidq); if (ret < 0) break;
- switch (etmq->packet->sample_type) { + switch (tidq->packet->sample_type) { case CS_ETM_RANGE: /* * If the packet contains an instruction * range, generate instruction sequence * events. */ - cs_etm__sample(etmq); + cs_etm__sample(etmq, tidq); break; case CS_ETM_EXCEPTION: case CS_ETM_EXCEPTION_RET: @@ -1616,14 +1689,14 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq) * make sure the previous instruction * range packet to be handled properly. */ - cs_etm__exception(etmq); + cs_etm__exception(tidq); break; case CS_ETM_DISCONTINUITY: /* * Discontinuity in trace, flush * previous branch stack */ - cs_etm__flush(etmq); + cs_etm__flush(etmq, tidq); break; case CS_ETM_EMPTY: /* @@ -1643,6 +1716,11 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq) static int cs_etm__run_decoder(struct cs_etm_queue *etmq) { int err = 0; + struct cs_etm_traceid_queue *tidq; + + tidq = cs_etm__etmq_get_traceid_queue(etmq, CS_ETM_PER_THREAD_TRACEID); + if (!tidq) + return -EINVAL;
/* Go through each buffer in the queue and decode them one by one */ while (1) { @@ -1661,13 +1739,13 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq) * an error occurs other than hoping the next one will * be better. */ - err = cs_etm__process_decoder_queue(etmq); + err = cs_etm__process_traceid_queue(etmq, tidq);
} while (etmq->buf_len);
if (err == 0) /* Flush any remaining branch stack entries */ - err = cs_etm__end_block(etmq); + err = cs_etm__end_block(etmq, tidq); }
return err; diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 75385e2fd283..f16082d37ab5 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -136,6 +136,16 @@ struct cs_etm_packet {
#define CS_ETM_PACKET_MAX_BUFFER 1024
+/* + * When working with per-thread scenarios the process under trace can + * be scheduled on any CPU and as such, more than one traceID may be + * associated with the same process. Since a traceID of '0' is illegal + * as per the CoreSight architecture, use that specific value to + * identify the queue where all packets (with any traceID) are + * aggregated. + */ +#define CS_ETM_PER_THREAD_TRACEID 0 + struct cs_etm_packet_queue { u32 packet_count; u32 head; @@ -172,7 +182,7 @@ int cs_etm__process_auxtrace_info(union perf_event *event, struct perf_session *session); int cs_etm__get_cpu(u8 trace_chan_id, int *cpu); struct cs_etm_packet_queue -*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq); +*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id); #else static inline int cs_etm__process_auxtrace_info(union perf_event *event __maybe_unused, @@ -188,7 +198,8 @@ static inline int cs_etm__get_cpu(u8 trace_chan_id __maybe_unused, }
static inline struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue( - struct cs_etm_queue *etmq __maybe_unused) + struct cs_etm_queue *etmq __maybe_unused, + u8 trace_chan_id __maybe_unused) { return NULL; }
The sample synthesis code now uses the packet's CPU information, which makes cs_etm_queue::cpu redundant. Remove it.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/util/cs-etm.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 9e8212c74055..531bbb355ba4 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -79,7 +79,6 @@ struct cs_etm_queue { struct auxtrace_buffer *buffer; unsigned int queue_nr; pid_t pid, tid; - int cpu; u64 offset; const unsigned char *buf; size_t buf_len, buf_used; @@ -599,7 +598,6 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, queue->priv = etmq; etmq->etm = etm; etmq->queue_nr = queue_nr; - etmq->cpu = queue->cpu; etmq->tid = queue->tid; etmq->pid = -1; etmq->offset = 0; @@ -831,11 +829,8 @@ static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, etmq->thread = machine__find_thread(etm->machine, -1, etmq->tid);
- if (etmq->thread) { + if (etmq->thread) etmq->pid = etmq->thread->pid_; - if (queue->cpu == -1) - etmq->cpu = etmq->thread->cpu; - } }
static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
The thread field of structure cs_etm_queue is CPU dependent and as such needs to be part of the cs_etm_traceid_queue in order to support CPU-wide trace scenarios.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/util/cs-etm.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 531bbb355ba4..0d51d6d9a594 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -65,6 +65,7 @@ struct cs_etm_traceid_queue { u64 period_instructions; size_t last_branch_pos; union perf_event *event_buf; + struct thread *thread; struct branch_stack *last_branch; struct branch_stack *last_branch_rb; struct cs_etm_packet *prev_packet; @@ -74,7 +75,6 @@ struct cs_etm_traceid_queue {
struct cs_etm_queue { struct cs_etm_auxtrace *etm; - struct thread *thread; struct cs_etm_decoder *decoder; struct auxtrace_buffer *buffer; unsigned int queue_nr; @@ -415,7 +415,7 @@ static void cs_etm__free_queue(void *priv) if (!etmq) return;
- thread__zput(etmq->thread); + thread__zput(etmq->traceid_queues->thread); cs_etm_decoder__free(etmq->decoder); zfree(&etmq->traceid_queues->event_buf); zfree(&etmq->traceid_queues->last_branch); @@ -503,7 +503,7 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, machine = etmq->etm->machine; cpumode = cs_etm__cpu_mode(etmq, address);
- thread = etmq->thread; + thread = etmq->traceid_queues->thread; if (!thread) { if (cpumode != PERF_RECORD_MISC_KERNEL) return 0; @@ -819,18 +819,21 @@ cs_etm__get_trace(struct cs_etm_queue *etmq) static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, struct auxtrace_queue *queue) { + struct cs_etm_traceid_queue *tidq; struct cs_etm_queue *etmq = queue->priv;
+ tidq = cs_etm__etmq_get_traceid_queue(etmq, CS_ETM_PER_THREAD_TRACEID); + /* CPU-wide tracing isn't supported yet */ if (queue->tid == -1) return;
- if ((!etmq->thread) && (etmq->tid != -1)) - etmq->thread = machine__find_thread(etm->machine, -1, + if ((!tidq->thread) && (etmq->tid != -1)) + tidq->thread = machine__find_thread(etm->machine, -1, etmq->tid);
- if (etmq->thread) - etmq->pid = etmq->thread->pid_; + if (tidq->thread) + etmq->pid = tidq->thread->pid_; }
static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
The tid/pid fields of structure cs_etm_queue are CPU dependent and as such need to be part of the cs_etm_traceid_queue in order to support CPU-wide trace scenarios.
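The effect on initialisation and queue selection can be sketched as follows. The type struct tidq_ids and the helpers tidq_ids_init() and tidq_matches() are hypothetical names used only for illustration; the behaviour mirrors what the diff below does, assuming the tid is inherited from the owning auxtrace queue and the pid stays at -1 until a thread has been resolved.

```c
#include <sys/types.h>

/* Hypothetical holder for the identifiers now kept per trace ID. */
struct tidq_ids {
	pid_t pid, tid;
};

/*
 * Initialise the identifiers: the tid comes from the owning queue,
 * the pid is unknown (-1) until a thread lookup succeeds.
 */
static void tidq_ids_init(struct tidq_ids *ids, pid_t queue_tid)
{
	ids->tid = queue_tid;
	ids->pid = -1;
}

/*
 * Decide whether a queue should be decoded for the requested tid.
 * A requested tid of -1 selects every queue.
 */
static int tidq_matches(const struct tidq_ids *ids, pid_t tid)
{
	return tid == -1 || ids->tid == tid;
}
```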
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/util/cs-etm.c | 44 ++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 18 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 0d51d6d9a594..7e3b4d10f5c4 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -62,6 +62,7 @@ struct cs_etm_auxtrace {
struct cs_etm_traceid_queue { u8 trace_chan_id; + pid_t pid, tid; u64 period_instructions; size_t last_branch_pos; union perf_event *event_buf; @@ -78,7 +79,6 @@ struct cs_etm_queue { struct cs_etm_decoder *decoder; struct auxtrace_buffer *buffer; unsigned int queue_nr; - pid_t pid, tid; u64 offset; const unsigned char *buf; size_t buf_len, buf_used; @@ -159,10 +159,14 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq, u8 trace_chan_id) { int rc = -ENOMEM; + struct auxtrace_queue *queue; struct cs_etm_auxtrace *etm = etmq->etm;
cs_etm__clear_packet_queue(&tidq->packet_queue);
+ queue = &etmq->etm->queues.queue_array[etmq->queue_nr]; + tidq->tid = queue->tid; + tidq->pid = -1; tidq->trace_chan_id = trace_chan_id;
tidq->packet = zalloc(sizeof(struct cs_etm_packet)); @@ -598,8 +602,6 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, queue->priv = etmq; etmq->etm = etm; etmq->queue_nr = queue_nr; - etmq->tid = queue->tid; - etmq->pid = -1; etmq->offset = 0;
out: @@ -817,23 +819,19 @@ cs_etm__get_trace(struct cs_etm_queue *etmq) }
static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, - struct auxtrace_queue *queue) + struct auxtrace_queue *queue, + struct cs_etm_traceid_queue *tidq) { - struct cs_etm_traceid_queue *tidq; - struct cs_etm_queue *etmq = queue->priv; - - tidq = cs_etm__etmq_get_traceid_queue(etmq, CS_ETM_PER_THREAD_TRACEID); - /* CPU-wide tracing isn't supported yet */ if (queue->tid == -1) return;
- if ((!tidq->thread) && (etmq->tid != -1)) + if ((!tidq->thread) && (tidq->tid != -1)) tidq->thread = machine__find_thread(etm->machine, -1, - etmq->tid); + tidq->tid);
if (tidq->thread) - etmq->pid = tidq->thread->pid_; + tidq->pid = tidq->thread->pid_; }
static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, @@ -850,8 +848,8 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, event->sample.header.size = sizeof(struct perf_event_header);
sample.ip = addr; - sample.pid = etmq->pid; - sample.tid = etmq->tid; + sample.pid = tidq->pid; + sample.tid = tidq->tid; sample.id = etmq->etm->instructions_id; sample.stream_id = etmq->etm->instructions_id; sample.period = period; @@ -909,8 +907,8 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, event->sample.header.size = sizeof(struct perf_event_header);
sample.ip = ip; - sample.pid = etmq->pid; - sample.tid = etmq->tid; + sample.pid = tidq->pid; + sample.tid = tidq->tid; sample.addr = cs_etm__first_executed_instr(tidq->packet); sample.id = etmq->etm->branches_id; sample.stream_id = etmq->etm->branches_id; @@ -1758,9 +1756,19 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm, for (i = 0; i < queues->nr_queues; i++) { struct auxtrace_queue *queue = &etm->queues.queue_array[i]; struct cs_etm_queue *etmq = queue->priv; + struct cs_etm_traceid_queue *tidq; + + if (!etmq) + continue; + + tidq = cs_etm__etmq_get_traceid_queue(etmq, + CS_ETM_PER_THREAD_TRACEID); + + if (!tidq) + continue;
- if (etmq && ((tid == -1) || (etmq->tid == tid))) { - cs_etm__set_pid_tid_cpu(etm, queue); + if ((tid == -1) || (tidq->tid == tid)) { + cs_etm__set_pid_tid_cpu(etm, queue, tidq); cs_etm__run_decoder(etmq); } }
When working with CPU-wide traces, different traceIDs may be found in the same stream. As such we need to use the decoder callback that provides the traceID in order to know the thread context being decoded.
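The forwarding pattern can be sketched as below. This is a minimal illustration and not the OpenCSD or perf API: mem_cb_t, struct decoder, decoder_mem_access() and demo_mem_access() are hypothetical names. The only point carried over from the patch is that the trace ID received from the decoder library is handed to the consumer's callback ahead of the address, size and buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical consumer callback type, using the argument order the
 * patch introduces: context, trace ID, address, size, buffer.
 */
typedef uint32_t (*mem_cb_t)(void *ctx, uint8_t trace_chan_id,
			     uint64_t address, size_t size, uint8_t *buffer);

/* Hypothetical decoder wrapper holding the consumer callback and data. */
struct decoder {
	void     *data;
	mem_cb_t  mem_access;
};

/*
 * Adapter invoked by the decoding library: it forwards the trace ID so
 * the consumer can tell which trace stream requested the memory.
 */
static uint32_t decoder_mem_access(const void *context, uint64_t address,
				   uint8_t trace_chan_id, uint32_t req_size,
				   uint8_t *buffer)
{
	const struct decoder *d = context;

	return d->mem_access(d->data, trace_chan_id, address, req_size, buffer);
}

/*
 * Example consumer: remembers which trace ID asked for memory and
 * returns zero-filled bytes in place of a real memory image.
 */
static uint32_t demo_mem_access(void *ctx, uint8_t trace_chan_id,
				uint64_t address, size_t size, uint8_t *buffer)
{
	uint8_t *last_id = ctx;

	(void)address; /* unused in this sketch */
	*last_id = trace_chan_id;
	memset(buffer, 0, size);

	return (uint32_t)size;
}
```

With the trace ID available in the callback, the consumer can look up the matching per-trace-ID state, and therefore the right thread, before resolving the address.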
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 14 +++----
 .../perf/util/cs-etm-decoder/cs-etm-decoder.h | 3 +-
 tools/perf/util/cs-etm.c | 41 +++++++++++++------
 3 files changed, 36 insertions(+), 22 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 4303d2d00d31..87264b79de0e 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -41,15 +41,14 @@ static u32 cs_etm_decoder__mem_access(const void *context, const ocsd_vaddr_t address, const ocsd_mem_space_acc_t mem_space __maybe_unused, + const u8 trace_chan_id, const u32 req_size, u8 *buffer) { struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context;
- return decoder->mem_access(decoder->data, - address, - req_size, - buffer); + return decoder->mem_access(decoder->data, trace_chan_id, + address, req_size, buffer); }
int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder, @@ -58,9 +57,10 @@ int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder, { decoder->mem_access = cb_func;
- if (ocsd_dt_add_callback_mem_acc(decoder->dcd_tree, start, end, - OCSD_MEM_SPACE_ANY, - cs_etm_decoder__mem_access, decoder)) + if (ocsd_dt_add_callback_trcid_mem_acc(decoder->dcd_tree, start, end, + OCSD_MEM_SPACE_ANY, + cs_etm_decoder__mem_access, + decoder)) return -1;
return 0; diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h index 6ae7ab4cf5fe..11f3391d06f2 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -19,8 +19,7 @@ struct cs_etm_packet_queue;
struct cs_etm_queue;
-typedef u32 (*cs_etm_mem_cb_type)(struct cs_etm_queue *, u64, - size_t, u8 *); +typedef u32 (*cs_etm_mem_cb_type)(struct cs_etm_queue *, u8, u64, size_t, u8 *);
struct cs_etmv3_trace_params { u32 reg_ctrl; diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 7e3b4d10f5c4..2483293266d8 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -491,8 +491,8 @@ static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address) } }
-static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, - size_t size, u8 *buffer) +static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id, + u64 address, size_t size, u8 *buffer) { u8 cpumode; u64 offset; @@ -501,6 +501,8 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, struct machine *machine; struct addr_location al;
+ (void)trace_chan_id; + if (!etmq) return 0;
@@ -687,10 +689,12 @@ void cs_etm__reset_last_branch_rb(struct cs_etm_traceid_queue *tidq) }
static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq, - u64 addr) { + u8 trace_chan_id, u64 addr) +{ u8 instrBytes[2];
- cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes); + cs_etm__mem_access(etmq, trace_chan_id, addr, + ARRAY_SIZE(instrBytes), instrBytes); /* * T32 instruction size is indicated by bits[15:11] of the first * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111 @@ -719,6 +723,7 @@ u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet) }
static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq, + u64 trace_chan_id, const struct cs_etm_packet *packet, u64 offset) { @@ -726,7 +731,8 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq, u64 addr = packet->start_addr;
while (offset > 0) { - addr += cs_etm__t32_instr_size(etmq, addr); + addr += cs_etm__t32_instr_size(etmq, + trace_chan_id, addr); offset--; } return addr; @@ -1063,6 +1069,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet *tmp; int ret; + u8 trace_chan_id = tidq->trace_chan_id; u64 instrs_executed = tidq->packet->instr_count;
tidq->period_instructions += instrs_executed; @@ -1093,7 +1100,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, * executed, but PC has not advanced to next instruction) */ u64 offset = (instrs_executed - instrs_over - 1); - u64 addr = cs_etm__instr_addr(etmq, tidq->packet, offset); + u64 addr = cs_etm__instr_addr(etmq, trace_chan_id, + tidq->packet, offset);
ret = cs_etm__synth_instruction_sample( etmq, tidq, addr, etm->instructions_sample_period); @@ -1268,7 +1276,7 @@ static int cs_etm__get_data_block(struct cs_etm_queue *etmq) return etmq->buf_len; }
-static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, +static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id, struct cs_etm_packet *packet, u64 end_addr) { @@ -1291,7 +1299,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, * so below only read 2 bytes as instruction size for T32. */ addr = end_addr - 2; - cs_etm__mem_access(etmq, addr, sizeof(instr16), (u8 *)&instr16); + cs_etm__mem_access(etmq, trace_chan_id, addr, + sizeof(instr16), (u8 *)&instr16); if ((instr16 & 0xFF00) == 0xDF00) return true;
@@ -1306,7 +1315,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, * +---------+---------+-------------------------+ */ addr = end_addr - 4; - cs_etm__mem_access(etmq, addr, sizeof(instr32), (u8 *)&instr32); + cs_etm__mem_access(etmq, trace_chan_id, addr, + sizeof(instr32), (u8 *)&instr32); if ((instr32 & 0x0F000000) == 0x0F000000 && (instr32 & 0xF0000000) != 0xF0000000) return true; @@ -1322,7 +1332,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, * +-----------------------+---------+-----------+ */ addr = end_addr - 4; - cs_etm__mem_access(etmq, addr, sizeof(instr32), (u8 *)&instr32); + cs_etm__mem_access(etmq, trace_chan_id, addr, + sizeof(instr32), (u8 *)&instr32); if ((instr32 & 0xFFE0001F) == 0xd4000001) return true;
@@ -1338,6 +1349,7 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u64 magic) { + u8 trace_chan_id = tidq->trace_chan_id; struct cs_etm_packet *packet = tidq->packet; struct cs_etm_packet *prev_packet = tidq->prev_packet;
@@ -1352,7 +1364,7 @@ static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, */ if (magic == __perf_cs_etmv4_magic) { if (packet->exception_number == CS_ETMV4_EXC_CALL && - cs_etm__is_svc_instr(etmq, prev_packet, + cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet, prev_packet->end_addr)) return true; } @@ -1390,6 +1402,7 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u64 magic) { + u8 trace_chan_id = tidq->trace_chan_id; struct cs_etm_packet *packet = tidq->packet; struct cs_etm_packet *prev_packet = tidq->prev_packet;
@@ -1415,7 +1428,7 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, * (SMC, HVC) are taken as sync exceptions. */ if (packet->exception_number == CS_ETMV4_EXC_CALL && - !cs_etm__is_svc_instr(etmq, prev_packet, + !cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet, prev_packet->end_addr)) return true;
@@ -1439,6 +1452,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq, { struct cs_etm_packet *packet = tidq->packet; struct cs_etm_packet *prev_packet = tidq->prev_packet; + u8 trace_chan_id = tidq->trace_chan_id; u64 magic; int ret;
@@ -1519,7 +1533,8 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq, if (prev_packet->flags == (PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN | PERF_IP_FLAG_INTERRUPT) && - cs_etm__is_svc_instr(etmq, packet, packet->start_addr)) + cs_etm__is_svc_instr(etmq, trace_chan_id, + packet, packet->start_addr)) prev_packet->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN | PERF_IP_FLAG_SYSCALLRET;
When operating in CPU-wide trace mode with a source/sink topology of N:1, packets with multiple traceIDs will end up in the same cs_etm_queue. In order to properly decode packets they need to be split into different queues, i.e. one queue per traceID.
As such, add support for multiple traceIDs per cs_etm_queue by adding a new cs_etm_traceid_queue every time a new traceID is discovered in the trace stream.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- tools/perf/Makefile.config | 3 + tools/perf/util/cs-etm.c | 131 ++++++++++++++++++++++++++++++------- 2 files changed, 110 insertions(+), 24 deletions(-)
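Illustrative note, not part of the patch: a minimal sketch of the "one queue per traceID, allocated on first sight" idea from the changelog above. The patch keeps the traceID-to-index mapping in an intlist; this sketch swaps in a plain 256-entry table to stay self-contained, and uses realloc() rather than reallocarray(). All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_tidq {
	uint8_t trace_chan_id;
};

struct demo_etmq {
	int index_of[256];		/* trace ID -> array index, -1 if unused */
	int nr;				/* number of queues allocated so far */
	struct demo_tidq **queues;	/* grown by one per new trace ID */
};

static void demo_etmq_init(struct demo_etmq *q)
{
	for (int i = 0; i < 256; i++)
		q->index_of[i] = -1;
	q->nr = 0;
	q->queues = NULL;
}

/* Return the queue for this trace ID, allocating it on first use. */
static struct demo_tidq *demo_get_tidq(struct demo_etmq *q, uint8_t id)
{
	struct demo_tidq *tidq, **grown;
	int idx;

	assert(q);
	idx = q->index_of[id];
	if (idx >= 0)
		return q->queues[idx];

	tidq = calloc(1, sizeof(*tidq));
	if (!tidq)
		return NULL;
	tidq->trace_chan_id = id;

	/* On failure realloc() returns NULL and leaves the old block intact */
	grown = realloc(q->queues, (size_t)(q->nr + 1) * sizeof(*grown));
	if (!grown) {
		free(tidq);
		return NULL;
	}

	/* Publish the new queue only once every allocation has succeeded */
	grown[q->nr] = tidq;
	q->queues = grown;
	q->index_of[id] = q->nr++;
	return tidq;
}

static void demo_etmq_free(struct demo_etmq *q)
{
	for (int i = 0; i < q->nr; i++)
		free(q->queues[i]);
	free(q->queues);
	demo_etmq_init(q);
}
```

Recording the index only after the array has grown keeps the error path simple: a failed allocation leaves the lookup table and the array exactly as they were.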
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index e1bb5288ab1f..feb2d1b85087 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -412,6 +412,9 @@ ifdef CORESIGHT $(call feature_check,libopencsd) ifeq ($(feature-libopencsd), 1) CFLAGS += -DHAVE_CSTRACE_SUPPORT $(LIBOPENCSD_CFLAGS) + ifeq ($(feature-reallocarray), 0) + CFLAGS += -DCOMPAT_NEED_REALLOCARRAY + endif LDFLAGS += $(LIBOPENCSD_LDFLAGS) EXTLIBS += $(OPENCSDLIBS) $(call detected,CONFIG_LIBOPENCSD) diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 2483293266d8..afc2491f9f2a 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -29,6 +29,7 @@ #include "thread.h" #include "thread_map.h" #include "thread-stack.h" +#include <tools/libc_compat.h> #include "util.h"
#define MAX_TIMESTAMP (~0ULL) @@ -82,7 +83,9 @@ struct cs_etm_queue { u64 offset; const unsigned char *buf; size_t buf_len, buf_used; - struct cs_etm_traceid_queue *traceid_queues; + /* Conversion between traceID and index in traceid_queues array */ + struct intlist *traceid_queues_list; + struct cs_etm_traceid_queue **traceid_queues; };
static int cs_etm__update_queues(struct cs_etm_auxtrace *etm); @@ -208,31 +211,71 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq, static struct cs_etm_traceid_queue *cs_etm__etmq_get_traceid_queue(struct cs_etm_queue *etmq, u8 trace_chan_id) { - struct cs_etm_traceid_queue *tidq; + int idx; + struct int_node *inode; + struct intlist *traceid_queues_list; + struct cs_etm_traceid_queue *tidq, **traceid_queues; struct cs_etm_auxtrace *etm = etmq->etm;
- if (!etm->timeless_decoding) - return NULL; + if (etm->timeless_decoding) + trace_chan_id = CS_ETM_PER_THREAD_TRACEID;
- tidq = etmq->traceid_queues; + traceid_queues_list = etmq->traceid_queues_list;
- if (tidq) - return tidq; + /* + * Check if the traceid_queue exist for this traceID by looking + * in the queue list. + */ + inode = intlist__find(traceid_queues_list, trace_chan_id); + if (inode) { + idx = (int)(intptr_t)inode->priv; + return etmq->traceid_queues[idx]; + }
+ /* We couldn't find a traceid_queue for this traceID, allocate one */ tidq = malloc(sizeof(*tidq)); if (!tidq) return NULL;
memset(tidq, 0, sizeof(*tidq));
+ /* Get a valid index for the new traceid_queue */ + idx = intlist__nr_entries(traceid_queues_list); + /* Memory for the inode is free'ed in cs_etm_free_traceid_queues () */ + inode = intlist__findnew(traceid_queues_list, trace_chan_id); + if (!inode) + goto out_free; + + /* Associate this traceID with this index */ + inode->priv = (void *)(intptr_t)idx; + if (cs_etm__init_traceid_queue(etmq, tidq, trace_chan_id)) goto out_free;
- etmq->traceid_queues = tidq; + /* Grow the traceid_queues array by one unit */ + traceid_queues = etmq->traceid_queues; + traceid_queues = reallocarray(traceid_queues, + idx + 1, + sizeof(*traceid_queues)); + + /* + * On failure reallocarray() returns NULL and the original block of + * memory is left untouched. + */ + if (!traceid_queues) + goto out_free; + + traceid_queues[idx] = tidq; + etmq->traceid_queues = traceid_queues;
- return etmq->traceid_queues; + return etmq->traceid_queues[idx];
out_free: + /* + * Function intlist__remove() removes the inode from the list + * and delete the memory associated to it. + */ + intlist__remove(traceid_queues_list, inode); free(tidq);
return NULL; @@ -412,6 +455,44 @@ static int cs_etm__flush_events(struct perf_session *session, return cs_etm__process_timeless_queues(etm, -1); }
+static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq) +{ + int idx; + uintptr_t priv; + struct int_node *inode, *tmp; + struct cs_etm_traceid_queue *tidq; + struct intlist *traceid_queues_list = etmq->traceid_queues_list; + + intlist__for_each_entry_safe(inode, tmp, traceid_queues_list) { + priv = (uintptr_t)inode->priv; + idx = priv; + + /* Free this traceid_queue from the array */ + tidq = etmq->traceid_queues[idx]; + thread__zput(tidq->thread); + zfree(&tidq->event_buf); + zfree(&tidq->last_branch); + zfree(&tidq->last_branch_rb); + zfree(&tidq->prev_packet); + zfree(&tidq->packet); + zfree(&tidq); + + /* + * Function intlist__remove() removes the inode from the list + * and delete the memory associated to it. + */ + intlist__remove(traceid_queues_list, inode); + } + + /* Then the RB tree itself */ + intlist__delete(traceid_queues_list); + etmq->traceid_queues_list = NULL; + + /* finally free the traceid_queues array */ + free(etmq->traceid_queues); + etmq->traceid_queues = NULL; +} + static void cs_etm__free_queue(void *priv) { struct cs_etm_queue *etmq = priv; @@ -419,14 +500,8 @@ static void cs_etm__free_queue(void *priv) if (!etmq) return;
- thread__zput(etmq->traceid_queues->thread); cs_etm_decoder__free(etmq->decoder); - zfree(&etmq->traceid_queues->event_buf); - zfree(&etmq->traceid_queues->last_branch); - zfree(&etmq->traceid_queues->last_branch_rb); - zfree(&etmq->traceid_queues->prev_packet); - zfree(&etmq->traceid_queues->packet); - zfree(&etmq->traceid_queues); + cs_etm__free_traceid_queues(etmq); free(etmq); }
@@ -497,19 +572,21 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id, u8 cpumode; u64 offset; int len; - struct thread *thread; - struct machine *machine; - struct addr_location al; - - (void)trace_chan_id; + struct thread *thread; + struct machine *machine; + struct addr_location al; + struct cs_etm_traceid_queue *tidq;
if (!etmq) return 0;
machine = etmq->etm->machine; cpumode = cs_etm__cpu_mode(etmq, address); + tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id); + if (!tidq) + return 0;
- thread = etmq->traceid_queues->thread; + thread = tidq->thread; if (!thread) { if (cpumode != PERF_RECORD_MISC_KERNEL) return 0; @@ -545,6 +622,10 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm) if (!etmq) return NULL;
+ etmq->traceid_queues_list = intlist__new(NULL); + if (!etmq->traceid_queues_list) + goto out_free; + /* Use metadata to fill in trace parameters for trace decoder */ t_params = zalloc(sizeof(*t_params) * etm->num_cpu);
@@ -579,6 +660,7 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm) out_free_decoder: cs_etm_decoder__free(etmq->decoder); out_free: + intlist__delete(etmq->traceid_queues_list); free(etmq);
return NULL; @@ -1280,8 +1362,9 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id, struct cs_etm_packet *packet, u64 end_addr) { - u16 instr16; - u32 instr32; + /* Initialise to keep compiler happy */ + u16 instr16 = 0; + u32 instr32 = 0; u64 addr;
switch (packet->isa) {
Link contextID packets received from the decoder with the perf tool thread mechanic so that we know the specifics of the process currently executing.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 20 ++++++++++++ tools/perf/util/cs-etm.c | 32 +++++++++++++++---- tools/perf/util/cs-etm.h | 10 ++++++ 3 files changed, 56 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 87264b79de0e..ce85e52f989c 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -402,6 +402,24 @@ cs_etm_decoder__buffer_exception_ret(struct cs_etm_packet_queue *queue, CS_ETM_EXCEPTION_RET); }
+static ocsd_datapath_resp_t +cs_etm_decoder__set_tid(struct cs_etm_queue *etmq, + const ocsd_generic_trace_elem *elem, + const uint8_t trace_chan_id) +{ + pid_t tid; + + /* Ignore PE_CONTEXT packets that don't have a valid contextID */ + if (!elem->context.ctxt_id_valid) + return OCSD_RESP_CONT; + + tid = elem->context.context_id; + if (cs_etm__etmq_set_tid(etmq, tid, trace_chan_id)) + return OCSD_RESP_FATAL_SYS_ERR; + + return OCSD_RESP_CONT; +} + static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( const void *context, const ocsd_trc_index_t indx __maybe_unused, @@ -440,6 +458,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( trace_chan_id); break; case OCSD_GEN_TRC_ELEM_PE_CONTEXT: + resp = cs_etm_decoder__set_tid(etmq, elem, trace_chan_id); + break; case OCSD_GEN_TRC_ELEM_ADDR_NACC: case OCSD_GEN_TRC_ELEM_TIMESTAMP: case OCSD_GEN_TRC_ELEM_CYCLE_COUNT: diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index afc2491f9f2a..17adf554b679 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -907,13 +907,8 @@ cs_etm__get_trace(struct cs_etm_queue *etmq) }
static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, - struct auxtrace_queue *queue, struct cs_etm_traceid_queue *tidq) { - /* CPU-wide tracing isn't supported yet */ - if (queue->tid == -1) - return; - if ((!tidq->thread) && (tidq->tid != -1)) tidq->thread = machine__find_thread(etm->machine, -1, tidq->tid); @@ -922,6 +917,31 @@ static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, tidq->pid = tidq->thread->pid_; }
+int cs_etm__etmq_set_tid(struct cs_etm_queue *etmq, + pid_t tid, u8 trace_chan_id) +{ + int cpu, err = -EINVAL; + struct cs_etm_auxtrace *etm = etmq->etm; + struct cs_etm_traceid_queue *tidq; + + tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id); + if (!tidq) + return err; + + if (cs_etm__get_cpu(trace_chan_id, &cpu) < 0) + return err; + + err = machine__set_current_tid(etm->machine, cpu, tid, tid); + if (err) + return err; + + tidq->tid = tid; + thread__zput(tidq->thread); + + cs_etm__set_pid_tid_cpu(etm, tidq); + return 0; +} + static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u64 addr, u64 period) @@ -1866,7 +1886,7 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm, continue;
if ((tid == -1) || (tidq->tid == tid)) { - cs_etm__set_pid_tid_cpu(etm, queue, tidq); + cs_etm__set_pid_tid_cpu(etm, tidq); cs_etm__run_decoder(etmq); } } diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index f16082d37ab5..b2a7628620bf 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -181,6 +181,8 @@ struct cs_etm_packet_queue { int cs_etm__process_auxtrace_info(union perf_event *event, struct perf_session *session); int cs_etm__get_cpu(u8 trace_chan_id, int *cpu); +int cs_etm__etmq_set_tid(struct cs_etm_queue *etmq, + pid_t tid, u8 trace_chan_id); struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id); #else @@ -197,6 +199,14 @@ static inline int cs_etm__get_cpu(u8 trace_chan_id __maybe_unused, return -1; }
+static inline int cs_etm__etmq_set_tid( + struct cs_etm_queue *etmq __maybe_unused, + pid_t tid __maybe_unused, + u8 trace_chan_id __maybe_unused) +{ + return -1; +} + static inline struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue( struct cs_etm_queue *etmq __maybe_unused, u8 trace_chan_id __maybe_unused)
This patch deals with timestamp packets received from the decoding library in order to give the front end packet processing loop a handle on the time at which the instructions conveyed by range packets were executed.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 112 +++++++++++++++++- tools/perf/util/cs-etm.c | 19 +++ tools/perf/util/cs-etm.h | 17 +++ 3 files changed, 144 insertions(+), 4 deletions(-)
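Illustrative note, not part of the patch: a small worked model of the timestamp estimate described above, assuming (as the patch's estimate does) that one instruction advances time by one timestamp tick. The demo_* names are hypothetical, and the return value of 1 stands in for "hand control to the front end" (OCSD_RESP_WAIT in the patch).

```c
#include <assert.h>
#include <stdint.h>

struct demo_pq {
	uint32_t instr_count;	/* instructions seen since the last estimate */
	uint64_t timestamp;	/* time attributed to the current packets */
	uint64_t next_timestamp;/* estimate for the next range packet */
};

/* Range packet: returns 1 when a timestamp is ready for the front end. */
static int demo_on_range(struct demo_pq *pq, uint32_t num_instr)
{
	assert(pq);
	pq->instr_count += num_instr;

	/* No timestamp packet has been seen yet, nothing to report */
	if (!pq->timestamp)
		return 0;

	pq->timestamp = pq->next_timestamp;
	pq->next_timestamp += pq->instr_count;
	pq->instr_count = 0;
	return 1;
}

/* Timestamp packet: returns 1 only for the first one after a reset. */
static int demo_on_timestamp(struct demo_pq *pq, uint64_t ts)
{
	assert(pq);

	/* Later timestamps only correct the estimate for the next range */
	if (pq->timestamp) {
		pq->next_timestamp = ts;
		return 0;
	}

	/*
	 * First timestamp: it arrives after the ranges it covers, so back
	 * off by the number of instructions already executed.
	 */
	pq->timestamp = ts - pq->instr_count;
	pq->next_timestamp = ts;
	pq->instr_count = 0;
	return 1;
}
```

For example, 10 instructions followed by a first timestamp of 1000 yield an estimated start time of 990; a following range of 5 instructions is reported at 1000 and moves the next estimate to 1005.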
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index ce85e52f989c..33e975c8d11b 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -269,6 +269,76 @@ cs_etm_decoder__create_etm_packet_printer(struct cs_etm_trace_params *t_params, trace_config); }
+static ocsd_datapath_resp_t +cs_etm_decoder__do_soft_timestamp(struct cs_etm_queue *etmq, + struct cs_etm_packet_queue *packet_queue, + const uint8_t trace_chan_id) +{ + /* No timestamp packet has been received, nothing to do */ + if (!packet_queue->timestamp) + return OCSD_RESP_CONT; + + packet_queue->timestamp = packet_queue->next_timestamp; + + /* Estimate the timestamp for the next range packet */ + packet_queue->next_timestamp += packet_queue->instr_count; + packet_queue->instr_count = 0; + + /* Tell the front end which traceid_queue needs attention */ + cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id); + + return OCSD_RESP_WAIT; +} + +static ocsd_datapath_resp_t +cs_etm_decoder__do_hard_timestamp(struct cs_etm_queue *etmq, + const ocsd_generic_trace_elem *elem, + const uint8_t trace_chan_id) +{ + struct cs_etm_packet_queue *packet_queue; + + /* First get the packet queue for this traceID */ + packet_queue = cs_etm__etmq_get_packet_queue(etmq, trace_chan_id); + if (!packet_queue) + return OCSD_RESP_FATAL_SYS_ERR; + + /* + * We've seen a timestamp packet before - simply record the new value. + * Function do_soft_timestamp() will report the value to the front end, + * hence asking the decoder to keep decoding rather than stopping. + */ + if (packet_queue->timestamp) { + packet_queue->next_timestamp = elem->timestamp; + return OCSD_RESP_CONT; + } + + /* + * This is the first timestamp we've seen since the beginning of traces + * or a discontinuity. Since timestamps packets are generated *after* + * range packets have been generated, we need to estimate the time at + * which instructions started by substracting the number of instructions + * executed to the timestamp. 
+ */ + packet_queue->timestamp = elem->timestamp - + packet_queue->instr_count; + packet_queue->next_timestamp = elem->timestamp; + packet_queue->instr_count = 0; + + /* Tell the front end which traceid_queue needs attention */ + cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id); + + /* Halt processing until we are being told to proceed */ + return OCSD_RESP_WAIT; +} + +static void +cs_etm_decoder__reset_timestamp(struct cs_etm_packet_queue *packet_queue) +{ + packet_queue->timestamp = 0; + packet_queue->next_timestamp = 0; + packet_queue->instr_count = 0; +} + static ocsd_datapath_resp_t cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue, const u8 trace_chan_id, @@ -310,7 +380,8 @@ cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue, }
static ocsd_datapath_resp_t -cs_etm_decoder__buffer_range(struct cs_etm_packet_queue *packet_queue, +cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq, + struct cs_etm_packet_queue *packet_queue, const ocsd_generic_trace_elem *elem, const uint8_t trace_chan_id) { @@ -365,6 +436,23 @@ cs_etm_decoder__buffer_range(struct cs_etm_packet_queue *packet_queue,
packet->last_instr_size = elem->last_instr_sz;
+ /* per-thread scenario, no need to generate a timestamp */ + if (cs_etm__etmq_is_timeless(etmq)) + goto out; + + /* + * The packet queue is full and we haven't seen a timestamp (had we + * seen one the packet queue wouldn't be full). Let the front end + * deal with it. + */ + if (ret == OCSD_RESP_WAIT) + goto out; + + packet_queue->instr_count += elem->num_instr_range; + /* Tell the front end we have a new timestamp to process */ + ret = cs_etm_decoder__do_soft_timestamp(etmq, packet_queue, + trace_chan_id); +out: return ret; }
@@ -372,6 +460,11 @@ static ocsd_datapath_resp_t cs_etm_decoder__buffer_discontinuity(struct cs_etm_packet_queue *queue, const uint8_t trace_chan_id) { + /* + * Something happened and who knows when we'll get new traces so + * reset time statistics. + */ + cs_etm_decoder__reset_timestamp(queue); return cs_etm_decoder__buffer_packet(queue, trace_chan_id, CS_ETM_DISCONTINUITY); } @@ -404,6 +497,7 @@ cs_etm_decoder__buffer_exception_ret(struct cs_etm_packet_queue *queue,
static ocsd_datapath_resp_t cs_etm_decoder__set_tid(struct cs_etm_queue *etmq, + struct cs_etm_packet_queue *packet_queue, const ocsd_generic_trace_elem *elem, const uint8_t trace_chan_id) { @@ -417,6 +511,12 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq, if (cs_etm__etmq_set_tid(etmq, tid, trace_chan_id)) return OCSD_RESP_FATAL_SYS_ERR;
+ /* + * A timestamp is generated after a PE_CONTEXT element so make sure + * to rely on that coming one. + */ + cs_etm_decoder__reset_timestamp(packet_queue); + return OCSD_RESP_CONT; }
@@ -446,7 +546,7 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( trace_chan_id); break; case OCSD_GEN_TRC_ELEM_INSTR_RANGE: - resp = cs_etm_decoder__buffer_range(packet_queue, elem, + resp = cs_etm_decoder__buffer_range(etmq, packet_queue, elem, trace_chan_id); break; case OCSD_GEN_TRC_ELEM_EXCEPTION: @@ -457,11 +557,15 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( resp = cs_etm_decoder__buffer_exception_ret(packet_queue, trace_chan_id); break; + case OCSD_GEN_TRC_ELEM_TIMESTAMP: + resp = cs_etm_decoder__do_hard_timestamp(etmq, elem, + trace_chan_id); + break; case OCSD_GEN_TRC_ELEM_PE_CONTEXT: - resp = cs_etm_decoder__set_tid(etmq, elem, trace_chan_id); + resp = cs_etm_decoder__set_tid(etmq, packet_queue, + elem, trace_chan_id); break; case OCSD_GEN_TRC_ELEM_ADDR_NACC: - case OCSD_GEN_TRC_ELEM_TIMESTAMP: case OCSD_GEN_TRC_ELEM_CYCLE_COUNT: case OCSD_GEN_TRC_ELEM_ADDR_UNKNOWN: case OCSD_GEN_TRC_ELEM_EVENT: diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 17adf554b679..91496a3a2209 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -80,6 +80,7 @@ struct cs_etm_queue { struct cs_etm_decoder *decoder; struct auxtrace_buffer *buffer; unsigned int queue_nr; + u8 pending_timestamp; u64 offset; const unsigned char *buf; size_t buf_len, buf_used; @@ -133,6 +134,19 @@ int cs_etm__get_cpu(u8 trace_chan_id, int *cpu) return 0; }
+void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq, + u8 trace_chan_id) +{ + /* + * When a timestamp packet is encountered the backend code + * is stopped so that the front end has time to process packets + * that were accumulated in the traceID queue. Since there can + * be more than one channel per cs_etm_queue, we need to specify + * what traceID queue needs servicing. + */ + etmq->pending_timestamp = trace_chan_id; +} + static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) { int i; @@ -942,6 +956,11 @@ int cs_etm__etmq_set_tid(struct cs_etm_queue *etmq, return 0; }
+bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq) +{ + return !!etmq->etm->timeless_decoding; +} + static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u64 addr, u64 period) diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index b2a7628620bf..33b57e748c3d 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -150,6 +150,9 @@ struct cs_etm_packet_queue { u32 packet_count; u32 head; u32 tail; + u32 instr_count; + u64 timestamp; + u64 next_timestamp; struct cs_etm_packet packet_buffer[CS_ETM_PACKET_MAX_BUFFER]; };
@@ -183,6 +186,9 @@ int cs_etm__process_auxtrace_info(union perf_event *event, int cs_etm__get_cpu(u8 trace_chan_id, int *cpu); int cs_etm__etmq_set_tid(struct cs_etm_queue *etmq, pid_t tid, u8 trace_chan_id); +bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq); +void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq, + u8 trace_chan_id); struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id); #else @@ -207,6 +213,17 @@ static inline int cs_etm__etmq_set_tid( return -1; }
+static inline bool cs_etm__etmq_is_timeless( + struct cs_etm_queue *etmq __maybe_unused) +{ + /* What else to return? */ + return true; +} + +static inline void cs_etm__etmq_set_traceid_queue_timestamp( + struct cs_etm_queue *etmq __maybe_unused, + u8 trace_chan_id __maybe_unused) {} + static inline struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue( struct cs_etm_queue *etmq __maybe_unused, u8 trace_chan_id __maybe_unused)
On Fri, May 24, 2019 at 11:35:07AM -0600, Mathieu Poirier wrote:
This patch deals with timestamp packets received from the decoding library in order to give the front end packet processing loop a handle on the time instruction conveyed by range packets have been executed at.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 112 +++++++++++++++++- tools/perf/util/cs-etm.c | 19 +++ tools/perf/util/cs-etm.h | 17 +++ 3 files changed, 144 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index ce85e52f989c..33e975c8d11b 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -269,6 +269,76 @@ cs_etm_decoder__create_etm_packet_printer(struct cs_etm_trace_params *t_params,
 					  trace_config);
 }

+static ocsd_datapath_resp_t
+cs_etm_decoder__do_soft_timestamp(struct cs_etm_queue *etmq,
+				  struct cs_etm_packet_queue *packet_queue,
+				  const uint8_t trace_chan_id)
+{
+	/* No timestamp packet has been received, nothing to do */
+	if (!packet_queue->timestamp)
+		return OCSD_RESP_CONT;
+
+	packet_queue->timestamp = packet_queue->next_timestamp;
+
+	/* Estimate the timestamp for the next range packet */
+	packet_queue->next_timestamp += packet_queue->instr_count;
+	packet_queue->instr_count = 0;
+
+	/* Tell the front end which traceid_queue needs attention */
+	cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id);
+
+	return OCSD_RESP_WAIT;
+}
+
+static ocsd_datapath_resp_t
+cs_etm_decoder__do_hard_timestamp(struct cs_etm_queue *etmq,
+				  const ocsd_generic_trace_elem *elem,
+				  const uint8_t trace_chan_id)
+{
+	struct cs_etm_packet_queue *packet_queue;
+
+	/* First get the packet queue for this traceID */
+	packet_queue = cs_etm__etmq_get_packet_queue(etmq, trace_chan_id);
+	if (!packet_queue)
+		return OCSD_RESP_FATAL_SYS_ERR;
+
+	/*
+	 * We've seen a timestamp packet before - simply record the new value.
+	 * Function do_soft_timestamp() will report the value to the front end,
+	 * hence asking the decoder to keep decoding rather than stopping.
+	 */
+	if (packet_queue->timestamp) {
+		packet_queue->next_timestamp = elem->timestamp;
+		return OCSD_RESP_CONT;
+	}
+
+	/*
+	 * This is the first timestamp we've seen since the beginning of traces
+	 * or a discontinuity. Since timestamps packets are generated *after*
+	 * range packets have been generated, we need to estimate the time at
+	 * which instructions started by substracting the number of instructions
+	 * executed to the timestamp.
+	 */
+	packet_queue->timestamp = elem->timestamp -
+				  packet_queue->instr_count;
No need to break lines like that, in this case it even wouldn't pass the width used for the comments right above it :-)
I'm fixing it up this time.
Something else, all the patches in this series, so far, needed to have as the subject prefix "perf cs-etm: ...", not the generic one "perf tools: ...". I'm fixing it up as well, no need to resend.
- Arnaldo
+	packet_queue->next_timestamp = elem->timestamp;
+	packet_queue->instr_count = 0;
+
+	/* Tell the front end which traceid_queue needs attention */
+	cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id);
+
+	/* Halt processing until we are being told to proceed */
+	return OCSD_RESP_WAIT;
+}
+
+static void
+cs_etm_decoder__reset_timestamp(struct cs_etm_packet_queue *packet_queue)
+{
+	packet_queue->timestamp = 0;
+	packet_queue->next_timestamp = 0;
+	packet_queue->instr_count = 0;
+}
+
static ocsd_datapath_resp_t cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue, const u8 trace_chan_id, @@ -310,7 +380,8 @@ cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue, } static ocsd_datapath_resp_t -cs_etm_decoder__buffer_range(struct cs_etm_packet_queue *packet_queue, +cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq,
struct cs_etm_packet_queue *packet_queue, const ocsd_generic_trace_elem *elem, const uint8_t trace_chan_id)
{ @@ -365,6 +436,23 @@ cs_etm_decoder__buffer_range(struct cs_etm_packet_queue *packet_queue, packet->last_instr_size = elem->last_instr_sz;
- /* per-thread scenario, no need to generate a timestamp */
- if (cs_etm__etmq_is_timeless(etmq))
goto out;
- /*
* The packet queue is full and we haven't seen a timestamp (had we
* seen one the packet queue wouldn't be full). Let the front end
* deal with it.
*/
- if (ret == OCSD_RESP_WAIT)
goto out;
- packet_queue->instr_count += elem->num_instr_range;
- /* Tell the front end we have a new timestamp to process */
- ret = cs_etm_decoder__do_soft_timestamp(etmq, packet_queue,
trace_chan_id);
+out: return ret; } @@ -372,6 +460,11 @@ static ocsd_datapath_resp_t cs_etm_decoder__buffer_discontinuity(struct cs_etm_packet_queue *queue, const uint8_t trace_chan_id) {
- /*
* Something happened and who knows when we'll get new traces so
* reset time statistics.
*/
- cs_etm_decoder__reset_timestamp(queue); return cs_etm_decoder__buffer_packet(queue, trace_chan_id, CS_ETM_DISCONTINUITY);
} @@ -404,6 +497,7 @@ cs_etm_decoder__buffer_exception_ret(struct cs_etm_packet_queue *queue, static ocsd_datapath_resp_t cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
struct cs_etm_packet_queue *packet_queue, const ocsd_generic_trace_elem *elem, const uint8_t trace_chan_id)
{ @@ -417,6 +511,12 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq, if (cs_etm__etmq_set_tid(etmq, tid, trace_chan_id)) return OCSD_RESP_FATAL_SYS_ERR;
- /*
* A timestamp is generated after a PE_CONTEXT element so make sure
* to rely on that coming one.
*/
- cs_etm_decoder__reset_timestamp(packet_queue);
- return OCSD_RESP_CONT;
} @@ -446,7 +546,7 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( trace_chan_id); break; case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
resp = cs_etm_decoder__buffer_range(packet_queue, elem,
resp = cs_etm_decoder__buffer_range(etmq, packet_queue, elem, trace_chan_id); break; case OCSD_GEN_TRC_ELEM_EXCEPTION:
@@ -457,11 +557,15 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( resp = cs_etm_decoder__buffer_exception_ret(packet_queue, trace_chan_id); break;
- case OCSD_GEN_TRC_ELEM_TIMESTAMP:
resp = cs_etm_decoder__do_hard_timestamp(etmq, elem,
trace_chan_id);
break; case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
resp = cs_etm_decoder__set_tid(etmq, elem, trace_chan_id);
resp = cs_etm_decoder__set_tid(etmq, packet_queue,
elem, trace_chan_id); break; case OCSD_GEN_TRC_ELEM_ADDR_NACC:
- case OCSD_GEN_TRC_ELEM_TIMESTAMP: case OCSD_GEN_TRC_ELEM_CYCLE_COUNT: case OCSD_GEN_TRC_ELEM_ADDR_UNKNOWN: case OCSD_GEN_TRC_ELEM_EVENT:
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 17adf554b679..91496a3a2209 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -80,6 +80,7 @@ struct cs_etm_queue { struct cs_etm_decoder *decoder; struct auxtrace_buffer *buffer; unsigned int queue_nr;
- u8 pending_timestamp; u64 offset; const unsigned char *buf; size_t buf_len, buf_used;
@@ -133,6 +134,19 @@ int cs_etm__get_cpu(u8 trace_chan_id, int *cpu) return 0; } +void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
u8 trace_chan_id)
+{
- /*
* When a timestamp packet is encountered the backend code
* is stopped so that the front end has time to process packets
* that were accumulated in the traceID queue. Since there can
* be more than one channel per cs_etm_queue, we need to specify
* what traceID queue needs servicing.
*/
- etmq->pending_timestamp = trace_chan_id;
+}
static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) { int i; @@ -942,6 +956,11 @@ int cs_etm__etmq_set_tid(struct cs_etm_queue *etmq, return 0; } +bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq) +{
- return !!etmq->etm->timeless_decoding;
+}
static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u64 addr, u64 period) diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index b2a7628620bf..33b57e748c3d 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -150,6 +150,9 @@ struct cs_etm_packet_queue { u32 packet_count; u32 head; u32 tail;
- u32 instr_count;
- u64 timestamp;
- u64 next_timestamp; struct cs_etm_packet packet_buffer[CS_ETM_PACKET_MAX_BUFFER];
}; @@ -183,6 +186,9 @@ int cs_etm__process_auxtrace_info(union perf_event *event, int cs_etm__get_cpu(u8 trace_chan_id, int *cpu); int cs_etm__etmq_set_tid(struct cs_etm_queue *etmq, pid_t tid, u8 trace_chan_id); +bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq); +void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
u8 trace_chan_id);
struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id); #else @@ -207,6 +213,17 @@ static inline int cs_etm__etmq_set_tid( return -1; } +static inline bool cs_etm__etmq_is_timeless(
struct cs_etm_queue *etmq __maybe_unused)
+{
- /* What else to return? */
- return true;
+}
+static inline void cs_etm__etmq_set_traceid_queue_timestamp(
struct cs_etm_queue *etmq __maybe_unused,
u8 trace_chan_id __maybe_unused) {}
static inline struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue( struct cs_etm_queue *etmq __maybe_unused, u8 trace_chan_id __maybe_unused) -- 2.17.1
On Thu, 6 Jun 2019 at 12:50, Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
On Fri, May 24, 2019 at 11:35:07AM -0600, Mathieu Poirier wrote:
This patch deals with timestamp packets received from the decoding library in order to give the front end packet processing loop a handle on the time at which the instructions conveyed by range packets were executed.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 112 +++++++++++++++++-
 tools/perf/util/cs-etm.c                      |  19 +++
 tools/perf/util/cs-etm.h                      |  17 +++
 3 files changed, 144 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index ce85e52f989c..33e975c8d11b 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -269,6 +269,76 @@ cs_etm_decoder__create_etm_packet_printer(struct cs_etm_trace_params *t_params, trace_config); }
+static ocsd_datapath_resp_t +cs_etm_decoder__do_soft_timestamp(struct cs_etm_queue *etmq,
struct cs_etm_packet_queue *packet_queue,
const uint8_t trace_chan_id)
+{
/* No timestamp packet has been received, nothing to do */
if (!packet_queue->timestamp)
return OCSD_RESP_CONT;
packet_queue->timestamp = packet_queue->next_timestamp;
/* Estimate the timestamp for the next range packet */
packet_queue->next_timestamp += packet_queue->instr_count;
packet_queue->instr_count = 0;
/* Tell the front end which traceid_queue needs attention */
cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id);
return OCSD_RESP_WAIT;
+}
+static ocsd_datapath_resp_t +cs_etm_decoder__do_hard_timestamp(struct cs_etm_queue *etmq,
const ocsd_generic_trace_elem *elem,
const uint8_t trace_chan_id)
+{
struct cs_etm_packet_queue *packet_queue;
/* First get the packet queue for this traceID */
packet_queue = cs_etm__etmq_get_packet_queue(etmq, trace_chan_id);
if (!packet_queue)
return OCSD_RESP_FATAL_SYS_ERR;
/*
* We've seen a timestamp packet before - simply record the new value.
* Function do_soft_timestamp() will report the value to the front end,
* hence asking the decoder to keep decoding rather than stopping.
*/
if (packet_queue->timestamp) {
packet_queue->next_timestamp = elem->timestamp;
return OCSD_RESP_CONT;
}
/*
* This is the first timestamp we've seen since the beginning of traces
* or a discontinuity. Since timestamps packets are generated *after*
* range packets have been generated, we need to estimate the time at
* which instructions started by subtracting the number of instructions
* executed from the timestamp.
*/
packet_queue->timestamp = elem->timestamp -
packet_queue->instr_count;
No need to break lines like that; in this case the line wouldn't even exceed the width used for the comments right above it :-)
I'm fixing it up this time.
Something else: all the patches in this series so far should have used the subject prefix "perf cs-etm: ...", not the generic "perf tools: ...". I'm fixing that up as well, no need to resend.
Got that - thanks
Mathieu
- Arnaldo
packet_queue->next_timestamp = elem->timestamp;
packet_queue->instr_count = 0;
/* Tell the front end which traceid_queue needs attention */
cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id);
/* Halt processing until we are being told to proceed */
return OCSD_RESP_WAIT;
+}
+static void +cs_etm_decoder__reset_timestamp(struct cs_etm_packet_queue *packet_queue) +{
packet_queue->timestamp = 0;
packet_queue->next_timestamp = 0;
packet_queue->instr_count = 0;
+}
static ocsd_datapath_resp_t cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue, const u8 trace_chan_id, @@ -310,7 +380,8 @@ cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue, }
static ocsd_datapath_resp_t -cs_etm_decoder__buffer_range(struct cs_etm_packet_queue *packet_queue, +cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq,
struct cs_etm_packet_queue *packet_queue, const ocsd_generic_trace_elem *elem, const uint8_t trace_chan_id)
{ @@ -365,6 +436,23 @@ cs_etm_decoder__buffer_range(struct cs_etm_packet_queue *packet_queue,
packet->last_instr_size = elem->last_instr_sz;
/* per-thread scenario, no need to generate a timestamp */
if (cs_etm__etmq_is_timeless(etmq))
goto out;
/*
* The packet queue is full and we haven't seen a timestamp (had we
* seen one the packet queue wouldn't be full). Let the front end
* deal with it.
*/
if (ret == OCSD_RESP_WAIT)
goto out;
packet_queue->instr_count += elem->num_instr_range;
/* Tell the front end we have a new timestamp to process */
ret = cs_etm_decoder__do_soft_timestamp(etmq, packet_queue,
trace_chan_id);
+out: return ret; }
@@ -372,6 +460,11 @@ static ocsd_datapath_resp_t cs_etm_decoder__buffer_discontinuity(struct cs_etm_packet_queue *queue, const uint8_t trace_chan_id) {
/*
* Something happened and who knows when we'll get new traces so
* reset time statistics.
*/
cs_etm_decoder__reset_timestamp(queue); return cs_etm_decoder__buffer_packet(queue, trace_chan_id, CS_ETM_DISCONTINUITY);
} @@ -404,6 +497,7 @@ cs_etm_decoder__buffer_exception_ret(struct cs_etm_packet_queue *queue,
static ocsd_datapath_resp_t cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
struct cs_etm_packet_queue *packet_queue, const ocsd_generic_trace_elem *elem, const uint8_t trace_chan_id)
{ @@ -417,6 +511,12 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq, if (cs_etm__etmq_set_tid(etmq, tid, trace_chan_id)) return OCSD_RESP_FATAL_SYS_ERR;
/*
* A timestamp is generated after a PE_CONTEXT element so make sure
* to rely on that coming one.
*/
cs_etm_decoder__reset_timestamp(packet_queue);
return OCSD_RESP_CONT;
}
@@ -446,7 +546,7 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( trace_chan_id); break; case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
resp = cs_etm_decoder__buffer_range(packet_queue, elem,
resp = cs_etm_decoder__buffer_range(etmq, packet_queue, elem, trace_chan_id); break; case OCSD_GEN_TRC_ELEM_EXCEPTION:
@@ -457,11 +557,15 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( resp = cs_etm_decoder__buffer_exception_ret(packet_queue, trace_chan_id); break;
case OCSD_GEN_TRC_ELEM_TIMESTAMP:
resp = cs_etm_decoder__do_hard_timestamp(etmq, elem,
trace_chan_id);
break; case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
resp = cs_etm_decoder__set_tid(etmq, elem, trace_chan_id);
resp = cs_etm_decoder__set_tid(etmq, packet_queue,
elem, trace_chan_id); break; case OCSD_GEN_TRC_ELEM_ADDR_NACC:
case OCSD_GEN_TRC_ELEM_TIMESTAMP: case OCSD_GEN_TRC_ELEM_CYCLE_COUNT: case OCSD_GEN_TRC_ELEM_ADDR_UNKNOWN: case OCSD_GEN_TRC_ELEM_EVENT:
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 17adf554b679..91496a3a2209 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -80,6 +80,7 @@ struct cs_etm_queue { struct cs_etm_decoder *decoder; struct auxtrace_buffer *buffer; unsigned int queue_nr;
u8 pending_timestamp; u64 offset; const unsigned char *buf; size_t buf_len, buf_used;
@@ -133,6 +134,19 @@ int cs_etm__get_cpu(u8 trace_chan_id, int *cpu) return 0; }
+void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
u8 trace_chan_id)
+{
/*
* When a timestamp packet is encountered the backend code
* is stopped so that the front end has time to process packets
* that were accumulated in the traceID queue. Since there can
* be more than one channel per cs_etm_queue, we need to specify
* what traceID queue needs servicing.
*/
etmq->pending_timestamp = trace_chan_id;
+}
static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) { int i; @@ -942,6 +956,11 @@ int cs_etm__etmq_set_tid(struct cs_etm_queue *etmq, return 0; }
+bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq) +{
return !!etmq->etm->timeless_decoding;
+}
static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u64 addr, u64 period) diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index b2a7628620bf..33b57e748c3d 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -150,6 +150,9 @@ struct cs_etm_packet_queue { u32 packet_count; u32 head; u32 tail;
u32 instr_count;
u64 timestamp;
u64 next_timestamp; struct cs_etm_packet packet_buffer[CS_ETM_PACKET_MAX_BUFFER];
};
@@ -183,6 +186,9 @@ int cs_etm__process_auxtrace_info(union perf_event *event, int cs_etm__get_cpu(u8 trace_chan_id, int *cpu); int cs_etm__etmq_set_tid(struct cs_etm_queue *etmq, pid_t tid, u8 trace_chan_id); +bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq); +void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
u8 trace_chan_id);
struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id); #else @@ -207,6 +213,17 @@ static inline int cs_etm__etmq_set_tid( return -1; }
+static inline bool cs_etm__etmq_is_timeless(
struct cs_etm_queue *etmq __maybe_unused)
+{
/* What else to return? */
return true;
+}
+static inline void cs_etm__etmq_set_traceid_queue_timestamp(
struct cs_etm_queue *etmq __maybe_unused,
u8 trace_chan_id __maybe_unused) {}
static inline struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue( struct cs_etm_queue *etmq __maybe_unused, u8 trace_chan_id __maybe_unused) -- 2.17.1
--
- Arnaldo
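For readers following the timestamp handling in the patch above, here is a minimal, self-contained sketch of the estimation logic, assuming a single traceID queue. `struct ts_state`, `hard_timestamp()` and `range_packet()` are hypothetical names invented for this illustration; they only model the three fields added to `struct cs_etm_packet_queue` and the decisions made in `cs_etm_decoder__do_hard_timestamp()` and `cs_etm_decoder__do_soft_timestamp()`, not the perf code itself. A return value of 1 stands in for `OCSD_RESP_WAIT` and 0 for `OCSD_RESP_CONT`.

```c
#include <stdint.h>

/* Hypothetical stand-in for the timestamp fields of cs_etm_packet_queue. */
struct ts_state {
	uint64_t timestamp;      /* time assigned to the packets now queued */
	uint64_t next_timestamp; /* estimate for the next batch of packets */
	uint32_t instr_count;    /* instructions seen since the last estimate */
};

/*
 * Models a hardware timestamp element.
 * Returns 1 when the decoder should wait for the front end, 0 to continue.
 */
static int hard_timestamp(struct ts_state *s, uint64_t hw_ts)
{
	if (s->timestamp) {
		/* Not the first timestamp: just record it for later use. */
		s->next_timestamp = hw_ts;
		return 0;
	}

	/*
	 * First timestamp since start or discontinuity. It arrives after
	 * the range packets, so estimate when execution started.
	 * Like the patch, this sketch does not guard against underflow.
	 */
	s->timestamp = hw_ts - s->instr_count;
	s->next_timestamp = hw_ts;
	s->instr_count = 0;
	return 1;
}

/*
 * Models a range packet followed by the soft timestamp step.
 * Returns 1 when a new timestamp is ready for the front end.
 */
static int range_packet(struct ts_state *s, uint32_t num_instr)
{
	s->instr_count += num_instr;

	/* No timestamp packet has been received yet, nothing to report. */
	if (!s->timestamp)
		return 0;

	s->timestamp = s->next_timestamp;
	/* Estimate the timestamp for the next range packet. */
	s->next_timestamp += s->instr_count;
	s->instr_count = 0;
	return 1;
}
```

With a zero-initialised state, two range packets of 10 and 5 instructions followed by a hardware timestamp of 1000 give an estimated start time of 985. Each later range packet is stamped with the previous estimate and pushes the next estimate forward by its instruction count.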
Add support for CPU-wide trace scenarios by correlating range packets with timestamp packets. That way range packets received on different ETMQ/traceID channels can be processed and synthesized in chronological order.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/util/cs-etm.c | 254 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 246 insertions(+), 8 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 91496a3a2209..0c7776b51045 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -90,12 +90,26 @@ struct cs_etm_queue { };
static int cs_etm__update_queues(struct cs_etm_auxtrace *etm); +static int cs_etm__process_queues(struct cs_etm_auxtrace *etm); static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm, pid_t tid); +static int cs_etm__get_data_block(struct cs_etm_queue *etmq); +static int cs_etm__decode_data_block(struct cs_etm_queue *etmq);
/* PTMs ETMIDR [11:8] set to b0011 */ #define ETMIDR_PTM_VERSION 0x00000300
+/* + * A struct auxtrace_heap_item only has a queue_nr and a timestamp to + * work with. One option is to modify to auxtrace_heap_XYZ() API or simply + * encode the etm queue number as the upper 16 bit and the channel as + * the lower 16 bit. + */ +#define TO_CS_QUEUE_NR(queue_nr, trace_id_chan) \ + (queue_nr << 16 | trace_chan_id) +#define TO_QUEUE_NR(cs_queue_nr) (cs_queue_nr >> 16) +#define TO_TRACE_CHAN_ID(cs_queue_nr) (cs_queue_nr & 0x0000ffff) + static u32 cs_etm__get_v7_protocol_version(u32 etmidr) { etmidr &= ETMIDR_PTM_VERSION; @@ -147,6 +161,29 @@ void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq, etmq->pending_timestamp = trace_chan_id; }
+static u64 cs_etm__etmq_get_timestamp(struct cs_etm_queue *etmq, + u8 *trace_chan_id) +{ + struct cs_etm_packet_queue *packet_queue; + + if (!etmq->pending_timestamp) + return 0; + + if (trace_chan_id) + *trace_chan_id = etmq->pending_timestamp; + + packet_queue = cs_etm__etmq_get_packet_queue(etmq, + etmq->pending_timestamp); + if (!packet_queue) + return 0; + + /* Acknowledge pending status */ + etmq->pending_timestamp = 0; + + /* See function cs_etm_decoder__do_{hard|soft}_timestamp() */ + return packet_queue->timestamp; +} + static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) { int i; @@ -171,6 +208,20 @@ static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) } }
+static void cs_etm__clear_all_packet_queues(struct cs_etm_queue *etmq) +{ + int idx; + struct int_node *inode; + struct cs_etm_traceid_queue *tidq; + struct intlist *traceid_queues_list = etmq->traceid_queues_list; + + intlist__for_each_entry(inode, traceid_queues_list) { + idx = (int)(intptr_t)inode->priv; + tidq = etmq->traceid_queues[idx]; + cs_etm__clear_packet_queue(&tidq->packet_queue); + } +} + static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u8 trace_chan_id) @@ -458,15 +509,15 @@ static int cs_etm__flush_events(struct perf_session *session, if (!tool->ordered_events) return -EINVAL;
- if (!etm->timeless_decoding) - return -EINVAL; - ret = cs_etm__update_queues(etm);
if (ret < 0) return ret;
- return cs_etm__process_timeless_queues(etm, -1); + if (etm->timeless_decoding) + return cs_etm__process_timeless_queues(etm, -1); + + return cs_etm__process_queues(etm); }
static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq) @@ -685,6 +736,9 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, unsigned int queue_nr) { int ret = 0; + unsigned int cs_queue_nr; + u8 trace_chan_id; + u64 timestamp; struct cs_etm_queue *etmq = queue->priv;
if (list_empty(&queue->head) || etmq) @@ -702,6 +756,67 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, etmq->queue_nr = queue_nr; etmq->offset = 0;
+ if (etm->timeless_decoding) + goto out; + + /* + * We are under a CPU-wide trace scenario. As such we need to know + * when the code that generated the traces started to execute so that + * it can be correlated with execution on other CPUs. So we get a + * handle on the beginning of traces and decode until we find a + * timestamp. The timestamp is then added to the auxtrace min heap + * in order to know what nibble (of all the etmqs) to decode first. + */ + while (1) { + /* + * Fetch an aux_buffer from this etmq. Bail if no more + * blocks or an error has been encountered. + */ + ret = cs_etm__get_data_block(etmq); + if (ret <= 0) + goto out; + + /* + * Run decoder on the trace block. The decoder will stop when + * encountering a timestamp, a full packet queue or the end of + * trace for that block. + */ + ret = cs_etm__decode_data_block(etmq); + if (ret) + goto out; + + /* + * Function cs_etm_decoder__do_{hard|soft}_timestamp() does all + * the timestamp calculation for us. + */ + timestamp = cs_etm__etmq_get_timestamp(etmq, &trace_chan_id); + + /* We found a timestamp, no need to continue. */ + if (timestamp) + break; + + /* + * We didn't find a timestamp so empty all the traceid packet + * queues before looking for another timestamp packet, either + * in the current data block or a new one. Packets that were + * just decoded are useless since no timestamp has been + * associated with them. As such simply discard them. + */ + cs_etm__clear_all_packet_queues(etmq); + } + + /* + * We have a timestamp. Add it to the min heap to reflect when + * instructions conveyed by the range packets of this traceID queue + * started to execute. Once the same has been done for all the traceID + * queues of each etmq, redenring and decoding can start in + * chronological order. + * + * Note that packets decoded above are still in the traceID's packet + * queue and will be processed in cs_etm__process_queues(). 
+ */ + cs_queue_nr = TO_CS_QUEUE_NR(queue_nr, trace_id_chan); + ret = auxtrace_heap__add(&etm->heap, cs_queue_nr, timestamp); out: return ret; } @@ -1846,6 +1961,28 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, return ret; }
+static void cs_etm__clear_all_traceid_queues(struct cs_etm_queue *etmq) +{ + int idx; + struct int_node *inode; + struct cs_etm_traceid_queue *tidq; + struct intlist *traceid_queues_list = etmq->traceid_queues_list; + + intlist__for_each_entry(inode, traceid_queues_list) { + idx = (int)(intptr_t)inode->priv; + tidq = etmq->traceid_queues[idx]; + + /* Ignore return value */ + cs_etm__process_traceid_queue(etmq, tidq); + + /* + * Generate an instruction sample with the remaining + * branchstack entries. + */ + cs_etm__flush(etmq, tidq); + } +} + static int cs_etm__run_decoder(struct cs_etm_queue *etmq) { int err = 0; @@ -1913,6 +2050,105 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm, return 0; }
+static int cs_etm__process_queues(struct cs_etm_auxtrace *etm) +{ + int ret = 0; + unsigned int cs_queue_nr, queue_nr; + u8 trace_chan_id; + u64 timestamp; + struct auxtrace_queue *queue; + struct cs_etm_queue *etmq; + struct cs_etm_traceid_queue *tidq; + + while (1) { + if (!etm->heap.heap_cnt) + goto out; + + /* Take the entry at the top of the min heap */ + cs_queue_nr = etm->heap.heap_array[0].queue_nr; + queue_nr = TO_QUEUE_NR(cs_queue_nr); + trace_chan_id = TO_TRACE_CHAN_ID(cs_queue_nr); + queue = &etm->queues.queue_array[queue_nr]; + etmq = queue->priv; + + /* + * Remove the top entry from the heap since we are about + * to process it. + */ + auxtrace_heap__pop(&etm->heap); + + tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id); + if (!tidq) { + /* + * No traceID queue has been allocated for this traceID, + * which means something somewhere went very wrong. No + * other choice than simply exit. + */ + ret = -EINVAL; + goto out; + } + + /* + * Packets associated with this timestamp are already in + * the etmq's traceID queue, so process them. + */ + ret = cs_etm__process_traceid_queue(etmq, tidq); + if (ret < 0) + goto out; + + /* + * Packets for this timestamp have been processed, time to + * move on to the next timestamp, fetching a new auxtrace_buffer + * if need be. + */ +refetch: + ret = cs_etm__get_data_block(etmq); + if (ret < 0) + goto out; + + /* + * No more auxtrace_buffers to process in this etmq, simply + * move on to another entry in the auxtrace_heap. + */ + if (!ret) + continue; + + ret = cs_etm__decode_data_block(etmq); + if (ret) + goto out; + + timestamp = cs_etm__etmq_get_timestamp(etmq, &trace_chan_id); + + if (!timestamp) { + /* + * Function cs_etm__decode_data_block() returns when + * there is no more traces to decode in the current + * auxtrace_buffer OR when a timestamp has been + * encountered on any of the traceID queues. Since we + * did not get a timestamp, there is no more traces to + * process in this auxtrace_buffer. 
As such empty and + * flush all traceID queues. + */ + cs_etm__clear_all_traceid_queues(etmq); + + /* Fetch another auxtrace_buffer for this etmq */ + goto refetch; + } + + /* + * Add to the min heap the timestamp for packets that have + * just been decoded. They will be processed and synthesized + * during the next call to cs_etm__process_traceid_queue() for + * this queue/traceID. + */ + cs_queue_nr = TO_CS_QUEUE_NR(queue_nr, trace_chan_id); + ret = auxtrace_heap__add(&etm->heap, cs_queue_nr, timestamp); + } + +out: + return ret; +} + static int cs_etm__process_itrace_start(struct cs_etm_auxtrace *etm, union perf_event *event) { @@ -1991,9 +2227,6 @@ static int cs_etm__process_event(struct perf_session *session, return -EINVAL; }
- if (!etm->timeless_decoding) - return -EINVAL; - if (sample->time && (sample->time != (u64) -1)) timestamp = sample->time; else @@ -2005,7 +2238,8 @@ static int cs_etm__process_event(struct perf_session *session, return err; }
- if (event->header.type == PERF_RECORD_EXIT) + if (etm->timeless_decoding && + event->header.type == PERF_RECORD_EXIT) return cs_etm__process_timeless_queues(etm, event->fork.tid);
@@ -2014,6 +2248,10 @@ static int cs_etm__process_event(struct perf_session *session, else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE) return cs_etm__process_switch_cpu_wide(etm, event);
+ if (!etm->timeless_decoding && + event->header.type == PERF_RECORD_AUX) + return cs_etm__process_queues(etm); + return 0; }
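The ordering scheme in the patch above can be illustrated with a small standalone sketch: a queue number and a trace channel ID are packed into one integer, stored in a min heap keyed by timestamp, and popped in chronological order. Everything here (`to_cs_queue_nr()`, `struct min_heap`, `heap_add()`, `heap_pop()`, `HEAP_MAX`) is a hypothetical stand-in written for this example; it is not the `auxtrace_heap__add()`/`auxtrace_heap__pop()` API from perf. One observation on the patch as posted: the `TO_CS_QUEUE_NR()` macro names its second parameter `trace_id_chan` while its body uses `trace_chan_id`, so it seems to rely on a local variable of that name being in scope at the call site. The sketch uses inline functions to sidestep that kind of mismatch.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack the etm queue number in the upper 16 bits, the channel in the lower. */
static inline unsigned int to_cs_queue_nr(unsigned int queue_nr,
					  uint8_t trace_chan_id)
{
	return (queue_nr << 16) | trace_chan_id;
}

static inline unsigned int to_queue_nr(unsigned int cs_queue_nr)
{
	return cs_queue_nr >> 16;
}

static inline uint8_t to_trace_chan_id(unsigned int cs_queue_nr)
{
	return (uint8_t)(cs_queue_nr & 0x0000ffff);
}

#define HEAP_MAX 16 /* arbitrary capacity for this sketch */

struct heap_item {
	unsigned int cs_queue_nr;
	uint64_t timestamp;
};

struct min_heap {
	struct heap_item item[HEAP_MAX];
	size_t cnt;
};

/* Insert an entry, sifting it up until its parent is not later in time. */
static int heap_add(struct min_heap *h, unsigned int cs_queue_nr, uint64_t ts)
{
	size_t i;

	if (h->cnt == HEAP_MAX)
		return -1;

	i = h->cnt++;
	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (h->item[parent].timestamp <= ts)
			break;
		h->item[i] = h->item[parent];
		i = parent;
	}
	h->item[i].cs_queue_nr = cs_queue_nr;
	h->item[i].timestamp = ts;
	return 0;
}

/* Remove the earliest entry, sifting the last element down from the root. */
static int heap_pop(struct min_heap *h, struct heap_item *out)
{
	struct heap_item last;
	size_t i = 0;

	if (!h->cnt)
		return -1;

	*out = h->item[0];
	last = h->item[--h->cnt];
	for (;;) {
		size_t child = 2 * i + 1;

		if (child >= h->cnt)
			break;
		if (child + 1 < h->cnt &&
		    h->item[child + 1].timestamp < h->item[child].timestamp)
			child++;
		if (last.timestamp <= h->item[child].timestamp)
			break;
		h->item[i] = h->item[child];
		i = child;
	}
	h->item[i] = last;
	return 0;
}
```

A caller would add one entry per traceID queue once its first timestamp is known, then repeatedly pop the earliest entry, process that queue's packets, decode further, and re-add the queue with its next timestamp, which is the loop structure `cs_etm__process_queues()` follows in the patch.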
On Fri, May 24, 2019 at 11:34:51AM -0600, Mathieu Poirier wrote:
This patchset adds support for CoreSight CPU-wide trace scenarios. More specifically it extends the work that was done for per-thread scenarios to handle more than a single trace ID. It also temporally correlates traces based on the timestamps generated by the tracers so that rendering by the perf machinery is ordered.
Everything is based on Arnaldo's perf/core branch (46d4c9a05285). I will send another revision when it is rebased to a 5.2 rc candidate.
Before this set:
# root@juno:/home/linaro# perf record -e cs_etm/@20070000.etr/ -C 2,3 sleep 1
failed to mmap with 12 (Cannot allocate memory)
After this set:
# root@juno:/home/linaro# perf record -e cs_etm/@20070000.etr/ -C 2,3 sleep 1
[ perf record: Captured and wrote 1.352 MB perf.data ]
I have tested this patch set on Juno and DB410c boards, FWIW:
Tested-by: Leo Yan <leo.yan@linaro.org>
Regards,
Mathieu
Changes for V2:
- Fixed error condition in function cs_etm_set_option() (Leo)
- Fixed changelog spelling error (Leo).
- Moved from calloc() to malloc() in cs_etm__etmq_get_traceid_queue()
- Got rid of CS_ETM_PACKET_QUEUE_NR macro
- Fixed indentation problem in function cs_etm__process_traceid_queue() (Leo).
Mathieu Poirier (17):
  perf tools: Configure contextID tracing in CPU-wide mode
  perf tools: Configure timestsamp generation in CPU-wide mode
  perf tools: Configure SWITCH_EVENTS in CPU-wide mode
  perf tools: Add handling of itrace start events
  perf tools: Add handling of switch-CPU-wide events
  perf tools: Refactor error path in cs_etm_decoder__new()
  perf tools: Move packet queue out of decoder structure
  perf tools: Fix indentation in function cs_etm__process_decoder_queue()
  perf tools: Introduce the concept of trace ID queues
  perf tools: Get rid of unused cpu in struct cs_etm_queue
  perf tools: Move thread to traceid_queue
  perf tools: Move tid/pid to traceid_queue
  perf tools: Use traceID aware memory callback API
  perf tools: Add support for multiple traceID queues
  perf tools: Linking PE contextID with perf thread mechanic
  perf tools: Add notion of time to decoding code
  perf tools: Add support for CPU-wide trace scenarios
 tools/perf/Makefile.config                    |    3 +
 tools/perf/arch/arm/util/cs-etm.c             |  186 ++-
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  269 +++--
 .../perf/util/cs-etm-decoder/cs-etm-decoder.h |   39 +-
 tools/perf/util/cs-etm.c                      | 1026 +++++++++++++----
 tools/perf/util/cs-etm.h                      |  103 ++
 6 files changed, 1252 insertions(+), 374 deletions(-)
-- 2.17.1