This patchset complements the kernel portion of the solution by adding the capability to decode and render traces based on the time they were executed.
Most of the changes are related to decoding traces with multiple traceIDs, something that is mandatory when dealing with N:1 source/sink topologies. Both kernel and user space patches have been rebased on yesterday's linux-next and are available here[1].
Note that compiling the perf tools on both target and host platforms in the 5.1 cycle requires the addition of "CORESIGHT=1", i.e.:
$ make -C tools/perf CORESIGHT=1 VF=1
Moreover, this set uses a new decoder callback that was introduced in version 0.11.0 of the openCSD library. Testing with a different version of the library than the one installed on the system can be done by using the CSINCLUDES and CSLIBS environment variables, as described in the library's documentation[2]. As with the kernel portion, I'm only sending this to the CoreSight mailing list because of the merge window.
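For example, to build against a local copy of the library rather than the one installed on the system (the paths below are placeholders for wherever the v0.11.0 headers and libraries live):

$ export CSLIBS=/opt/opencsd/lib
$ export CSINCLUDES=/opt/opencsd/include
$ make -C tools/perf CORESIGHT=1 VF=1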
Review and comments would be most appreciated.
Thanks,
Mathieu
[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git/log/?h=next-2019...
[2]. https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md
Mathieu Poirier (17):
  perf tools: Configure contextID tracing in CPU-wide mode
  perf tools: Configure timestamp generation in CPU-wide mode
  perf tools: Configure SWITCH_EVENTS in CPU-wide mode
  perf tools: Add handling of itrace start events
  perf tools: Add handling of switch-CPU-wide events
  perf tools: Refactor error path in cs_etm_decoder__new()
  perf tools: Move packet queue out of decoder structure
  perf tools: Introduce the concept of trace ID queues
  perf tools: Get rid of unused cpu in struct cs_etm_queue
  perf tools: Move thread to traceid_queue
  perf tools: Move tid/pid to traceid_queue
  perf tools: Mandate using openCSD v0.11.0
  perf tools: Use traceID aware memory callback API
  perf tools: Add support for multiple traceID queues
  perf tools: Linking PE contextID with perf thread mechanic
  perf tools: Add notion of time to decoding code
  perf tools: Add support for CPU-wide trace scenarios
 tools/build/feature/test-libopencsd.c          |   4 +-
 tools/perf/arch/arm/util/cs-etm.c              | 186 +++-
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c  | 270 +++--
 .../perf/util/cs-etm-decoder/cs-etm-decoder.h  |  39 +-
 tools/perf/util/cs-etm.c                       | 939 ++++++++++++++----
 tools/perf/util/cs-etm.h                       | 104 ++
 6 files changed, 1208 insertions(+), 334 deletions(-)
When operating in CPU-wide mode, being notified of contextID changes is required so that the decoding mechanism is aware of process context switches.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/arch/arm/util/cs-etm.c | 126 +++++++++++++++++++++++++-----
 tools/perf/util/cs-etm.h          |  12 +++
 2 files changed, 119 insertions(+), 19 deletions(-)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 911426721170..1fc597532c78 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -35,8 +35,100 @@ struct cs_etm_recording {
 	size_t snapshot_size;
 };
 
+static const char *metadata_etmv3_ro[CS_ETM_PRIV_MAX] = {
+	[CS_ETM_ETMCCER]	= "mgmt/etmccer",
+	[CS_ETM_ETMIDR]		= "mgmt/etmidr",
+};
+
+static const char *metadata_etmv4_ro[CS_ETMV4_PRIV_MAX] = {
+	[CS_ETMV4_TRCIDR0]		= "trcidr/trcidr0",
+	[CS_ETMV4_TRCIDR1]		= "trcidr/trcidr1",
+	[CS_ETMV4_TRCIDR2]		= "trcidr/trcidr2",
+	[CS_ETMV4_TRCIDR8]		= "trcidr/trcidr8",
+	[CS_ETMV4_TRCAUTHSTATUS]	= "mgmt/trcauthstatus",
+};
+
 static bool cs_etm_is_etmv4(struct auxtrace_record *itr, int cpu);
 
+static int cs_etm_set_context_id(struct auxtrace_record *itr,
+				 struct perf_evsel *evsel, int cpu)
+{
+	struct cs_etm_recording *ptr;
+	struct perf_pmu *cs_etm_pmu;
+	char path[PATH_MAX];
+	int err = -EINVAL;
+	u32 val;
+
+	ptr = container_of(itr, struct cs_etm_recording, itr);
+	cs_etm_pmu = ptr->cs_etm_pmu;
+
+	if (!cs_etm_is_etmv4(itr, cpu))
+		goto out;
+
+	/* Get a handle on TRCIDR2 */
+	snprintf(path, PATH_MAX, "cpu%d/%s",
+		 cpu, metadata_etmv4_ro[CS_ETMV4_TRCIDR2]);
+	err = perf_pmu__scan_file(cs_etm_pmu, path, "%x", &val);
+
+	/* There was a problem reading the file, bailing out */
+	if (err != 1) {
+		pr_err("%s: can't read file %s\n",
+		       CORESIGHT_ETM_PMU_NAME, path);
+		goto out;
+	}
+
+	/*
+	 * TRCIDR2.CIDSIZE, bit [9-5], indicates whether contextID tracing
+	 * is supported:
+	 *  0b00000 Context ID tracing is not supported.
+	 *  0b00100 Maximum of 32-bit Context ID size.
+	 *  All other values are reserved.
+	 */
+	val = BMVAL(val, 5, 9);
+	if (!val || val != 0x4) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	/* All good, let the kernel know */
+	evsel->attr.config |= (1 << ETM_OPT_CTXTID);
+	err = 0;
+
+out:
+	return err;
+}
+
+static int cs_etm_set_option(struct auxtrace_record *itr,
+			     struct perf_evsel *evsel, u32 option)
+{
+	int i, err = -EINVAL;
+	struct cpu_map *event_cpus = evsel->evlist->cpus;
+	struct cpu_map *online_cpus = cpu_map__new(NULL);
+
+	/* Set option of each CPU we have */
+	for (i = 0; i < cpu__max_cpu(); i++) {
+		if (!cpu_map__has(event_cpus, i) ||
+		    !cpu_map__has(online_cpus, i))
+			continue;
+
+		switch (option) {
+		case ETM_OPT_CTXTID:
+			err = cs_etm_set_context_id(itr, evsel, i);
+			if (err)
+				goto out;
+			break;
+		default:
+			break;
+		}
+	}
+
+	err = 0;
+out:
+	cpu_map__put(online_cpus);
+	return err;
+}
+
 static int cs_etm_parse_snapshot_options(struct auxtrace_record *itr,
 					 struct record_opts *opts,
 					 const char *str)
@@ -105,8 +197,9 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 			container_of(itr, struct cs_etm_recording, itr);
 	struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
 	struct perf_evsel *evsel, *cs_etm_evsel = NULL;
-	const struct cpu_map *cpus = evlist->cpus;
+	struct cpu_map *cpus = evlist->cpus;
 	bool privileged = (geteuid() == 0 || perf_event_paranoid() < 0);
+	int err = 0;
 
 	ptr->evlist = evlist;
 	ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
@@ -241,19 +334,24 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 	/*
	 * In the case of per-cpu mmaps, we need the CPU on the
-	 * AUX event.
+	 * AUX event.  We also need the contextID in order to be notified
+	 * when a context switch happened.
 	 */
-	if (!cpu_map__empty(cpus))
+	if (!cpu_map__empty(cpus)) {
 		perf_evsel__set_sample_bit(cs_etm_evsel, CPU);
 
+		err = cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_CTXTID);
+		if (err)
+			goto out;
+	}
+
 	/* Add dummy event to keep tracking */
 	if (opts->full_auxtrace) {
 		struct perf_evsel *tracking_evsel;
-		int err;
 
 		err = parse_events(evlist, "dummy:u", NULL);
 		if (err)
-			return err;
+			goto out;
 
 		tracking_evsel = perf_evlist__last(evlist);
 		perf_evlist__set_tracking_event(evlist, tracking_evsel);
@@ -266,7 +364,8 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 		perf_evsel__set_sample_bit(tracking_evsel, TIME);
 	}
 
-	return 0;
+out:
+	return err;
 }
 
 static u64 cs_etm_get_config(struct auxtrace_record *itr)
@@ -314,6 +413,8 @@ static u64 cs_etmv4_get_config(struct auxtrace_record *itr)
 	config_opts = cs_etm_get_config(itr);
 	if (config_opts & BIT(ETM_OPT_CYCACC))
 		config |= BIT(ETM4_CFG_BIT_CYCACC);
+	if (config_opts & BIT(ETM_OPT_CTXTID))
+		config |= BIT(ETM4_CFG_BIT_CTXTID);
 	if (config_opts & BIT(ETM_OPT_TS))
 		config |= BIT(ETM4_CFG_BIT_TS);
 	if (config_opts & BIT(ETM_OPT_RETSTK))
@@ -363,19 +464,6 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
 			(etmv3 * CS_ETMV3_PRIV_SIZE));
 }
 
-static const char *metadata_etmv3_ro[CS_ETM_PRIV_MAX] = {
-	[CS_ETM_ETMCCER]	= "mgmt/etmccer",
-	[CS_ETM_ETMIDR]		= "mgmt/etmidr",
-};
-
-static const char *metadata_etmv4_ro[CS_ETMV4_PRIV_MAX] = {
-	[CS_ETMV4_TRCIDR0]		= "trcidr/trcidr0",
-	[CS_ETMV4_TRCIDR1]		= "trcidr/trcidr1",
-	[CS_ETMV4_TRCIDR2]		= "trcidr/trcidr2",
-	[CS_ETMV4_TRCIDR8]		= "trcidr/trcidr8",
-	[CS_ETMV4_TRCAUTHSTATUS]	= "mgmt/trcauthstatus",
-};
-
 static bool cs_etm_is_etmv4(struct auxtrace_record *itr, int cpu)
 {
 	bool ret = false;
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index 0e97c196147a..826c9eedaf5c 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -103,6 +103,18 @@ struct intlist *traceid_list;
 #define KiB(x) ((x) * 1024)
 #define MiB(x) ((x) * 1024 * 1024)
 
+/*
+ * Create a contiguous bitmask starting at bit position @l and ending at
+ * position @h. For example
+ * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
+ *
+ * Carbon copy of implementation found in $KERNEL/include/linux/bitops.h
+ */
+#define GENMASK(h, l) \
+	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
+
+#define BMVAL(val, lsb, msb)	((val & GENMASK(msb, lsb)) >> lsb)
+
 #define CS_ETM_HEADER_SIZE (CS_HEADER_VERSION_0_MAX * sizeof(u64))
 
 #define __perf_cs_etmv3_magic 0x3030303030303030ULL
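For reference, the GENMASK()/BMVAL() pair added above can be sanity-checked with a small standalone program; the TRCIDR2 value used here is made up purely for illustration:

#include <assert.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * 8)

#define GENMASK(h, l) \
	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

#define BMVAL(val, lsb, msb)	((val & GENMASK(msb, lsb)) >> lsb)

int main(void)
{
	/* GENMASK(9, 5) covers bits [9:5], i.e. 0x3e0 */
	assert(GENMASK(9, 5) == 0x3e0);

	/* Made-up TRCIDR2 value with CIDSIZE (bits [9:5]) = 0b00100 */
	uint32_t trcidr2 = 0x4 << 5;

	/* BMVAL() shifts the field down to its natural value */
	assert(BMVAL(trcidr2, 5, 9) == 0x4);

	return 0;
}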
On Wed, Mar 06, 2019 at 03:59:26PM -0700, Mathieu Poirier wrote:
> When operating in CPU-wide mode, being notified of contextID changes is
> required so that the decoding mechanism is aware of process context
> switches.
>
> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>

[...]

> +	/* Set option of each CPU we have */
> +	for (i = 0; i < cpu__max_cpu(); i++) {
> +		if (!cpu_map__has(event_cpus, i) ||
> +		    !cpu_map__has(online_cpus, i))
> +			continue;
> +
> +		switch (option) {
> +		case ETM_OPT_CTXTID:
> +			err = cs_etm_set_context_id(itr, evsel, i);
> +			if (err)
> +				goto out;
> +			break;
> +		default:
> +			break;
> +		}
> +	}
> +
> +	err = 0;
Nitpick: maybe we should remove the statement 'err = 0' so that for any unknown option we can return '-EINVAL'.
> +out:
> +	cpu_map__put(online_cpus);
> +	return err;
> +}

[...]
On Wed, Apr 03, 2019 at 03:59:20PM +0800, Leo Yan wrote:
On Wed, Mar 06, 2019 at 03:59:26PM -0700, Mathieu Poirier wrote:
> > When operating in CPU-wide mode, being notified of contextID changes is
> > required so that the decoding mechanism is aware of process context
> > switches.
> >
> > Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>

[...]

> > +		switch (option) {
> > +		case ETM_OPT_CTXTID:
> > +			err = cs_etm_set_context_id(itr, evsel, i);
> > +			if (err)
> > +				goto out;
> > +			break;
> > +		default:
> > +			break;
> > +		}
> > +	}
> > +
> > +	err = 0;
> Nitpick: maybe we should remove the statement 'err = 0' so that for any unknown option we can return '-EINVAL'.
I think what is really missing here is a 'goto' statement in the default clause. Well spotted.
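i.e. something along these lines (a sketch of the idea being discussed, not the final patch):

	switch (option) {
	case ETM_OPT_CTXTID:
		err = cs_etm_set_context_id(itr, evsel, i);
		if (err)
			goto out;
		break;
	default:
		/* Unknown option: leave err at -EINVAL and bail out */
		goto out;
	}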
Mathieu
When operating in CPU-wide mode, tracers need to generate timestamps in order to correlate the code being traced on one CPU with what is executed on other CPUs.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/arch/arm/util/cs-etm.c | 57 +++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 1fc597532c78..59b189314422 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -99,6 +99,54 @@ static int cs_etm_set_context_id(struct auxtrace_record *itr,
 	return err;
 }
 
+static int cs_etm_set_timestamp(struct auxtrace_record *itr,
+				struct perf_evsel *evsel, int cpu)
+{
+	struct cs_etm_recording *ptr;
+	struct perf_pmu *cs_etm_pmu;
+	char path[PATH_MAX];
+	int err = -EINVAL;
+	u32 val;
+
+	ptr = container_of(itr, struct cs_etm_recording, itr);
+	cs_etm_pmu = ptr->cs_etm_pmu;
+
+	if (!cs_etm_is_etmv4(itr, cpu))
+		goto out;
+
+	/* Get a handle on TRCIDR0 */
+	snprintf(path, PATH_MAX, "cpu%d/%s",
+		 cpu, metadata_etmv4_ro[CS_ETMV4_TRCIDR0]);
+	err = perf_pmu__scan_file(cs_etm_pmu, path, "%x", &val);
+
+	/* There was a problem reading the file, bailing out */
+	if (err != 1) {
+		pr_err("%s: can't read file %s\n",
+		       CORESIGHT_ETM_PMU_NAME, path);
+		goto out;
+	}
+
+	/*
+	 * TRCIDR0.TSSIZE, bit [28-24], indicates whether global timestamping
+	 * is supported:
+	 *  0b00000 Global timestamping is not implemented
+	 *  0b00110 Implementation supports a maximum timestamp of 48bits.
+	 *  0b01000 Implementation supports a maximum timestamp of 64bits.
+	 */
+	val &= GENMASK(28, 24);
+	if (!val) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	/* All good, let the kernel know */
+	evsel->attr.config |= (1 << ETM_OPT_TS);
+	err = 0;
+
+out:
+	return err;
+}
+
 static int cs_etm_set_option(struct auxtrace_record *itr,
 			     struct perf_evsel *evsel, u32 option)
 {
@@ -118,6 +166,11 @@ static int cs_etm_set_option(struct auxtrace_record *itr,
 			if (err)
 				goto out;
 			break;
+		case ETM_OPT_TS:
+			err = cs_etm_set_timestamp(itr, evsel, i);
+			if (err)
+				goto out;
+			break;
 		default:
 			break;
 		}
@@ -343,6 +396,10 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 		err = cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_CTXTID);
 		if (err)
 			goto out;
+
+		err = cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_TS);
+		if (err)
+			goto out;
 	}
 
 	/* Add dummy event to keep tracking */
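Worth noting: the patch masks TRCIDR0.TSSIZE in place with GENMASK(28, 24) and only tests the result for zero, which is all that is needed to decide whether global timestamping is implemented. If the actual timestamp width were ever required, the decode could look like the sketch below, reusing the BMVAL() macro from cs-etm.h; the helper name is hypothetical:

static int cs_etm_tssize_to_bits(uint32_t trcidr0)
{
	switch (BMVAL(trcidr0, 24, 28)) {
	case 0x6:	/* 0b00110: 48-bit timestamp */
		return 48;
	case 0x8:	/* 0b01000: 64-bit timestamp */
		return 64;
	default:	/* 0b00000: not implemented, others reserved */
		return 0;
	}
}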
Ask the perf core to generate an event when processes are swapped in/out of context. That way proper action can be taken by the decoding code when faced with such an event.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/arch/arm/util/cs-etm.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 59b189314422..19f2049a20c1 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -257,6 +257,9 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 	ptr->evlist = evlist;
 	ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
 
+	if (perf_can_record_switch_events())
+		opts->record_switch_events = true;
+
 	evlist__for_each_entry(evlist, evsel) {
 		if (evsel->attr.type == cs_etm_pmu->type) {
 			if (cs_etm_evsel) {
Add handling of ITRACE_START events in order to add the tid/pid of the executing process to the perf tools machine infrastructure. This information is later retrieved when a contextID packet is found in the trace stream.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/util/cs-etm.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 110804936fc3..c58a6b87a376 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1663,6 +1663,29 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 	return 0;
 }
 
+static int cs_etm__process_itrace_start(struct cs_etm_auxtrace *etm,
+					union perf_event *event)
+{
+	struct thread *th;
+
+	if (etm->timeless_decoding)
+		return 0;
+
+	/*
+	 * Add the tid/pid to the log so that we can get a match when
+	 * we get a contextID from the decoder.
+	 */
+	th = machine__findnew_thread(etm->machine,
+				     event->itrace_start.pid,
+				     event->itrace_start.tid);
+	if (!th)
+		return -ENOMEM;
+
+	thread__put(th);
+
+	return 0;
+}
+
 static int cs_etm__process_event(struct perf_session *session,
 				 union perf_event *event,
 				 struct perf_sample *sample,
@@ -1700,6 +1723,9 @@ static int cs_etm__process_event(struct perf_session *session,
 		return cs_etm__process_timeless_queues(etm, event->fork.tid);
 
+	if (event->header.type == PERF_RECORD_ITRACE_START)
+		return cs_etm__process_itrace_start(etm, event);
+
 	return 0;
 }
Add handling of SWITCH-CPU-WIDE events in order to add the tid/pid of the incoming process to the perf tools machine infrastructure. This information is later retrieved when a contextID packet is found in the trace stream.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/util/cs-etm.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index c58a6b87a376..b6fb389adcc3 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1686,6 +1686,42 @@ static int cs_etm__process_itrace_start(struct cs_etm_auxtrace *etm,
 	return 0;
 }
 
+static int cs_etm__process_switch_cpu_wide(struct cs_etm_auxtrace *etm,
+					   union perf_event *event)
+{
+	struct thread *th;
+	bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
+
+	/*
+	 * Context switches in per-thread mode are irrelevant since perf
+	 * will start/stop tracing as the process is scheduled.
+	 */
+	if (etm->timeless_decoding)
+		return 0;
+
+	/*
+	 * SWITCH_IN events carry the next process to be switched out while
+	 * SWITCH_OUT events carry the process to be switched in.  As such
+	 * we don't care about IN events.
+	 */
+	if (!out)
+		return 0;
+
+	/*
+	 * Add the tid/pid to the log so that we can get a match when
+	 * we get a contextID from the decoder.
+	 */
+	th = machine__findnew_thread(etm->machine,
+				     event->context_switch.next_prev_pid,
+				     event->context_switch.next_prev_tid);
+	if (!th)
+		return -ENOMEM;
+
+	thread__put(th);
+
+	return 0;
+}
+
 static int cs_etm__process_event(struct perf_session *session,
 				 union perf_event *event,
 				 struct perf_sample *sample,
@@ -1725,6 +1761,8 @@ static int cs_etm__process_event(struct perf_session *session,
 	if (event->header.type == PERF_RECORD_ITRACE_START)
 		return cs_etm__process_itrace_start(etm, event);
+	else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
+		return cs_etm__process_switch_cpu_wide(etm, event);
 
 	return 0;
 }
There is no point in having two different error goto statements since the openCSD API to free a decoder handles NULL pointers. As such, function cs_etm_decoder__free() can be called to deal with all aspects of freeing decoder memory.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index ba4c623cd8de..9838616f8d53 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -576,7 +576,7 @@ cs_etm_decoder__new(int num_cpu, struct cs_etm_decoder_params *d_params,
 	/* init library print logging support */
 	ret = cs_etm_decoder__init_def_logger_printing(d_params, decoder);
 	if (ret != 0)
-		goto err_free_decoder_tree;
+		goto err_free_decoder;
 
 	/* init raw frame logging if required */
 	cs_etm_decoder__init_raw_frame_logging(d_params, decoder);
@@ -586,15 +586,13 @@ cs_etm_decoder__new(int num_cpu, struct cs_etm_decoder_params *d_params,
 					     &t_params[i],
 					     decoder);
 		if (ret != 0)
-			goto err_free_decoder_tree;
+			goto err_free_decoder;
 	}
 
 	return decoder;
 
-err_free_decoder_tree:
-	ocsd_destroy_dcd_tree(decoder->dcd_tree);
 err_free_decoder:
-	free(decoder);
+	cs_etm_decoder__free(decoder);
 	return NULL;
 }
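The refactor leans on a common C cleanup idiom: when the destructor tolerates NULL and partially-constructed objects, every failure point in the constructor can share a single error label. A generic sketch of the pattern under made-up names, not the perf code itself:

#include <stdlib.h>
#include <string.h>

struct widget {
	char *name;
	int *data;
};

static void widget__free(struct widget *w)
{
	if (!w)
		return;
	free(w->name);	/* free(NULL) is a no-op, so partially */
	free(w->data);	/* built objects need no special casing */
	free(w);
}

static struct widget *widget__new(void)
{
	struct widget *w = calloc(1, sizeof(*w));

	if (!w)
		return NULL;

	w->name = strdup("widget");
	if (!w->name)
		goto err_free;

	w->data = calloc(16, sizeof(*w->data));
	if (!w->data)
		goto err_free;

	return w;

err_free:	/* one label handles every failure point */
	widget__free(w);
	return NULL;
}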
The decoder needs to work with more than one traceID queue if we want to support CPU-wide scenarios with N:1 source/sink topologies. As such, move the packet buffer and related fields out of the decoder structure and into the cs_etm_queue structure.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 129 +++++++-----------
 .../perf/util/cs-etm-decoder/cs-etm-decoder.h |  36 +----
 tools/perf/util/cs-etm.c                      |  37 ++++-
 tools/perf/util/cs-etm.h                      |  54 ++++++++
 4 files changed, 145 insertions(+), 111 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 9838616f8d53..12f97e1ff945 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -18,8 +18,6 @@
 #include "intlist.h"
 #include "util.h"
 
-#define MAX_BUFFER 1024
-
 /* use raw logging */
 #ifdef CS_DEBUG_RAW
 #define CS_LOG_RAW_FRAMES
@@ -31,18 +29,12 @@
 #endif
 #endif
 
-#define CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL
-
 struct cs_etm_decoder {
 	void *data;
 	void (*packet_printer)(const char *msg);
 	dcd_tree_handle_t dcd_tree;
 	cs_etm_mem_cb_type mem_access;
 	ocsd_datapath_resp_t prev_return;
-	u32 packet_count;
-	u32 head;
-	u32 tail;
-	struct cs_etm_packet packet_buffer[MAX_BUFFER];
 };
 
 static u32
@@ -88,14 +80,14 @@ int cs_etm_decoder__reset(struct cs_etm_decoder *decoder)
 	return 0;
 }
 
-int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder,
+int cs_etm_decoder__get_packet(struct cs_etm_packet_queue *packet_queue,
 			       struct cs_etm_packet *packet)
 {
-	if (!decoder || !packet)
+	if (!packet_queue || !packet)
 		return -EINVAL;
 
 	/* Nothing to do, might as well just return */
-	if (decoder->packet_count == 0)
+	if (packet_queue->packet_count == 0)
 		return 0;
 	/*
	 * The queueing process in function cs_etm_decoder__buffer_packet()
@@ -106,11 +98,12 @@ int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder,
 	 * value.  Otherwise the first element of the packet queue is not
 	 * used.
 	 */
-	decoder->head = (decoder->head + 1) & (MAX_BUFFER - 1);
+	packet_queue->head = (packet_queue->head + 1) &
+			     (CS_ETM_PACKET_MAX_BUFFER - 1);
 
-	*packet = decoder->packet_buffer[decoder->head];
+	*packet = packet_queue->packet_buffer[packet_queue->head];
 
-	decoder->packet_count--;
+	packet_queue->packet_count--;
 
 	return 1;
 }
@@ -276,84 +269,60 @@ cs_etm_decoder__create_etm_packet_printer(struct cs_etm_trace_params *t_params,
 					  trace_config);
 }
 
-static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
-{
-	int i;
-
-	decoder->head = 0;
-	decoder->tail = 0;
-	decoder->packet_count = 0;
-	for (i = 0; i < MAX_BUFFER; i++) {
-		decoder->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN;
-		decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
-		decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
-		decoder->packet_buffer[i].instr_count = 0;
-		decoder->packet_buffer[i].last_instr_taken_branch = false;
-		decoder->packet_buffer[i].last_instr_size = 0;
-		decoder->packet_buffer[i].last_instr_type = 0;
-		decoder->packet_buffer[i].last_instr_subtype = 0;
-		decoder->packet_buffer[i].last_instr_cond = 0;
-		decoder->packet_buffer[i].flags = 0;
-		decoder->packet_buffer[i].exception_number = UINT32_MAX;
-		decoder->packet_buffer[i].trace_chan_id = UINT8_MAX;
-		decoder->packet_buffer[i].cpu = INT_MIN;
-	}
-}
-
 static ocsd_datapath_resp_t
-cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
+cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue,
 			      const u8 trace_chan_id,
 			      enum cs_etm_sample_type sample_type)
 {
 	u32 et = 0;
 	int cpu;
 
-	if (decoder->packet_count >= MAX_BUFFER - 1)
+	if (packet_queue->packet_count >= CS_ETM_PACKET_MAX_BUFFER - 1)
 		return OCSD_RESP_FATAL_SYS_ERR;
 
 	if (cs_etm__get_cpu(trace_chan_id, &cpu) < 0)
 		return OCSD_RESP_FATAL_SYS_ERR;
 
-	et = decoder->tail;
-	et = (et + 1) & (MAX_BUFFER - 1);
-	decoder->tail = et;
-	decoder->packet_count++;
-
-	decoder->packet_buffer[et].sample_type = sample_type;
-	decoder->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN;
-	decoder->packet_buffer[et].cpu = cpu;
-	decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
-	decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
-	decoder->packet_buffer[et].instr_count = 0;
-	decoder->packet_buffer[et].last_instr_taken_branch = false;
-	decoder->packet_buffer[et].last_instr_size = 0;
-	decoder->packet_buffer[et].last_instr_type = 0;
-	decoder->packet_buffer[et].last_instr_subtype = 0;
-	decoder->packet_buffer[et].last_instr_cond = 0;
-	decoder->packet_buffer[et].flags = 0;
-	decoder->packet_buffer[et].exception_number = UINT32_MAX;
-	decoder->packet_buffer[et].trace_chan_id = trace_chan_id;
-
-	if (decoder->packet_count == MAX_BUFFER - 1)
+	et = packet_queue->tail;
+	et = (et + 1) & (CS_ETM_PACKET_MAX_BUFFER - 1);
+	packet_queue->tail = et;
+	packet_queue->packet_count++;
+
+	packet_queue->packet_buffer[et].sample_type = sample_type;
+	packet_queue->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN;
+	packet_queue->packet_buffer[et].cpu = cpu;
+	packet_queue->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
+	packet_queue->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
+	packet_queue->packet_buffer[et].instr_count = 0;
+	packet_queue->packet_buffer[et].last_instr_taken_branch = false;
+	packet_queue->packet_buffer[et].last_instr_size = 0;
+	packet_queue->packet_buffer[et].last_instr_type = 0;
+	packet_queue->packet_buffer[et].last_instr_subtype = 0;
+	packet_queue->packet_buffer[et].last_instr_cond = 0;
+	packet_queue->packet_buffer[et].flags = 0;
+	packet_queue->packet_buffer[et].exception_number = UINT32_MAX;
+	packet_queue->packet_buffer[et].trace_chan_id = trace_chan_id;
+
+	if (packet_queue->packet_count == CS_ETM_PACKET_MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
 
 	return OCSD_RESP_CONT;
 }
 
 static ocsd_datapath_resp_t
-cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
+cs_etm_decoder__buffer_range(struct cs_etm_packet_queue *packet_queue,
 			     const ocsd_generic_trace_elem *elem,
 			     const uint8_t trace_chan_id)
 {
 	int ret = 0;
 	struct cs_etm_packet *packet;
 
-	ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+	ret = cs_etm_decoder__buffer_packet(packet_queue, trace_chan_id,
 					    CS_ETM_RANGE);
 	if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
 		return ret;
 
-	packet = &decoder->packet_buffer[decoder->tail];
+	packet = &packet_queue->packet_buffer[packet_queue->tail];
 
 	switch (elem->isa) {
 	case ocsd_isa_aarch64:
@@ -399,36 +368,36 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 }
 
 static ocsd_datapath_resp_t
-cs_etm_decoder__buffer_discontinuity(struct cs_etm_decoder *decoder,
-				     const uint8_t trace_chan_id)
+cs_etm_decoder__buffer_discontinuity(struct cs_etm_packet_queue *queue,
+				     const uint8_t trace_chan_id)
 {
-	return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+	return cs_etm_decoder__buffer_packet(queue, trace_chan_id,
 					     CS_ETM_DISCONTINUITY);
 }
 
 static ocsd_datapath_resp_t
-cs_etm_decoder__buffer_exception(struct cs_etm_decoder *decoder,
+cs_etm_decoder__buffer_exception(struct cs_etm_packet_queue *queue,
 				 const ocsd_generic_trace_elem *elem,
 				 const uint8_t trace_chan_id)
 {
 	int ret = 0;
 	struct cs_etm_packet *packet;
 
-	ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+	ret = cs_etm_decoder__buffer_packet(queue, trace_chan_id,
 					    CS_ETM_EXCEPTION);
 	if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
 		return ret;
 
-	packet = &decoder->packet_buffer[decoder->tail];
+	packet = &queue->packet_buffer[queue->tail];
 	packet->exception_number = elem->exception_number;
 
 	return ret;
 }
 
 static ocsd_datapath_resp_t
-cs_etm_decoder__buffer_exception_ret(struct cs_etm_decoder *decoder,
+cs_etm_decoder__buffer_exception_ret(struct cs_etm_packet_queue *queue,
 				     const uint8_t trace_chan_id)
 {
-	return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+	return cs_etm_decoder__buffer_packet(queue, trace_chan_id,
 					     CS_ETM_EXCEPTION_RET);
 }
 
@@ -440,6 +409,13 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 {
 	ocsd_datapath_resp_t resp = OCSD_RESP_CONT;
 	struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context;
+	struct cs_etm_queue *etmq = decoder->data;
+	struct cs_etm_packet_queue *packet_queue;
+
+	/* First get the packet queue */
+	packet_queue = cs_etm__etmq_get_packet_queue(etmq);
+	if (!packet_queue)
+		return OCSD_RESP_FATAL_SYS_ERR;
 
 	switch (elem->elem_type) {
 	case OCSD_GEN_TRC_ELEM_UNKNOWN:
@@ -447,19 +423,19 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 	case OCSD_GEN_TRC_ELEM_EO_TRACE:
 	case OCSD_GEN_TRC_ELEM_NO_SYNC:
 	case OCSD_GEN_TRC_ELEM_TRACE_ON:
-		resp = cs_etm_decoder__buffer_discontinuity(decoder,
+		resp = cs_etm_decoder__buffer_discontinuity(packet_queue,
 							    trace_chan_id);
 		break;
 	case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
-		resp = cs_etm_decoder__buffer_range(decoder, elem,
+		resp = cs_etm_decoder__buffer_range(packet_queue, elem,
 						    trace_chan_id);
 		break;
 	case OCSD_GEN_TRC_ELEM_EXCEPTION:
-		resp = cs_etm_decoder__buffer_exception(decoder, elem,
+		resp = cs_etm_decoder__buffer_exception(packet_queue, elem,
 							trace_chan_id);
 		break;
 	case OCSD_GEN_TRC_ELEM_EXCEPTION_RET:
-		resp = cs_etm_decoder__buffer_exception_ret(decoder,
+		resp = cs_etm_decoder__buffer_exception_ret(packet_queue,
 							    trace_chan_id);
 		break;
 	case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
@@ -553,7 +529,6 @@ cs_etm_decoder__new(int num_cpu, struct cs_etm_decoder_params *d_params,
 	decoder->data = d_params->data;
 	decoder->prev_return = OCSD_RESP_CONT;
-	cs_etm_decoder__clear_buffer(decoder);
 	format = (d_params->formatted ? OCSD_TRC_SRC_FRAME_FORMATTED :
 					OCSD_TRC_SRC_SINGLE);
 	flags = 0;
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 3ab11dfa92ae..6ae7ab4cf5fe 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -14,38 +14,8 @@
 #include <stdio.h>
 
 struct cs_etm_decoder;
-
-enum cs_etm_sample_type {
-	CS_ETM_EMPTY,
-	CS_ETM_RANGE,
-	CS_ETM_DISCONTINUITY,
-	CS_ETM_EXCEPTION,
-	CS_ETM_EXCEPTION_RET,
-};
-
-enum cs_etm_isa {
-	CS_ETM_ISA_UNKNOWN,
-	CS_ETM_ISA_A64,
-	CS_ETM_ISA_A32,
-	CS_ETM_ISA_T32,
-};
-
-struct cs_etm_packet {
-	enum cs_etm_sample_type sample_type;
-	enum cs_etm_isa isa;
-	u64 start_addr;
-	u64 end_addr;
-	u32 instr_count;
-	u32 last_instr_type;
-	u32 last_instr_subtype;
-	u32 flags;
-	u32 exception_number;
-	u8 last_instr_cond;
-	u8 last_instr_taken_branch;
-	u8 last_instr_size;
-	u8 trace_chan_id;
-	int cpu;
-};
+struct cs_etm_packet;
+struct cs_etm_packet_queue;
 
 struct cs_etm_queue;
 
@@ -119,7 +89,7 @@ int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder,
 				      u64 start, u64 end,
 				      cs_etm_mem_cb_type cb_func);
 
-int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder,
+int cs_etm_decoder__get_packet(struct cs_etm_packet_queue *packet_queue,
 			       struct cs_etm_packet *packet);
 
 int cs_etm_decoder__reset(struct cs_etm_decoder *decoder);
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index b6fb389adcc3..5a9ebbf31614 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -78,6 +78,7 @@ struct cs_etm_queue {
 	struct cs_etm_packet *packet;
 	const unsigned char *buf;
 	size_t buf_len, buf_used;
+	struct cs_etm_packet_queue packet_queue;
 };
 
 static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
@@ -125,6 +126,36 @@ int cs_etm__get_cpu(u8 trace_chan_id, int *cpu)
 	return 0;
 }
 
+static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue)
+{
+	int i;
+
+	queue->head = 0;
+	queue->tail = 0;
+	queue->packet_count = 0;
+	for (i = 0; i < CS_ETM_PACKET_MAX_BUFFER; i++) {
+		queue->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN;
+		queue->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
+		queue->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
+		queue->packet_buffer[i].instr_count = 0;
+		queue->packet_buffer[i].last_instr_taken_branch = false;
+		queue->packet_buffer[i].last_instr_size = 0;
+		queue->packet_buffer[i].last_instr_type = 0;
+		queue->packet_buffer[i].last_instr_subtype = 0;
+		queue->packet_buffer[i].last_instr_cond = 0;
+		queue->packet_buffer[i].flags = 0;
+		queue->packet_buffer[i].exception_number = UINT32_MAX;
+		queue->packet_buffer[i].trace_chan_id = UINT8_MAX;
+		queue->packet_buffer[i].cpu = INT_MIN;
+	}
+}
+
+struct cs_etm_packet_queue
+*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq)
+{
+	return &etmq->packet_queue;
+}
+
 static void cs_etm__packet_dump(const char *pkt_string)
 {
 	const char *color = PERF_COLOR_BLUE;
@@ -515,6 +546,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 	etmq->pid = -1;
 	etmq->offset = 0;
 	etmq->period_instructions = 0;
+	cs_etm__clear_packet_queue(&etmq->packet_queue);
 
 out:
 	return ret;
@@ -1548,10 +1580,13 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq)
 static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq)
 {
 	int ret;
+	struct cs_etm_packet_queue *packet_queue;
+
+	packet_queue = cs_etm__etmq_get_packet_queue(etmq);
 
 	/* Process each packet in this chunk */
 	while (1) {
-		ret = cs_etm_decoder__get_packet(etmq->decoder,
+		ret = cs_etm_decoder__get_packet(packet_queue,
 						 etmq->packet);
 		if (ret <= 0)
 			/*
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index 826c9eedaf5c..b6d0e0d0c43f 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -97,12 +97,58 @@ enum {
 	CS_ETMV4_EXC_END = 31,
 };
 
+enum cs_etm_sample_type {
+	CS_ETM_EMPTY,
+	CS_ETM_RANGE,
+	CS_ETM_DISCONTINUITY,
+	CS_ETM_EXCEPTION,
+	CS_ETM_EXCEPTION_RET,
+};
+
+enum cs_etm_isa {
+	CS_ETM_ISA_UNKNOWN,
+	CS_ETM_ISA_A64,
+	CS_ETM_ISA_A32,
+	CS_ETM_ISA_T32,
+};
+
 /* RB tree for quick conversion between traceID and metadata pointers */
 struct intlist *traceid_list;
 
+struct cs_etm_queue;
+
+struct cs_etm_packet {
+	enum cs_etm_sample_type sample_type;
+	enum cs_etm_isa isa;
+	u64 start_addr;
+	u64 end_addr;
+	u32 instr_count;
+	u32 last_instr_type;
+	u32 last_instr_subtype;
+	u32 flags;
+	u32 exception_number;
+	u8 last_instr_cond;
+	u8 last_instr_taken_branch;
+	u8 last_instr_size;
+	u8 trace_chan_id;
+	int cpu;
+};
+
+#define CS_ETM_PACKET_MAX_BUFFER 1024
+#define CS_ETM_PACKET_QUEUE_NR 1
+
+struct cs_etm_packet_queue {
+	u32 packet_count;
+	u32 head;
+	u32 tail;
+	struct cs_etm_packet packet_buffer[CS_ETM_PACKET_MAX_BUFFER];
+};
+
 #define KiB(x) ((x) * 1024)
 #define MiB(x) ((x) * 1024 * 1024)
 
+#define CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL
+
 /*
  * Create a contiguous bitmask starting at bit position @l and ending at
  * position @h. For example
@@ -126,6 +172,8 @@ struct intlist *traceid_list;
 int cs_etm__process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session);
 int cs_etm__get_cpu(u8 trace_chan_id, int *cpu);
+struct cs_etm_packet_queue
+*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq);
 #else
 static inline int
 cs_etm__process_auxtrace_info(union perf_event *event __maybe_unused,
@@ -139,6 +187,12 @@ static inline int cs_etm__get_cpu(u8 trace_chan_id __maybe_unused,
 {
 	return -1;
 }
+
+static inline struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue(
+		struct cs_etm_queue *etmq __maybe_unused)
+{
+	return NULL;
+}
 #endif
 
 #endif
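The packet queue introduced above is a classic power-of-two circular buffer: since CS_ETM_PACKET_MAX_BUFFER is 1024, wrap-around reduces to masking with (CS_ETM_PACKET_MAX_BUFFER - 1). A minimal standalone sketch of the same head/tail discipline follows; note that, as the comment in cs_etm_decoder__get_packet() explains, both producer and consumer advance their index before touching the buffer, so the first slot initially goes unused:

#include <assert.h>
#include <stdint.h>

#define QUEUE_SZ 1024	/* must stay a power of two for the mask trick */

struct toy_queue {
	uint32_t count, head, tail;
	int buf[QUEUE_SZ];
};

/* Mirrors cs_etm_decoder__buffer_packet(): advance tail, then write */
static int toy_queue__push(struct toy_queue *q, int v)
{
	if (q->count >= QUEUE_SZ - 1)
		return -1;		/* full, caller must drain first */
	q->tail = (q->tail + 1) & (QUEUE_SZ - 1);
	q->buf[q->tail] = v;
	q->count++;
	return 0;
}

/* Mirrors cs_etm_decoder__get_packet(): advance head, then read */
static int toy_queue__pop(struct toy_queue *q, int *v)
{
	if (!q->count)
		return 0;		/* nothing queued */
	q->head = (q->head + 1) & (QUEUE_SZ - 1);
	*v = q->buf[q->head];
	q->count--;
	return 1;
}

int main(void)
{
	struct toy_queue q = { 0 };
	int v;

	assert(!toy_queue__push(&q, 42));
	assert(toy_queue__pop(&q, &v) == 1 && v == 42);
	return 0;
}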
In an ideal world there is one CPU per cs_etm_queue and, as such, one trace ID per cs_etm_queue. In the real world CoreSight topologies allow multiple CPUs to use the same sink, which translates to multiple trace IDs per cs_etm_queue.
To deal with this a new cs_etm_traceid_queue structure is introduced to enclose all the information related to a single trace ID, allowing a cs_etm_queue to handle traces generated by any number of CPUs.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |   4 +-
 tools/perf/util/cs-etm.c                      | 368 +++++++++++-------
 tools/perf/util/cs-etm.h                      |  15 +-
 3 files changed, 237 insertions(+), 150 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 12f97e1ff945..275c72264820 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -412,8 +412,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 	struct cs_etm_queue *etmq = decoder->data;
 	struct cs_etm_packet_queue *packet_queue;
 
-	/* First get the packet queue */
-	packet_queue = cs_etm__etmq_get_packet_queue(etmq);
+	/* First get the packet queue for this traceID */
+	packet_queue = cs_etm__etmq_get_packet_queue(etmq, trace_chan_id);
 	if (!packet_queue)
 		return OCSD_RESP_FATAL_SYS_ERR;
 
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 5a9ebbf31614..7d862d26c107 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -60,25 +60,30 @@ struct cs_etm_auxtrace {
 	unsigned int pmu_type;
 };
 
+struct cs_etm_traceid_queue {
+	u8 trace_chan_id;
+	u64 period_instructions;
+	size_t last_branch_pos;
+	union perf_event *event_buf;
+	struct branch_stack *last_branch;
+	struct branch_stack *last_branch_rb;
+	struct cs_etm_packet *prev_packet;
+	struct cs_etm_packet *packet;
+	struct cs_etm_packet_queue packet_queue;
+};
+
 struct cs_etm_queue {
 	struct cs_etm_auxtrace *etm;
 	struct thread *thread;
 	struct cs_etm_decoder *decoder;
 	struct auxtrace_buffer *buffer;
-	union perf_event *event_buf;
 	unsigned int queue_nr;
 	pid_t pid, tid;
 	int cpu;
 	u64 offset;
-	u64 period_instructions;
-	struct branch_stack *last_branch;
-	struct branch_stack *last_branch_rb;
-	size_t last_branch_pos;
-	struct cs_etm_packet *prev_packet;
-	struct cs_etm_packet *packet;
 	const unsigned char *buf;
 	size_t buf_len, buf_used;
-	struct cs_etm_packet_queue packet_queue;
+	struct cs_etm_traceid_queue *traceid_queues;
 };
 
 static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
@@ -150,10 +155,96 @@ static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue)
 	}
 }
 
+static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
+				      struct cs_etm_traceid_queue *tidq,
+				      u8 trace_chan_id)
+{
+	int rc = -ENOMEM;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+
+	cs_etm__clear_packet_queue(&tidq->packet_queue);
+
+	tidq->trace_chan_id = trace_chan_id;
+
+	tidq->packet = zalloc(sizeof(struct cs_etm_packet));
+	if (!tidq->packet)
+		goto out;
+
+	if (etm->synth_opts.last_branch || etm->sample_branches) {
+		tidq->prev_packet = zalloc(sizeof(struct cs_etm_packet));
+		if (!tidq->prev_packet)
+			goto out_free;
+	}
+
+	if (etm->synth_opts.last_branch) {
+		size_t sz = sizeof(struct branch_stack);
+
+		sz += etm->synth_opts.last_branch_sz *
+		      sizeof(struct branch_entry);
+		tidq->last_branch = zalloc(sz);
+		if (!tidq->last_branch)
+			goto out_free;
+		tidq->last_branch_rb = zalloc(sz);
+		if (!tidq->last_branch_rb)
+			goto out_free;
+	}
+
+	tidq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!tidq->event_buf)
+		goto out_free;
+
+	return 0;
+
+out_free:
+	zfree(&tidq->last_branch_rb);
+	zfree(&tidq->last_branch);
+	zfree(&tidq->prev_packet);
+	zfree(&tidq->packet);
+out:
+	return rc;
+}
+
+static struct cs_etm_traceid_queue
+*cs_etm__etmq_get_traceid_queue(struct cs_etm_queue *etmq, u8 trace_chan_id)
+{
+	struct cs_etm_traceid_queue *tidq;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+
+	if (!etm->timeless_decoding)
+		return NULL;
+
+	tidq = etmq->traceid_queues;
+
+	if (tidq)
+		return tidq;
+
+	tidq = calloc(CS_ETM_PACKET_QUEUE_NR, sizeof(*tidq));
+	if (!tidq)
+		return NULL;
+
+	if (cs_etm__init_traceid_queue(etmq, tidq, trace_chan_id))
+		goto out_free;
+
+	etmq->traceid_queues = tidq;
+
+	return etmq->traceid_queues;
+
+out_free:
+	free(tidq);
+
+	return NULL;
+}
+
 struct cs_etm_packet_queue
-*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq)
+*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id)
 {
-	return &etmq->packet_queue;
+	struct cs_etm_traceid_queue *tidq;
+
+	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
+	if (tidq)
+		return &tidq->packet_queue;
+
+	return NULL;
 }
 
 static void cs_etm__packet_dump(const char *pkt_string)
@@ -327,11 +418,12 @@ static void cs_etm__free_queue(void *priv)
 	thread__zput(etmq->thread);
 	cs_etm_decoder__free(etmq->decoder);
-	zfree(&etmq->event_buf);
-	zfree(&etmq->last_branch);
-	zfree(&etmq->last_branch_rb);
-	zfree(&etmq->prev_packet);
-	zfree(&etmq->packet);
+	zfree(&etmq->traceid_queues->event_buf);
+	zfree(&etmq->traceid_queues->last_branch);
+	zfree(&etmq->traceid_queues->last_branch_rb);
+	zfree(&etmq->traceid_queues->prev_packet);
+	zfree(&etmq->traceid_queues->packet);
+	zfree(&etmq->traceid_queues);
 	free(etmq);
 }
 
@@ -443,39 +535,11 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
 	struct cs_etm_decoder_params d_params;
 	struct cs_etm_trace_params  *t_params = NULL;
 	struct cs_etm_queue *etmq;
-	size_t szp = sizeof(struct cs_etm_packet);
 
 	etmq = zalloc(sizeof(*etmq));
 	if (!etmq)
 		return NULL;
 
-	etmq->packet = zalloc(szp);
-	if (!etmq->packet)
-		goto out_free;
-
-	if (etm->synth_opts.last_branch || etm->sample_branches) {
-		etmq->prev_packet = zalloc(szp);
-		if (!etmq->prev_packet)
-			goto out_free;
-	}
-
-	if (etm->synth_opts.last_branch) {
-		size_t sz = sizeof(struct branch_stack);
-
-		sz += etm->synth_opts.last_branch_sz *
-		      sizeof(struct branch_entry);
-		etmq->last_branch = zalloc(sz);
-		if (!etmq->last_branch)
-			goto out_free;
-		etmq->last_branch_rb = zalloc(sz);
-		if (!etmq->last_branch_rb)
-			goto out_free;
-	}
-
-	etmq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
-	if (!etmq->event_buf)
-		goto out_free;
-
 	/* Use metadata to fill in trace parameters for trace decoder */
 	t_params = zalloc(sizeof(*t_params) * etm->num_cpu);
 
@@ -510,12 +574,6 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
 out_free_decoder:
 	cs_etm_decoder__free(etmq->decoder);
 out_free:
-	zfree(&t_params);
-	zfree(&etmq->event_buf);
-	zfree(&etmq->last_branch);
-	zfree(&etmq->last_branch_rb);
-	zfree(&etmq->prev_packet);
-	zfree(&etmq->packet);
 	free(etmq);
 
 	return NULL;
@@ -545,8 +603,6 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 	etmq->tid = queue->tid;
 	etmq->pid = -1;
 	etmq->offset = 0;
-	etmq->period_instructions = 0;
-	cs_etm__clear_packet_queue(&etmq->packet_queue);
 
 out:
 	return ret;
@@ -579,10 +635,12 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }
 
-static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq)
+static inline
+void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq,
+				 struct cs_etm_traceid_queue *tidq)
 {
-	struct branch_stack *bs_src = etmq->last_branch_rb;
-	struct branch_stack *bs_dst = etmq->last_branch;
+	struct branch_stack *bs_src = tidq->last_branch_rb;
+	struct branch_stack *bs_dst = tidq->last_branch;
 	size_t nr = 0;
 
 	/*
@@ -602,9 +660,9 @@ static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq)
 	 * two steps.  First, copy the branches from the most recently inserted
 	 * branch ->last_branch_pos until the end of bs_src->entries buffer.
 	 */
-	nr = etmq->etm->synth_opts.last_branch_sz - etmq->last_branch_pos;
+	nr = etmq->etm->synth_opts.last_branch_sz - tidq->last_branch_pos;
 	memcpy(&bs_dst->entries[0],
-	       &bs_src->entries[etmq->last_branch_pos],
+	       &bs_src->entries[tidq->last_branch_pos],
 	       sizeof(struct branch_entry) * nr);
 
 	/*
@@ -617,14 +675,15 @@ static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq)
 	if (bs_src->nr >= etmq->etm->synth_opts.last_branch_sz) {
 		memcpy(&bs_dst->entries[nr],
 		       &bs_src->entries[0],
-		       sizeof(struct branch_entry) * etmq->last_branch_pos);
+		       sizeof(struct branch_entry) * tidq->last_branch_pos);
 	}
 }
 
-static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
+static inline
+void cs_etm__reset_last_branch_rb(struct cs_etm_traceid_queue *tidq)
 {
-	etmq->last_branch_pos = 0;
-	etmq->last_branch_rb->nr = 0;
+	tidq->last_branch_pos = 0;
+	tidq->last_branch_rb->nr = 0;
 }
 
 static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
@@ -677,9 +736,10 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
 	return packet->start_addr + offset * 4;
 }
 
-static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
+static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
+					  struct cs_etm_traceid_queue *tidq)
 {
-	struct branch_stack *bs = etmq->last_branch_rb;
+	struct branch_stack *bs = tidq->last_branch_rb;
 	struct branch_entry *be;
 
 	/*
@@ -688,14 +748,14 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
 	 * buffer down.  After writing the first element of the stack, move the
 	 * insert position back to the end of the buffer.
 	 */
-	if (!etmq->last_branch_pos)
-		etmq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
+	if (!tidq->last_branch_pos)
+		tidq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
 
-	etmq->last_branch_pos -= 1;
+	tidq->last_branch_pos -= 1;
 
-	be       = &bs->entries[etmq->last_branch_pos];
-	be->from = cs_etm__last_executed_instr(etmq->prev_packet);
-	be->to	 = cs_etm__first_executed_instr(etmq->packet);
+	be       = &bs->entries[tidq->last_branch_pos];
+	be->from = cs_etm__last_executed_instr(tidq->prev_packet);
+	be->to	 = cs_etm__first_executed_instr(tidq->packet);
 	/* No support for mispredict */
 	be->flags.mispred = 0;
 	be->flags.predicted = 1;
@@ -779,11 +839,12 @@ static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm,
 }
 
 static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
+					    struct cs_etm_traceid_queue *tidq,
 					    u64 addr, u64 period)
 {
 	int ret = 0;
 	struct cs_etm_auxtrace *etm = etmq->etm;
-	union perf_event *event = etmq->event_buf;
+	union perf_event *event = tidq->event_buf;
 	struct perf_sample sample = {.ip = 0,};
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
@@ -796,14 +857,14 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 	sample.id = etmq->etm->instructions_id;
 	sample.stream_id = etmq->etm->instructions_id;
 	sample.period = period;
-	sample.cpu = etmq->packet->cpu;
-	sample.flags = etmq->prev_packet->flags;
+	sample.cpu = tidq->packet->cpu;
+	sample.flags = tidq->prev_packet->flags;
 	sample.insn_len = 1;
 	sample.cpumode = event->sample.header.misc;
 
 	if (etm->synth_opts.last_branch) {
-		cs_etm__copy_last_branch_rb(etmq);
-		sample.branch_stack = etmq->last_branch;
+		cs_etm__copy_last_branch_rb(etmq, tidq);
+		sample.branch_stack = tidq->last_branch;
 	}
 
 	if (etm->synth_opts.inject) {
@@ -821,7 +882,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 			ret);
 
 	if (etm->synth_opts.last_branch)
-		cs_etm__reset_last_branch_rb(etmq);
+		cs_etm__reset_last_branch_rb(tidq);
 
 	return ret;
 }
@@ -830,19 +891,20 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 /*
  * The cs etm packet encodes an instruction range between a branch target
  * and the next taken branch. Generate sample accordingly.
  */
-static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
+static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
+				       struct cs_etm_traceid_queue *tidq)
 {
 	int ret = 0;
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct perf_sample sample = {.ip = 0,};
-	union perf_event *event = etmq->event_buf;
+	union perf_event *event = tidq->event_buf;
 	struct dummy_branch_stack {
 		u64			nr;
 		struct branch_entry	entries;
 	} dummy_bs;
 	u64 ip;
 
-	ip = cs_etm__last_executed_instr(etmq->prev_packet);
+	ip = cs_etm__last_executed_instr(tidq->prev_packet);
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
 	event->sample.header.misc = cs_etm__cpu_mode(etmq, ip);
@@ -851,12 +913,12 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
 	sample.ip = ip;
 	sample.pid = etmq->pid;
 	sample.tid = etmq->tid;
-	sample.addr = cs_etm__first_executed_instr(etmq->packet);
+	sample.addr = cs_etm__first_executed_instr(tidq->packet);
 	sample.id = etmq->etm->branches_id;
 	sample.stream_id = etmq->etm->branches_id;
 	sample.period = 1;
-	sample.cpu = etmq->packet->cpu;
-	sample.flags = etmq->prev_packet->flags;
+	sample.cpu = tidq->packet->cpu;
+	sample.flags = tidq->prev_packet->flags;
 	sample.cpumode = event->sample.header.misc;
 
 	/*
@@ -999,34 +1061,35 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 	return 0;
 }
 
-static int cs_etm__sample(struct cs_etm_queue *etmq)
+static int cs_etm__sample(struct cs_etm_queue *etmq,
+			  struct cs_etm_traceid_queue *tidq)
 {
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct cs_etm_packet *tmp;
 	int ret;
-	u64 instrs_executed = etmq->packet->instr_count;
+	u64 instrs_executed = tidq->packet->instr_count;
 
-	etmq->period_instructions += instrs_executed;
+	tidq->period_instructions += instrs_executed;
 
 	/*
	 * Record a branch when the last instruction in
	 * PREV_PACKET is a branch.
	 */
 	if (etm->synth_opts.last_branch &&
-	    etmq->prev_packet &&
-	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
-	    etmq->prev_packet->last_instr_taken_branch)
-		cs_etm__update_last_branch_rb(etmq);
+	    tidq->prev_packet &&
+	    tidq->prev_packet->sample_type == CS_ETM_RANGE &&
+	    tidq->prev_packet->last_instr_taken_branch)
+		cs_etm__update_last_branch_rb(etmq, tidq);
 
 	if (etm->sample_instructions &&
-	    etmq->period_instructions >= etm->instructions_sample_period) {
+	    tidq->period_instructions >= etm->instructions_sample_period) {
 		/*
		 * Emit instruction sample periodically
		 * TODO: allow period to be defined in cycles and clock time
		 */
 
 		/* Get number of instructions executed after the sample point */
-		u64 instrs_over = etmq->period_instructions -
+		u64 instrs_over = tidq->period_instructions -
 			etm->instructions_sample_period;
 
 		/*
@@ -1035,31 +1098,31 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 		 * executed, but PC has not advanced to next instruction)
 		 */
 		u64 offset = (instrs_executed - instrs_over - 1);
-		u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset);
+		u64 addr = cs_etm__instr_addr(etmq, tidq->packet, offset);
 
 		ret = cs_etm__synth_instruction_sample(
-			etmq, addr, etm->instructions_sample_period);
+			etmq, tidq, addr, etm->instructions_sample_period);
 		if (ret)
 			return ret;
 
 		/* Carry remaining instructions into next sample period */
-		etmq->period_instructions = instrs_over;
+		tidq->period_instructions = instrs_over;
 	}
 
-	if (etm->sample_branches && etmq->prev_packet) {
+	if (etm->sample_branches && tidq->prev_packet) {
 		bool generate_sample = false;
 
 		/* Generate sample for tracing on packet */
-		if (etmq->prev_packet->sample_type == CS_ETM_DISCONTINUITY)
+		if (tidq->prev_packet->sample_type == CS_ETM_DISCONTINUITY)
 			generate_sample = true;
 
 		/* Generate sample for branch taken packet */
-		if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
-		    etmq->prev_packet->last_instr_taken_branch)
+		if (tidq->prev_packet->sample_type == CS_ETM_RANGE &&
+		    tidq->prev_packet->last_instr_taken_branch)
 			generate_sample = true;
 
 		if (generate_sample) {
-			ret = cs_etm__synth_branch_sample(etmq);
+			ret = cs_etm__synth_branch_sample(etmq, tidq);
 			if (ret)
 				return ret;
 		}
@@ -1070,15 +1133,15 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 	/*
	 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
	 * the next incoming packet.
	 */
-	tmp = etmq->packet;
-	etmq->packet = etmq->prev_packet;
-	etmq->prev_packet = tmp;
+	tmp = tidq->packet;
+	tidq->packet = tidq->prev_packet;
+	tidq->prev_packet = tmp;
 	}
 
 	return 0;
 }
 
-static int cs_etm__exception(struct cs_etm_queue *etmq)
+static int cs_etm__exception(struct cs_etm_traceid_queue *tidq)
 {
 	/*
	 * When the exception packet is inserted, whether the last instruction
@@ -1091,27 +1154,28 @@ static int cs_etm__exception(struct cs_etm_queue *etmq)
 	 * swap PACKET with PREV_PACKET. This keeps PREV_PACKET to be useful
 	 * for generating instruction and branch samples.
 	 */
-	if (etmq->prev_packet->sample_type == CS_ETM_RANGE)
-		etmq->prev_packet->last_instr_taken_branch = true;
+	if (tidq->prev_packet->sample_type == CS_ETM_RANGE)
+		tidq->prev_packet->last_instr_taken_branch = true;
return 0; }
-static int cs_etm__flush(struct cs_etm_queue *etmq) +static int cs_etm__flush(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) { int err = 0; struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet *tmp;
- if (!etmq->prev_packet) + if (!tidq->prev_packet) return 0;
/* Handle start tracing packet */ - if (etmq->prev_packet->sample_type == CS_ETM_EMPTY) + if (tidq->prev_packet->sample_type == CS_ETM_EMPTY) goto swap_packet;
if (etmq->etm->synth_opts.last_branch && - etmq->prev_packet->sample_type == CS_ETM_RANGE) { + tidq->prev_packet->sample_type == CS_ETM_RANGE) { /* * Generate a last branch event for the branches left in the * circular buffer at the end of the trace. @@ -1119,21 +1183,21 @@ static int cs_etm__flush(struct cs_etm_queue *etmq) * Use the address of the end of the last reported execution * range */ - u64 addr = cs_etm__last_executed_instr(etmq->prev_packet); + u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
err = cs_etm__synth_instruction_sample( - etmq, addr, - etmq->period_instructions); + etmq, tidq, addr, + tidq->period_instructions); if (err) return err;
- etmq->period_instructions = 0; + tidq->period_instructions = 0;
}
if (etm->sample_branches && - etmq->prev_packet->sample_type == CS_ETM_RANGE) { - err = cs_etm__synth_branch_sample(etmq); + tidq->prev_packet->sample_type == CS_ETM_RANGE) { + err = cs_etm__synth_branch_sample(etmq, tidq); if (err) return err; } @@ -1144,15 +1208,16 @@ static int cs_etm__flush(struct cs_etm_queue *etmq) * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for * the next incoming packet. */ - tmp = etmq->packet; - etmq->packet = etmq->prev_packet; - etmq->prev_packet = tmp; + tmp = tidq->packet; + tidq->packet = tidq->prev_packet; + tidq->prev_packet = tmp; }
return err; }
-static int cs_etm__end_block(struct cs_etm_queue *etmq) +static int cs_etm__end_block(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) { int err;
@@ -1166,20 +1231,20 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq) * the trace. */ if (etmq->etm->synth_opts.last_branch && - etmq->prev_packet->sample_type == CS_ETM_RANGE) { + tidq->prev_packet->sample_type == CS_ETM_RANGE) { /* * Use the address of the end of the last reported execution * range. */ - u64 addr = cs_etm__last_executed_instr(etmq->prev_packet); + u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
err = cs_etm__synth_instruction_sample( - etmq, addr, - etmq->period_instructions); + etmq, tidq, addr, + tidq->period_instructions); if (err) return err;
- etmq->period_instructions = 0; + tidq->period_instructions = 0; }
return 0; @@ -1278,10 +1343,11 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, return false; }
-static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, u64 magic) +static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq, u64 magic) { - struct cs_etm_packet *packet = etmq->packet; - struct cs_etm_packet *prev_packet = etmq->prev_packet; + struct cs_etm_packet *packet = tidq->packet; + struct cs_etm_packet *prev_packet = tidq->prev_packet;
if (magic == __perf_cs_etmv3_magic) if (packet->exception_number == CS_ETMV3_EXC_SVC) @@ -1302,9 +1368,10 @@ static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, u64 magic) return false; }
-static bool cs_etm__is_async_exception(struct cs_etm_queue *etmq, u64 magic) +static bool cs_etm__is_async_exception(struct cs_etm_traceid_queue *tidq, + u64 magic) { - struct cs_etm_packet *packet = etmq->packet; + struct cs_etm_packet *packet = tidq->packet;
if (magic == __perf_cs_etmv3_magic) if (packet->exception_number == CS_ETMV3_EXC_DEBUG_HALT || @@ -1327,10 +1394,12 @@ static bool cs_etm__is_async_exception(struct cs_etm_queue *etmq, u64 magic) return false; }
-static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, u64 magic) +static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq, + u64 magic) { - struct cs_etm_packet *packet = etmq->packet; - struct cs_etm_packet *prev_packet = etmq->prev_packet; + struct cs_etm_packet *packet = tidq->packet; + struct cs_etm_packet *prev_packet = tidq->prev_packet;
if (magic == __perf_cs_etmv3_magic) if (packet->exception_number == CS_ETMV3_EXC_SMC || @@ -1373,10 +1442,11 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, u64 magic) return false; }
-static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq) +static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) { - struct cs_etm_packet *packet = etmq->packet; - struct cs_etm_packet *prev_packet = etmq->prev_packet; + struct cs_etm_packet *packet = tidq->packet; + struct cs_etm_packet *prev_packet = tidq->prev_packet; u64 magic; int ret;
@@ -1478,7 +1548,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq) return ret;
/* The exception is for system call. */ - if (cs_etm__is_syscall(etmq, magic)) + if (cs_etm__is_syscall(etmq, tidq, magic)) packet->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_SYSCALLRET; @@ -1486,7 +1556,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq) * The exceptions are triggered by external signals from bus, * interrupt controller, debug module, PE reset or halt. */ - else if (cs_etm__is_async_exception(etmq, magic)) + else if (cs_etm__is_async_exception(tidq, magic)) packet->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC | @@ -1495,7 +1565,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq) * Otherwise, exception is caused by trap, instruction & * data fault, or alignment errors. */ - else if (cs_etm__is_sync_exception(etmq, magic)) + else if (cs_etm__is_sync_exception(etmq, tidq, magic)) packet->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_INTERRUPT; @@ -1577,17 +1647,18 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq) return ret; }
-static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq) +static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) { int ret; struct cs_etm_packet_queue *packet_queue;
- packet_queue = cs_etm__etmq_get_packet_queue(etmq); + packet_queue = &tidq->packet_queue;
/* Process each packet in this chunk */ while (1) { ret = cs_etm_decoder__get_packet(packet_queue, - etmq->packet); + tidq->packet); if (ret <= 0) /* * Stop processing this chunk on @@ -1602,18 +1673,18 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq) * prior to switch() statement to use address * information before packets swapping. */ - ret = cs_etm__set_sample_flags(etmq); + ret = cs_etm__set_sample_flags(etmq, tidq); if (ret < 0) break;
- switch (etmq->packet->sample_type) { + switch (tidq->packet->sample_type) { case CS_ETM_RANGE: /* * If the packet contains an instruction * range, generate instruction sequence * events. */ - cs_etm__sample(etmq); + cs_etm__sample(etmq, tidq); break; case CS_ETM_EXCEPTION: case CS_ETM_EXCEPTION_RET: @@ -1622,14 +1693,14 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq) * make sure the previous instruction * range packet to be handled properly. */ - cs_etm__exception(etmq); + cs_etm__exception(tidq); break; case CS_ETM_DISCONTINUITY: /* * Discontinuity in trace, flush * previous branch stack */ - cs_etm__flush(etmq); + cs_etm__flush(etmq, tidq); break; case CS_ETM_EMPTY: /* @@ -1649,6 +1720,11 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq) static int cs_etm__run_decoder(struct cs_etm_queue *etmq) { int err = 0; + struct cs_etm_traceid_queue *tidq; + + tidq = cs_etm__etmq_get_traceid_queue(etmq, CS_ETM_PER_THREAD_TRACEID); + if (!tidq) + return -EINVAL;
/* Go through each buffer in the queue and decode them one by one */ while (1) { @@ -1667,13 +1743,13 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq) * an error occurs other than hoping the next one will * be better. */ - err = cs_etm__process_decoder_queue(etmq); + err = cs_etm__process_traceid_queue(etmq, tidq);
} while (etmq->buf_len);
if (err == 0) /* Flush any remaining branch stack entries */ - err = cs_etm__end_block(etmq); + err = cs_etm__end_block(etmq, tidq); }
return err; diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index b6d0e0d0c43f..0dfc79a678f8 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -137,6 +137,16 @@ struct cs_etm_packet { #define CS_ETM_PACKET_MAX_BUFFER 1024 #define CS_ETM_PACKET_QUEUE_NR 1
+/* + * When working with per-thread scenarios the process under trace can + * be scheduled on any CPU and as such, more than one traceID may be + * associated with the same process. Since a traceID of '0' is illegal + * as per the CoreSight architecture, use that specific value to + * identify the queue where all packets (with any traceID) are + * aggregated. + */ +#define CS_ETM_PER_THREAD_TRACEID 0 + struct cs_etm_packet_queue { u32 packet_count; u32 head; @@ -173,7 +183,7 @@ int cs_etm__process_auxtrace_info(union perf_event *event, struct perf_session *session); int cs_etm__get_cpu(u8 trace_chan_id, int *cpu); struct cs_etm_packet_queue -*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq); +*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id); #else static inline int cs_etm__process_auxtrace_info(union perf_event *event __maybe_unused, @@ -189,7 +199,8 @@ static inline int cs_etm__get_cpu(u8 trace_chan_id __maybe_unused, }
static inline struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue( - struct cs_etm_queue *etmq __maybe_unused) + struct cs_etm_queue *etmq __maybe_unused, + u8 trace_chan_id __maybe_unused) { return NULL; }
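For readers following the per-thread case, a quick illustration of how the
sentinel above is meant to be used; the helper below is a sketch for this
cover text only and is not part of the patch:

	/*
	 * Illustrative sketch: in per-thread (timeless) mode every packet,
	 * whatever trace ID it carries, is aggregated in the single queue
	 * keyed by CS_ETM_PER_THREAD_TRACEID, while CPU-wide mode keys
	 * queues by the real trace ID of the source CPU.
	 */
	static inline u8 cs_etm__queue_key(bool per_thread, u8 trace_chan_id)
	{
		return per_thread ? CS_ETM_PER_THREAD_TRACEID : trace_chan_id;
	}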
On Wed, Mar 06, 2019 at 03:59:33PM -0700, Mathieu Poirier wrote:
In and ideal world there is one CPU per cs_etm_queue and as such, one trace
s/and/an
ID per cs_etm_queue. In the real world CoreSight topologies allow multiple CPUs to use the same sink, which translates to multiple trace IDs per cs_etm_queue.
To deal with this a new cs_etm_traceid_queue structure is introduced to enclose all the information related to a single trace ID, allowing a cs_etm_queue to handle traces generated by any number of CPUs.
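To make the N:1 topology concrete, the demultiplexing this implies looks
roughly like below; the helper name and array shape are illustrative only
and not taken from the patch:

	/* Illustrative only: one cs_etm_queue, many per-trace-ID states. */
	static struct cs_etm_traceid_queue *
	cs_etm__find_tidq(struct cs_etm_traceid_queue **tidqs, int nr_queues,
			  u8 trace_chan_id)
	{
		int i;

		for (i = 0; i < nr_queues; i++)
			if (tidqs[i]->trace_chan_id == trace_chan_id)
				return tidqs[i];

		return NULL;	/* no state allocated for this trace ID yet */
	}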
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |   4 +-
 tools/perf/util/cs-etm.c                      | 368 +++++++++++-------
 tools/perf/util/cs-etm.h                      |  15 +-
 3 files changed, 237 insertions(+), 150 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 12f97e1ff945..275c72264820 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -412,8 +412,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
	struct cs_etm_queue *etmq = decoder->data;
	struct cs_etm_packet_queue *packet_queue;

-	/* First get the packet queue */
-	packet_queue = cs_etm__etmq_get_packet_queue(etmq);
+	/* First get the packet queue for this traceID */
+	packet_queue = cs_etm__etmq_get_packet_queue(etmq, trace_chan_id);
	if (!packet_queue)
		return OCSD_RESP_FATAL_SYS_ERR;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 5a9ebbf31614..7d862d26c107 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -60,25 +60,30 @@ struct cs_etm_auxtrace {
	unsigned int pmu_type;
 };

+struct cs_etm_traceid_queue {
+	u8 trace_chan_id;
+	u64 period_instructions;
+	size_t last_branch_pos;
+	union perf_event *event_buf;
+	struct branch_stack *last_branch;
+	struct branch_stack *last_branch_rb;
+	struct cs_etm_packet *prev_packet;
+	struct cs_etm_packet *packet;
+	struct cs_etm_packet_queue packet_queue;
+};
 struct cs_etm_queue {
	struct cs_etm_auxtrace *etm;
	struct thread *thread;
	struct cs_etm_decoder *decoder;
	struct auxtrace_buffer *buffer;
-	union perf_event *event_buf;
	unsigned int queue_nr;
	pid_t pid, tid;
	int cpu;
	u64 offset;
-	u64 period_instructions;
-	struct branch_stack *last_branch;
-	struct branch_stack *last_branch_rb;
-	size_t last_branch_pos;
-	struct cs_etm_packet *prev_packet;
-	struct cs_etm_packet *packet;
	const unsigned char *buf;
	size_t buf_len, buf_used;
-	struct cs_etm_packet_queue packet_queue;
+	struct cs_etm_traceid_queue *traceid_queues;
 };

 static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
@@ -150,10 +155,96 @@ static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue)
	}
 }

+static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
+				      struct cs_etm_traceid_queue *tidq,
+				      u8 trace_chan_id)
+{
+	int rc = -ENOMEM;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+
+	cs_etm__clear_packet_queue(&tidq->packet_queue);
+
+	tidq->trace_chan_id = trace_chan_id;
+
+	tidq->packet = zalloc(sizeof(struct cs_etm_packet));
+	if (!tidq->packet)
+		goto out;
+
+	if (etm->synth_opts.last_branch || etm->sample_branches) {
+		tidq->prev_packet = zalloc(sizeof(struct cs_etm_packet));
+		if (!tidq->prev_packet)
+			goto out_free;
+	}
+
+	if (etm->synth_opts.last_branch) {
+		size_t sz = sizeof(struct branch_stack);
+
+		sz += etm->synth_opts.last_branch_sz *
+		      sizeof(struct branch_entry);
+		tidq->last_branch = zalloc(sz);
+		if (!tidq->last_branch)
+			goto out_free;
+		tidq->last_branch_rb = zalloc(sz);
+		if (!tidq->last_branch_rb)
+			goto out_free;
+	}
+
+	tidq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!tidq->event_buf)
+		goto out_free;
+
+	return 0;
+
+out_free:
+	zfree(&tidq->last_branch_rb);
+	zfree(&tidq->last_branch);
+	zfree(&tidq->prev_packet);
+	zfree(&tidq->packet);
+out:
+	return rc;
+}
+static struct cs_etm_traceid_queue
+*cs_etm__etmq_get_traceid_queue(struct cs_etm_queue *etmq, u8 trace_chan_id)
+{
+	struct cs_etm_traceid_queue *tidq;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+
+	if (!etm->timeless_decoding)
+		return NULL;
+
+	tidq = etmq->traceid_queues;
+	if (tidq)
+		return tidq;
+
+	tidq = calloc(CS_ETM_PACKET_QUEUE_NR, sizeof(*tidq));
+	if (!tidq)
+		return NULL;
Since CS_ETM_PACKET_QUEUE_NR is defined as 1, just curious: will this macro be changed to other values later? Otherwise, it seems to me that calling malloc(sizeof(*tidq)) directly would be clearer.
+	if (cs_etm__init_traceid_queue(etmq, tidq, trace_chan_id))
+		goto out_free;
+
+	etmq->traceid_queues = tidq;
+
+	return etmq->traceid_queues;
+
+out_free:
+	free(tidq);
+
+	return NULL;
+}
 struct cs_etm_packet_queue
-*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq)
+*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id)
 {
-	return &etmq->packet_queue;
+	struct cs_etm_traceid_queue *tidq;
+
+	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
+	if (tidq)
+		return &tidq->packet_queue;
+
+	return NULL;
 }

 static void cs_etm__packet_dump(const char *pkt_string)
@@ -327,11 +418,12 @@ static void cs_etm__free_queue(void *priv)
	thread__zput(etmq->thread);
	cs_etm_decoder__free(etmq->decoder);
-	zfree(&etmq->event_buf);
-	zfree(&etmq->last_branch);
-	zfree(&etmq->last_branch_rb);
-	zfree(&etmq->prev_packet);
-	zfree(&etmq->packet);
+	zfree(&etmq->traceid_queues->event_buf);
+	zfree(&etmq->traceid_queues->last_branch);
+	zfree(&etmq->traceid_queues->last_branch_rb);
+	zfree(&etmq->traceid_queues->prev_packet);
+	zfree(&etmq->traceid_queues->packet);
+	zfree(&etmq->traceid_queues);
	free(etmq);
} @@ -443,39 +535,11 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm) struct cs_etm_decoder_params d_params; struct cs_etm_trace_params *t_params = NULL; struct cs_etm_queue *etmq;
- size_t szp = sizeof(struct cs_etm_packet);
etmq = zalloc(sizeof(*etmq)); if (!etmq) return NULL;
- etmq->packet = zalloc(szp);
- if (!etmq->packet)
goto out_free;
- if (etm->synth_opts.last_branch || etm->sample_branches) {
etmq->prev_packet = zalloc(szp);
if (!etmq->prev_packet)
goto out_free;
- }
- if (etm->synth_opts.last_branch) {
size_t sz = sizeof(struct branch_stack);
sz += etm->synth_opts.last_branch_sz *
sizeof(struct branch_entry);
etmq->last_branch = zalloc(sz);
if (!etmq->last_branch)
goto out_free;
etmq->last_branch_rb = zalloc(sz);
if (!etmq->last_branch_rb)
goto out_free;
- }
- etmq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
- if (!etmq->event_buf)
goto out_free;
- /* Use metadata to fill in trace parameters for trace decoder */ t_params = zalloc(sizeof(*t_params) * etm->num_cpu);
@@ -510,12 +574,6 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm) out_free_decoder: cs_etm_decoder__free(etmq->decoder); out_free:
- zfree(&t_params);
- zfree(&etmq->event_buf);
- zfree(&etmq->last_branch);
- zfree(&etmq->last_branch_rb);
- zfree(&etmq->prev_packet);
- zfree(&etmq->packet); free(etmq);
return NULL; @@ -545,8 +603,6 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, etmq->tid = queue->tid; etmq->pid = -1; etmq->offset = 0;
- etmq->period_instructions = 0;
- cs_etm__clear_packet_queue(&etmq->packet_queue);
out: return ret; @@ -579,10 +635,12 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm) return 0; } -static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq) +static inline +void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq)
{
- struct branch_stack *bs_src = etmq->last_branch_rb;
- struct branch_stack *bs_dst = etmq->last_branch;
- struct branch_stack *bs_src = tidq->last_branch_rb;
- struct branch_stack *bs_dst = tidq->last_branch; size_t nr = 0;
/* @@ -602,9 +660,9 @@ static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq) * two steps. First, copy the branches from the most recently inserted * branch ->last_branch_pos until the end of bs_src->entries buffer. */
- nr = etmq->etm->synth_opts.last_branch_sz - etmq->last_branch_pos;
- nr = etmq->etm->synth_opts.last_branch_sz - tidq->last_branch_pos; memcpy(&bs_dst->entries[0],
&bs_src->entries[etmq->last_branch_pos],
&bs_src->entries[tidq->last_branch_pos], sizeof(struct branch_entry) * nr);
/* @@ -617,14 +675,15 @@ static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq) if (bs_src->nr >= etmq->etm->synth_opts.last_branch_sz) { memcpy(&bs_dst->entries[nr], &bs_src->entries[0],
sizeof(struct branch_entry) * etmq->last_branch_pos);
}sizeof(struct branch_entry) * tidq->last_branch_pos);
} -static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq) +static inline +void cs_etm__reset_last_branch_rb(struct cs_etm_traceid_queue *tidq) {
- etmq->last_branch_pos = 0;
- etmq->last_branch_rb->nr = 0;
- tidq->last_branch_pos = 0;
- tidq->last_branch_rb->nr = 0;
} static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq, @@ -677,9 +736,10 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq, return packet->start_addr + offset * 4; } -static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq) +static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq)
{
- struct branch_stack *bs = etmq->last_branch_rb;
- struct branch_stack *bs = tidq->last_branch_rb; struct branch_entry *be;
/* @@ -688,14 +748,14 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq) * buffer down. After writing the first element of the stack, move the * insert position back to the end of the buffer. */
- if (!etmq->last_branch_pos)
etmq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
- if (!tidq->last_branch_pos)
tidq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
- etmq->last_branch_pos -= 1;
- tidq->last_branch_pos -= 1;
- be = &bs->entries[etmq->last_branch_pos];
- be->from = cs_etm__last_executed_instr(etmq->prev_packet);
- be->to = cs_etm__first_executed_instr(etmq->packet);
- be = &bs->entries[tidq->last_branch_pos];
- be->from = cs_etm__last_executed_instr(tidq->prev_packet);
- be->to = cs_etm__first_executed_instr(tidq->packet); /* No support for mispredict */ be->flags.mispred = 0; be->flags.predicted = 1;
@@ -779,11 +839,12 @@ static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, } static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq, u64 addr, u64 period)
{ int ret = 0; struct cs_etm_auxtrace *etm = etmq->etm;
- union perf_event *event = etmq->event_buf;
- union perf_event *event = tidq->event_buf; struct perf_sample sample = {.ip = 0,};
event->sample.header.type = PERF_RECORD_SAMPLE; @@ -796,14 +857,14 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, sample.id = etmq->etm->instructions_id; sample.stream_id = etmq->etm->instructions_id; sample.period = period;
- sample.cpu = etmq->packet->cpu;
- sample.flags = etmq->prev_packet->flags;
- sample.cpu = tidq->packet->cpu;
- sample.flags = tidq->prev_packet->flags; sample.insn_len = 1; sample.cpumode = event->sample.header.misc;
if (etm->synth_opts.last_branch) {
cs_etm__copy_last_branch_rb(etmq);
sample.branch_stack = etmq->last_branch;
cs_etm__copy_last_branch_rb(etmq, tidq);
}sample.branch_stack = tidq->last_branch;
if (etm->synth_opts.inject) { @@ -821,7 +882,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, ret); if (etm->synth_opts.last_branch)
cs_etm__reset_last_branch_rb(etmq);
cs_etm__reset_last_branch_rb(tidq);
return ret; } @@ -830,19 +891,20 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
- The cs etm packet encodes an instruction range between a branch target
- and the next taken branch. Generate sample accordingly.
*/ -static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) +static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq)
{ int ret = 0; struct cs_etm_auxtrace *etm = etmq->etm; struct perf_sample sample = {.ip = 0,};
- union perf_event *event = etmq->event_buf;
- union perf_event *event = tidq->event_buf; struct dummy_branch_stack { u64 nr; struct branch_entry entries; } dummy_bs; u64 ip;
- ip = cs_etm__last_executed_instr(etmq->prev_packet);
- ip = cs_etm__last_executed_instr(tidq->prev_packet);
event->sample.header.type = PERF_RECORD_SAMPLE; event->sample.header.misc = cs_etm__cpu_mode(etmq, ip); @@ -851,12 +913,12 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) sample.ip = ip; sample.pid = etmq->pid; sample.tid = etmq->tid;
- sample.addr = cs_etm__first_executed_instr(etmq->packet);
- sample.addr = cs_etm__first_executed_instr(tidq->packet); sample.id = etmq->etm->branches_id; sample.stream_id = etmq->etm->branches_id; sample.period = 1;
- sample.cpu = etmq->packet->cpu;
- sample.flags = etmq->prev_packet->flags;
- sample.cpu = tidq->packet->cpu;
- sample.flags = tidq->prev_packet->flags; sample.cpumode = event->sample.header.misc;
/* @@ -999,34 +1061,35 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm, return 0; } -static int cs_etm__sample(struct cs_etm_queue *etmq) +static int cs_etm__sample(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq)
{ struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet *tmp; int ret;
- u64 instrs_executed = etmq->packet->instr_count;
- u64 instrs_executed = tidq->packet->instr_count;
- etmq->period_instructions += instrs_executed;
- tidq->period_instructions += instrs_executed;
/* * Record a branch when the last instruction in * PREV_PACKET is a branch. */ if (etm->synth_opts.last_branch &&
etmq->prev_packet &&
etmq->prev_packet->sample_type == CS_ETM_RANGE &&
etmq->prev_packet->last_instr_taken_branch)
cs_etm__update_last_branch_rb(etmq);
tidq->prev_packet &&
tidq->prev_packet->sample_type == CS_ETM_RANGE &&
tidq->prev_packet->last_instr_taken_branch)
cs_etm__update_last_branch_rb(etmq, tidq);
if (etm->sample_instructions &&
etmq->period_instructions >= etm->instructions_sample_period) {
/*tidq->period_instructions >= etm->instructions_sample_period) {
*/
- Emit instruction sample periodically
- TODO: allow period to be defined in cycles and clock time
/* Get number of instructions executed after the sample point */
u64 instrs_over = etmq->period_instructions -
u64 instrs_over = tidq->period_instructions - etm->instructions_sample_period;
/* @@ -1035,31 +1098,31 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) * executed, but PC has not advanced to next instruction) */ u64 offset = (instrs_executed - instrs_over - 1);
u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset);
u64 addr = cs_etm__instr_addr(etmq, tidq->packet, offset);
ret = cs_etm__synth_instruction_sample(
etmq, addr, etm->instructions_sample_period);
if (ret) return ret;etmq, tidq, addr, etm->instructions_sample_period);
/* Carry remaining instructions into next sample period */
etmq->period_instructions = instrs_over;
}tidq->period_instructions = instrs_over;
- if (etm->sample_branches && etmq->prev_packet) {
- if (etm->sample_branches && tidq->prev_packet) { bool generate_sample = false;
/* Generate sample for tracing on packet */
if (etmq->prev_packet->sample_type == CS_ETM_DISCONTINUITY)
if (tidq->prev_packet->sample_type == CS_ETM_DISCONTINUITY) generate_sample = true;
/* Generate sample for branch taken packet */
if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
etmq->prev_packet->last_instr_taken_branch)
if (tidq->prev_packet->sample_type == CS_ETM_RANGE &&
tidq->prev_packet->last_instr_taken_branch) generate_sample = true;
if (generate_sample) {
ret = cs_etm__synth_branch_sample(etmq);
}ret = cs_etm__synth_branch_sample(etmq, tidq); if (ret) return ret;
@@ -1070,15 +1133,15 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for * the next incoming packet. */
tmp = etmq->packet;
etmq->packet = etmq->prev_packet;
etmq->prev_packet = tmp;
tmp = tidq->packet;
tidq->packet = tidq->prev_packet;
}tidq->prev_packet = tmp;
return 0; } -static int cs_etm__exception(struct cs_etm_queue *etmq) +static int cs_etm__exception(struct cs_etm_traceid_queue *tidq) { /* * When the exception packet is inserted, whether the last instruction @@ -1091,27 +1154,28 @@ static int cs_etm__exception(struct cs_etm_queue *etmq) * swap PACKET with PREV_PACKET. This keeps PREV_PACKET to be useful * for generating instruction and branch samples. */
- if (etmq->prev_packet->sample_type == CS_ETM_RANGE)
etmq->prev_packet->last_instr_taken_branch = true;
- if (tidq->prev_packet->sample_type == CS_ETM_RANGE)
tidq->prev_packet->last_instr_taken_branch = true;
return 0; } -static int cs_etm__flush(struct cs_etm_queue *etmq) +static int cs_etm__flush(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq)
{ int err = 0; struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet *tmp;
- if (!etmq->prev_packet)
- if (!tidq->prev_packet) return 0;
/* Handle start tracing packet */
- if (etmq->prev_packet->sample_type == CS_ETM_EMPTY)
- if (tidq->prev_packet->sample_type == CS_ETM_EMPTY) goto swap_packet;
if (etmq->etm->synth_opts.last_branch &&
etmq->prev_packet->sample_type == CS_ETM_RANGE) {
/*tidq->prev_packet->sample_type == CS_ETM_RANGE) {
- Generate a last branch event for the branches left in the
- circular buffer at the end of the trace.
@@ -1119,21 +1183,21 @@ static int cs_etm__flush(struct cs_etm_queue *etmq) * Use the address of the end of the last reported execution * range */
u64 addr = cs_etm__last_executed_instr(etmq->prev_packet);
u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
err = cs_etm__synth_instruction_sample(
etmq, addr,
etmq->period_instructions);
etmq, tidq, addr,
if (err) return err;tidq->period_instructions);
etmq->period_instructions = 0;
tidq->period_instructions = 0;
} if (etm->sample_branches &&
etmq->prev_packet->sample_type == CS_ETM_RANGE) {
err = cs_etm__synth_branch_sample(etmq);
tidq->prev_packet->sample_type == CS_ETM_RANGE) {
if (err) return err; }err = cs_etm__synth_branch_sample(etmq, tidq);
@@ -1144,15 +1208,16 @@ static int cs_etm__flush(struct cs_etm_queue *etmq) * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for * the next incoming packet. */
tmp = etmq->packet;
etmq->packet = etmq->prev_packet;
etmq->prev_packet = tmp;
tmp = tidq->packet;
tidq->packet = tidq->prev_packet;
}tidq->prev_packet = tmp;
return err; } -static int cs_etm__end_block(struct cs_etm_queue *etmq) +static int cs_etm__end_block(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq)
{ int err; @@ -1166,20 +1231,20 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq) * the trace. */ if (etmq->etm->synth_opts.last_branch &&
etmq->prev_packet->sample_type == CS_ETM_RANGE) {
/*tidq->prev_packet->sample_type == CS_ETM_RANGE) {
*/
- Use the address of the end of the last reported execution
- range.
u64 addr = cs_etm__last_executed_instr(etmq->prev_packet);
u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
err = cs_etm__synth_instruction_sample(
etmq, addr,
etmq->period_instructions);
etmq, tidq, addr,
if (err) return err;tidq->period_instructions);
etmq->period_instructions = 0;
}tidq->period_instructions = 0;
return 0; @@ -1278,10 +1343,11 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, return false; } -static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, u64 magic) +static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq, u64 magic)
{
- struct cs_etm_packet *packet = etmq->packet;
- struct cs_etm_packet *prev_packet = etmq->prev_packet;
- struct cs_etm_packet *packet = tidq->packet;
- struct cs_etm_packet *prev_packet = tidq->prev_packet;
if (magic == __perf_cs_etmv3_magic) if (packet->exception_number == CS_ETMV3_EXC_SVC) @@ -1302,9 +1368,10 @@ static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, u64 magic) return false; } -static bool cs_etm__is_async_exception(struct cs_etm_queue *etmq, u64 magic) +static bool cs_etm__is_async_exception(struct cs_etm_traceid_queue *tidq,
u64 magic)
{
- struct cs_etm_packet *packet = etmq->packet;
- struct cs_etm_packet *packet = tidq->packet;
if (magic == __perf_cs_etmv3_magic) if (packet->exception_number == CS_ETMV3_EXC_DEBUG_HALT || @@ -1327,10 +1394,12 @@ static bool cs_etm__is_async_exception(struct cs_etm_queue *etmq, u64 magic) return false; } -static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, u64 magic) +static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq,
u64 magic)
{
- struct cs_etm_packet *packet = etmq->packet;
- struct cs_etm_packet *prev_packet = etmq->prev_packet;
- struct cs_etm_packet *packet = tidq->packet;
- struct cs_etm_packet *prev_packet = tidq->prev_packet;
if (magic == __perf_cs_etmv3_magic) if (packet->exception_number == CS_ETMV3_EXC_SMC || @@ -1373,10 +1442,11 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, u64 magic) return false; } -static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq) +static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq)
{
- struct cs_etm_packet *packet = etmq->packet;
- struct cs_etm_packet *prev_packet = etmq->prev_packet;
- struct cs_etm_packet *packet = tidq->packet;
- struct cs_etm_packet *prev_packet = tidq->prev_packet; u64 magic; int ret;
@@ -1478,7 +1548,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq) return ret; /* The exception is for system call. */
if (cs_etm__is_syscall(etmq, magic))
if (cs_etm__is_syscall(etmq, tidq, magic)) packet->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_SYSCALLRET;
@@ -1486,7 +1556,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq) * The exceptions are triggered by external signals from bus, * interrupt controller, debug module, PE reset or halt. */
else if (cs_etm__is_async_exception(etmq, magic))
else if (cs_etm__is_async_exception(tidq, magic)) packet->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
@@ -1495,7 +1565,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq) * Otherwise, exception is caused by trap, instruction & * data fault, or alignment errors. */
else if (cs_etm__is_sync_exception(etmq, magic))
else if (cs_etm__is_sync_exception(etmq, tidq, magic)) packet->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_INTERRUPT;
@@ -1577,17 +1647,18 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq) return ret; } -static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq) +static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq)
{ int ret; struct cs_etm_packet_queue *packet_queue;
-	packet_queue = cs_etm__etmq_get_packet_queue(etmq);
+	packet_queue = &tidq->packet_queue;

	/* Process each packet in this chunk */
	while (1) {
Indentation?
		ret = cs_etm_decoder__get_packet(packet_queue,
-						 etmq->packet);
+						 tidq->packet);
		if (ret <= 0)
			/*
			 * Stop processing this chunk on
@@ -1602,18 +1673,18 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq) * prior to switch() statement to use address * information before packets swapping. */
ret = cs_etm__set_sample_flags(etmq);
ret = cs_etm__set_sample_flags(etmq, tidq); if (ret < 0) break;
switch (etmq->packet->sample_type) {
switch (tidq->packet->sample_type) { case CS_ETM_RANGE: /* * If the packet contains an instruction * range, generate instruction sequence * events. */
cs_etm__sample(etmq);
cs_etm__sample(etmq, tidq); break; case CS_ETM_EXCEPTION: case CS_ETM_EXCEPTION_RET:
@@ -1622,14 +1693,14 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq) * make sure the previous instruction * range packet to be handled properly. */
cs_etm__exception(etmq);
cs_etm__exception(tidq); break; case CS_ETM_DISCONTINUITY: /* * Discontinuity in trace, flush * previous branch stack */
cs_etm__flush(etmq);
cs_etm__flush(etmq, tidq); break; case CS_ETM_EMPTY: /*
@@ -1649,6 +1720,11 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq)
 static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 {
	int err = 0;
+	struct cs_etm_traceid_queue *tidq;
+
+	tidq = cs_etm__etmq_get_traceid_queue(etmq, CS_ETM_PER_THREAD_TRACEID);
+	if (!tidq)
+		return -EINVAL;
I am confused by this. In the CPU-wide trace case every CPU has its own trace ID, and that trace ID is never equal to CS_ETM_PER_THREAD_TRACEID; if we create the queue with trace ID = CS_ETM_PER_THREAD_TRACEID, won't this queue be redundant and unusable by any CPU in the system?

Furthermore, this 'tidq' is always the one used by the decoding loop below; shouldn't 'tidq' be selected dynamically based on the latest trace data generated by the different CPUs?
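Something like below is the shape I would expect for the CPU-wide case; a rough sketch only, the two cs_etm functions exist in this set but the wrapper and dispatch flow are hypothetical:

	/* Rough sketch: pick the queue that matches the current data source. */
	static int cs_etm__process_by_trace_id(struct cs_etm_queue *etmq,
					       u8 trace_chan_id)
	{
		struct cs_etm_traceid_queue *tidq;

		tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
		if (!tidq)
			return -EINVAL;

		return cs_etm__process_traceid_queue(etmq, tidq);
	}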
	/* Go through each buffer in the queue and decode them one by one */
	while (1) {
@@ -1667,13 +1743,13 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
			 * an error occurs other than hoping the next one will
			 * be better.
			 */
-			err = cs_etm__process_decoder_queue(etmq);
+			err = cs_etm__process_traceid_queue(etmq, tidq);

		} while (etmq->buf_len);

		if (err == 0)
			/* Flush any remaining branch stack entries */
-			err = cs_etm__end_block(etmq);
+			err = cs_etm__end_block(etmq, tidq);
	}
	return err;
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index b6d0e0d0c43f..0dfc79a678f8 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -137,6 +137,16 @@ struct cs_etm_packet {
 #define CS_ETM_PACKET_MAX_BUFFER 1024
 #define CS_ETM_PACKET_QUEUE_NR 1

+/*
+ * When working with per-thread scenarios the process under trace can
+ * be scheduled on any CPU and as such, more than one traceID may be
+ * associated with the same process. Since a traceID of '0' is illegal
+ * as per the CoreSight architecture, use that specific value to
+ * identify the queue where all packets (with any traceID) are
+ * aggregated.
+ */
+#define CS_ETM_PER_THREAD_TRACEID 0
 struct cs_etm_packet_queue {
	u32 packet_count;
	u32 head;
@@ -173,7 +183,7 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
				  struct perf_session *session);
 int cs_etm__get_cpu(u8 trace_chan_id, int *cpu);
 struct cs_etm_packet_queue
-*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq);
+*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id);
 #else
 static inline int
 cs_etm__process_auxtrace_info(union perf_event *event __maybe_unused,
			      struct perf_session *session __maybe_unused)
@@ -189,7 +199,8 @@ static inline int cs_etm__get_cpu(u8 trace_chan_id __maybe_unused,
 }

 static inline struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue(
-					struct cs_etm_queue *etmq __maybe_unused)
+					struct cs_etm_queue *etmq __maybe_unused,
+					u8 trace_chan_id __maybe_unused)
 {
	return NULL;
 }
--
2.17.1
On Wed, Apr 03, 2019 at 05:42:19PM +0800, Leo Yan wrote:
On Wed, Mar 06, 2019 at 03:59:33PM -0700, Mathieu Poirier wrote:
In and ideal world there is one CPU per cs_etm_queue and as such, one trace
s/and/an
Ok
ID per cs_etm_queue. In the real world CoreSight topologies allow multiple CPUs to use the same sink, which translates to multiple trace IDs per cs_etm_queue.
To deal with this a new cs_etm_traceid_queue structure is introduced to enclose all the information related to a single trace ID, allowing a cs_etm_queue to handle traces generated by any number of CPUs.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |   4 +-
 tools/perf/util/cs-etm.c                      | 368 +++++++++++-------
 tools/perf/util/cs-etm.h                      |  15 +-
 3 files changed, 237 insertions(+), 150 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 12f97e1ff945..275c72264820 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -412,8 +412,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
	struct cs_etm_queue *etmq = decoder->data;
	struct cs_etm_packet_queue *packet_queue;

-	/* First get the packet queue */
-	packet_queue = cs_etm__etmq_get_packet_queue(etmq);
+	/* First get the packet queue for this traceID */
+	packet_queue = cs_etm__etmq_get_packet_queue(etmq, trace_chan_id);
	if (!packet_queue)
		return OCSD_RESP_FATAL_SYS_ERR;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 5a9ebbf31614..7d862d26c107 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -60,25 +60,30 @@ struct cs_etm_auxtrace {
	unsigned int pmu_type;
 };

+struct cs_etm_traceid_queue {
+	u8 trace_chan_id;
+	u64 period_instructions;
+	size_t last_branch_pos;
+	union perf_event *event_buf;
+	struct branch_stack *last_branch;
+	struct branch_stack *last_branch_rb;
+	struct cs_etm_packet *prev_packet;
+	struct cs_etm_packet *packet;
+	struct cs_etm_packet_queue packet_queue;
+};
 struct cs_etm_queue {
	struct cs_etm_auxtrace *etm;
	struct thread *thread;
	struct cs_etm_decoder *decoder;
	struct auxtrace_buffer *buffer;
-	union perf_event *event_buf;
	unsigned int queue_nr;
	pid_t pid, tid;
	int cpu;
	u64 offset;
-	u64 period_instructions;
-	struct branch_stack *last_branch;
-	struct branch_stack *last_branch_rb;
-	size_t last_branch_pos;
-	struct cs_etm_packet *prev_packet;
-	struct cs_etm_packet *packet;
	const unsigned char *buf;
	size_t buf_len, buf_used;
-	struct cs_etm_packet_queue packet_queue;
+	struct cs_etm_traceid_queue *traceid_queues;
 };

 static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
@@ -150,10 +155,96 @@ static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue)
	}
 }

+static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
+				      struct cs_etm_traceid_queue *tidq,
+				      u8 trace_chan_id)
+{
+	int rc = -ENOMEM;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+
+	cs_etm__clear_packet_queue(&tidq->packet_queue);
+
+	tidq->trace_chan_id = trace_chan_id;
+
+	tidq->packet = zalloc(sizeof(struct cs_etm_packet));
+	if (!tidq->packet)
+		goto out;
+
+	if (etm->synth_opts.last_branch || etm->sample_branches) {
+		tidq->prev_packet = zalloc(sizeof(struct cs_etm_packet));
+		if (!tidq->prev_packet)
+			goto out_free;
+	}
+
+	if (etm->synth_opts.last_branch) {
+		size_t sz = sizeof(struct branch_stack);
+
+		sz += etm->synth_opts.last_branch_sz *
+		      sizeof(struct branch_entry);
+		tidq->last_branch = zalloc(sz);
+		if (!tidq->last_branch)
+			goto out_free;
+		tidq->last_branch_rb = zalloc(sz);
+		if (!tidq->last_branch_rb)
+			goto out_free;
+	}
+
+	tidq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!tidq->event_buf)
+		goto out_free;
+
+	return 0;
+
+out_free:
+	zfree(&tidq->last_branch_rb);
+	zfree(&tidq->last_branch);
+	zfree(&tidq->prev_packet);
+	zfree(&tidq->packet);
+out:
+	return rc;
+}
+static struct cs_etm_traceid_queue
+*cs_etm__etmq_get_traceid_queue(struct cs_etm_queue *etmq, u8 trace_chan_id)
+{
+	struct cs_etm_traceid_queue *tidq;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+
+	if (!etm->timeless_decoding)
+		return NULL;
+
+	tidq = etmq->traceid_queues;
+	if (tidq)
+		return tidq;
+
+	tidq = calloc(CS_ETM_PACKET_QUEUE_NR, sizeof(*tidq));
+	if (!tidq)
+		return NULL;
Since CS_ETM_PACKET_QUEUE_NR is defined as 1, just curious: will this macro be changed to other values later? Otherwise, it seems to me that calling malloc(sizeof(*tidq)) directly would be clearer.
I think I had different plans for that define when I started, but looking back now there is hardly any reason to have it around. I have removed it in favour of a combination of malloc() and memset().
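i.e. something along these lines for the respin (a sketch only, the final form may differ):

	tidq = malloc(sizeof(*tidq));
	if (!tidq)
		return NULL;

	memset(tidq, 0, sizeof(*tidq));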
+	if (cs_etm__init_traceid_queue(etmq, tidq, trace_chan_id))
+		goto out_free;
+
+	etmq->traceid_queues = tidq;
+
+	return etmq->traceid_queues;
+
+out_free:
+	free(tidq);
+
+	return NULL;
+}
struct cs_etm_packet_queue -*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq) +*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id) {
- return &etmq->packet_queue;
- struct cs_etm_traceid_queue *tidq;
- tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
- if (tidq)
return &tidq->packet_queue;
- return NULL;
} static void cs_etm__packet_dump(const char *pkt_string) @@ -327,11 +418,12 @@ static void cs_etm__free_queue(void *priv) thread__zput(etmq->thread); cs_etm_decoder__free(etmq->decoder);
- zfree(&etmq->event_buf);
- zfree(&etmq->last_branch);
- zfree(&etmq->last_branch_rb);
- zfree(&etmq->prev_packet);
- zfree(&etmq->packet);
- zfree(&etmq->traceid_queues->event_buf);
- zfree(&etmq->traceid_queues->last_branch);
- zfree(&etmq->traceid_queues->last_branch_rb);
- zfree(&etmq->traceid_queues->prev_packet);
- zfree(&etmq->traceid_queues->packet);
- zfree(&etmq->traceid_queues); free(etmq);
} @@ -443,39 +535,11 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm) struct cs_etm_decoder_params d_params; struct cs_etm_trace_params *t_params = NULL; struct cs_etm_queue *etmq;
- size_t szp = sizeof(struct cs_etm_packet);
etmq = zalloc(sizeof(*etmq)); if (!etmq) return NULL;
- etmq->packet = zalloc(szp);
- if (!etmq->packet)
goto out_free;
- if (etm->synth_opts.last_branch || etm->sample_branches) {
etmq->prev_packet = zalloc(szp);
if (!etmq->prev_packet)
goto out_free;
- }
- if (etm->synth_opts.last_branch) {
size_t sz = sizeof(struct branch_stack);
sz += etm->synth_opts.last_branch_sz *
sizeof(struct branch_entry);
etmq->last_branch = zalloc(sz);
if (!etmq->last_branch)
goto out_free;
etmq->last_branch_rb = zalloc(sz);
if (!etmq->last_branch_rb)
goto out_free;
- }
- etmq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
- if (!etmq->event_buf)
goto out_free;
- /* Use metadata to fill in trace parameters for trace decoder */ t_params = zalloc(sizeof(*t_params) * etm->num_cpu);
@@ -510,12 +574,6 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm) out_free_decoder: cs_etm_decoder__free(etmq->decoder); out_free:
- zfree(&t_params);
- zfree(&etmq->event_buf);
- zfree(&etmq->last_branch);
- zfree(&etmq->last_branch_rb);
- zfree(&etmq->prev_packet);
- zfree(&etmq->packet); free(etmq);
return NULL; @@ -545,8 +603,6 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, etmq->tid = queue->tid; etmq->pid = -1; etmq->offset = 0;
- etmq->period_instructions = 0;
- cs_etm__clear_packet_queue(&etmq->packet_queue);
out: return ret; @@ -579,10 +635,12 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm) return 0; } -static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq) +static inline +void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq)
{
- struct branch_stack *bs_src = etmq->last_branch_rb;
- struct branch_stack *bs_dst = etmq->last_branch;
- struct branch_stack *bs_src = tidq->last_branch_rb;
- struct branch_stack *bs_dst = tidq->last_branch; size_t nr = 0;
/* @@ -602,9 +660,9 @@ static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq) * two steps. First, copy the branches from the most recently inserted * branch ->last_branch_pos until the end of bs_src->entries buffer. */
- nr = etmq->etm->synth_opts.last_branch_sz - etmq->last_branch_pos;
- nr = etmq->etm->synth_opts.last_branch_sz - tidq->last_branch_pos; memcpy(&bs_dst->entries[0],
&bs_src->entries[etmq->last_branch_pos],
&bs_src->entries[tidq->last_branch_pos], sizeof(struct branch_entry) * nr);
/* @@ -617,14 +675,15 @@ static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq) if (bs_src->nr >= etmq->etm->synth_opts.last_branch_sz) { memcpy(&bs_dst->entries[nr], &bs_src->entries[0],
sizeof(struct branch_entry) * etmq->last_branch_pos);
}sizeof(struct branch_entry) * tidq->last_branch_pos);
} -static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq) +static inline +void cs_etm__reset_last_branch_rb(struct cs_etm_traceid_queue *tidq) {
- etmq->last_branch_pos = 0;
- etmq->last_branch_rb->nr = 0;
- tidq->last_branch_pos = 0;
- tidq->last_branch_rb->nr = 0;
} static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq, @@ -677,9 +736,10 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq, return packet->start_addr + offset * 4; } -static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq) +static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq)
{
- struct branch_stack *bs = etmq->last_branch_rb;
- struct branch_stack *bs = tidq->last_branch_rb; struct branch_entry *be;
/* @@ -688,14 +748,14 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq) * buffer down. After writing the first element of the stack, move the * insert position back to the end of the buffer. */
- if (!etmq->last_branch_pos)
etmq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
- if (!tidq->last_branch_pos)
tidq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
- etmq->last_branch_pos -= 1;
- tidq->last_branch_pos -= 1;
- be = &bs->entries[etmq->last_branch_pos];
- be->from = cs_etm__last_executed_instr(etmq->prev_packet);
- be->to = cs_etm__first_executed_instr(etmq->packet);
- be = &bs->entries[tidq->last_branch_pos];
- be->from = cs_etm__last_executed_instr(tidq->prev_packet);
- be->to = cs_etm__first_executed_instr(tidq->packet); /* No support for mispredict */ be->flags.mispred = 0; be->flags.predicted = 1;
@@ -779,11 +839,12 @@ static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, } static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq, u64 addr, u64 period)
{ int ret = 0; struct cs_etm_auxtrace *etm = etmq->etm;
- union perf_event *event = etmq->event_buf;
- union perf_event *event = tidq->event_buf; struct perf_sample sample = {.ip = 0,};
event->sample.header.type = PERF_RECORD_SAMPLE; @@ -796,14 +857,14 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, sample.id = etmq->etm->instructions_id; sample.stream_id = etmq->etm->instructions_id; sample.period = period;
- sample.cpu = etmq->packet->cpu;
- sample.flags = etmq->prev_packet->flags;
- sample.cpu = tidq->packet->cpu;
- sample.flags = tidq->prev_packet->flags; sample.insn_len = 1; sample.cpumode = event->sample.header.misc;
if (etm->synth_opts.last_branch) {
cs_etm__copy_last_branch_rb(etmq);
sample.branch_stack = etmq->last_branch;
cs_etm__copy_last_branch_rb(etmq, tidq);
}sample.branch_stack = tidq->last_branch;
if (etm->synth_opts.inject) { @@ -821,7 +882,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, ret); if (etm->synth_opts.last_branch)
cs_etm__reset_last_branch_rb(etmq);
cs_etm__reset_last_branch_rb(tidq);
return ret; } @@ -830,19 +891,20 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
- The cs etm packet encodes an instruction range between a branch target
- and the next taken branch. Generate sample accordingly.
*/ -static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) +static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq)
{ int ret = 0; struct cs_etm_auxtrace *etm = etmq->etm; struct perf_sample sample = {.ip = 0,};
- union perf_event *event = etmq->event_buf;
- union perf_event *event = tidq->event_buf; struct dummy_branch_stack { u64 nr; struct branch_entry entries; } dummy_bs; u64 ip;
- ip = cs_etm__last_executed_instr(etmq->prev_packet);
- ip = cs_etm__last_executed_instr(tidq->prev_packet);
event->sample.header.type = PERF_RECORD_SAMPLE; event->sample.header.misc = cs_etm__cpu_mode(etmq, ip); @@ -851,12 +913,12 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq) sample.ip = ip; sample.pid = etmq->pid; sample.tid = etmq->tid;
- sample.addr = cs_etm__first_executed_instr(etmq->packet);
- sample.addr = cs_etm__first_executed_instr(tidq->packet); sample.id = etmq->etm->branches_id; sample.stream_id = etmq->etm->branches_id; sample.period = 1;
- sample.cpu = etmq->packet->cpu;
- sample.flags = etmq->prev_packet->flags;
- sample.cpu = tidq->packet->cpu;
- sample.flags = tidq->prev_packet->flags; sample.cpumode = event->sample.header.misc;
@@ -999,34 +1061,35 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 	return 0;
 }
 
-static int cs_etm__sample(struct cs_etm_queue *etmq)
+static int cs_etm__sample(struct cs_etm_queue *etmq,
+			  struct cs_etm_traceid_queue *tidq)
 {
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct cs_etm_packet *tmp;
 	int ret;
-	u64 instrs_executed = etmq->packet->instr_count;
+	u64 instrs_executed = tidq->packet->instr_count;
 
-	etmq->period_instructions += instrs_executed;
+	tidq->period_instructions += instrs_executed;
 
 	/*
 	 * Record a branch when the last instruction in
 	 * PREV_PACKET is a branch.
 	 */
 	if (etm->synth_opts.last_branch &&
-	    etmq->prev_packet &&
-	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
-	    etmq->prev_packet->last_instr_taken_branch)
-		cs_etm__update_last_branch_rb(etmq);
+	    tidq->prev_packet &&
+	    tidq->prev_packet->sample_type == CS_ETM_RANGE &&
+	    tidq->prev_packet->last_instr_taken_branch)
+		cs_etm__update_last_branch_rb(etmq, tidq);
 
 	if (etm->sample_instructions &&
-	    etmq->period_instructions >= etm->instructions_sample_period) {
+	    tidq->period_instructions >= etm->instructions_sample_period) {
 		/*
 		 * Emit instruction sample periodically
 		 * TODO: allow period to be defined in cycles and clock time
 		 */
 
 		/* Get number of instructions executed after the sample point */
-		u64 instrs_over = etmq->period_instructions -
+		u64 instrs_over = tidq->period_instructions -
 			etm->instructions_sample_period;
 
 		/*
@@ -1035,31 +1098,31 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 		 * executed, but PC has not advanced to next instruction)
 		 */
 		u64 offset = (instrs_executed - instrs_over - 1);
-		u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset);
+		u64 addr = cs_etm__instr_addr(etmq, tidq->packet, offset);
 
 		ret = cs_etm__synth_instruction_sample(
-			etmq, addr, etm->instructions_sample_period);
+			etmq, tidq, addr, etm->instructions_sample_period);
 		if (ret)
 			return ret;
 
 		/* Carry remaining instructions into next sample period */
-		etmq->period_instructions = instrs_over;
+		tidq->period_instructions = instrs_over;
 	}
 
-	if (etm->sample_branches && etmq->prev_packet) {
+	if (etm->sample_branches && tidq->prev_packet) {
 		bool generate_sample = false;
 
 		/* Generate sample for tracing on packet */
-		if (etmq->prev_packet->sample_type == CS_ETM_DISCONTINUITY)
+		if (tidq->prev_packet->sample_type == CS_ETM_DISCONTINUITY)
 			generate_sample = true;
 
 		/* Generate sample for branch taken packet */
-		if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
-		    etmq->prev_packet->last_instr_taken_branch)
+		if (tidq->prev_packet->sample_type == CS_ETM_RANGE &&
+		    tidq->prev_packet->last_instr_taken_branch)
			generate_sample = true;
 
 		if (generate_sample) {
-			ret = cs_etm__synth_branch_sample(etmq);
+			ret = cs_etm__synth_branch_sample(etmq, tidq);
 			if (ret)
 				return ret;
 		}
@@ -1070,15 +1133,15 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 	 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
 	 * the next incoming packet.
 	 */
-	tmp = etmq->packet;
-	etmq->packet = etmq->prev_packet;
-	etmq->prev_packet = tmp;
+	tmp = tidq->packet;
+	tidq->packet = tidq->prev_packet;
+	tidq->prev_packet = tmp;
 
 	return 0;
 }
 
-static int cs_etm__exception(struct cs_etm_queue *etmq)
+static int cs_etm__exception(struct cs_etm_traceid_queue *tidq)
 {
 	/*
 	 * When the exception packet is inserted, whether the last instruction
@@ -1091,27 +1154,28 @@ static int cs_etm__exception(struct cs_etm_queue *etmq)
 	 * swap PACKET with PREV_PACKET. This keeps PREV_PACKET to be useful
 	 * for generating instruction and branch samples.
 	 */
-	if (etmq->prev_packet->sample_type == CS_ETM_RANGE)
-		etmq->prev_packet->last_instr_taken_branch = true;
+	if (tidq->prev_packet->sample_type == CS_ETM_RANGE)
+		tidq->prev_packet->last_instr_taken_branch = true;
 
 	return 0;
 }
 
-static int cs_etm__flush(struct cs_etm_queue *etmq)
+static int cs_etm__flush(struct cs_etm_queue *etmq,
+			 struct cs_etm_traceid_queue *tidq)
 {
 	int err = 0;
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct cs_etm_packet *tmp;
 
-	if (!etmq->prev_packet)
+	if (!tidq->prev_packet)
 		return 0;
 
 	/* Handle start tracing packet */
-	if (etmq->prev_packet->sample_type == CS_ETM_EMPTY)
+	if (tidq->prev_packet->sample_type == CS_ETM_EMPTY)
 		goto swap_packet;
 
 	if (etmq->etm->synth_opts.last_branch &&
-	    etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+	    tidq->prev_packet->sample_type == CS_ETM_RANGE) {
 		/*
 		 * Generate a last branch event for the branches left in the
 		 * circular buffer at the end of the trace.
@@ -1119,21 +1183,21 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
 		 * Use the address of the end of the last reported execution
 		 * range
 		 */
-		u64 addr = cs_etm__last_executed_instr(etmq->prev_packet);
+		u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
 
 		err = cs_etm__synth_instruction_sample(
-			etmq, addr,
-			etmq->period_instructions);
+			etmq, tidq, addr,
+			tidq->period_instructions);
 		if (err)
 			return err;
 
-		etmq->period_instructions = 0;
+		tidq->period_instructions = 0;
 
 	}
 
 	if (etm->sample_branches &&
-	    etmq->prev_packet->sample_type == CS_ETM_RANGE) {
-		err = cs_etm__synth_branch_sample(etmq);
+	    tidq->prev_packet->sample_type == CS_ETM_RANGE) {
+		err = cs_etm__synth_branch_sample(etmq, tidq);
 		if (err)
 			return err;
 	}
@@ -1144,15 +1208,16 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
 	 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
 	 * the next incoming packet.
 	 */
-	tmp = etmq->packet;
-	etmq->packet = etmq->prev_packet;
-	etmq->prev_packet = tmp;
+	tmp = tidq->packet;
+	tidq->packet = tidq->prev_packet;
+	tidq->prev_packet = tmp;
 
 	return err;
 }
 
-static int cs_etm__end_block(struct cs_etm_queue *etmq)
+static int cs_etm__end_block(struct cs_etm_queue *etmq,
+			     struct cs_etm_traceid_queue *tidq)
 {
 	int err;
 
@@ -1166,20 +1231,20 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq)
 	 * the trace.
 	 */
 	if (etmq->etm->synth_opts.last_branch &&
-	    etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+	    tidq->prev_packet->sample_type == CS_ETM_RANGE) {
 		/*
 		 * Use the address of the end of the last reported execution
 		 * range.
 		 */
-		u64 addr = cs_etm__last_executed_instr(etmq->prev_packet);
+		u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
 
 		err = cs_etm__synth_instruction_sample(
-			etmq, addr,
-			etmq->period_instructions);
+			etmq, tidq, addr,
+			tidq->period_instructions);
 		if (err)
 			return err;
 
-		etmq->period_instructions = 0;
+		tidq->period_instructions = 0;
 	}
 
 	return 0;
@@ -1278,10 +1343,11 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq,
 	return false;
 }
 
-static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, u64 magic)
+static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
+			       struct cs_etm_traceid_queue *tidq, u64 magic)
 {
-	struct cs_etm_packet *packet = etmq->packet;
-	struct cs_etm_packet *prev_packet = etmq->prev_packet;
+	struct cs_etm_packet *packet = tidq->packet;
+	struct cs_etm_packet *prev_packet = tidq->prev_packet;
 
 	if (magic == __perf_cs_etmv3_magic)
 		if (packet->exception_number == CS_ETMV3_EXC_SVC)
@@ -1302,9 +1368,10 @@ static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, u64 magic)
 	return false;
 }
 
-static bool cs_etm__is_async_exception(struct cs_etm_queue *etmq, u64 magic)
+static bool cs_etm__is_async_exception(struct cs_etm_traceid_queue *tidq,
+				       u64 magic)
 {
-	struct cs_etm_packet *packet = etmq->packet;
+	struct cs_etm_packet *packet = tidq->packet;
 
 	if (magic == __perf_cs_etmv3_magic)
 		if (packet->exception_number == CS_ETMV3_EXC_DEBUG_HALT ||
@@ -1327,10 +1394,12 @@ static bool cs_etm__is_async_exception(struct cs_etm_queue *etmq, u64 magic)
 	return false;
 }
 
-static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, u64 magic)
+static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
+				      struct cs_etm_traceid_queue *tidq,
+				      u64 magic)
 {
-	struct cs_etm_packet *packet = etmq->packet;
-	struct cs_etm_packet *prev_packet = etmq->prev_packet;
+	struct cs_etm_packet *packet = tidq->packet;
+	struct cs_etm_packet *prev_packet = tidq->prev_packet;
 
 	if (magic == __perf_cs_etmv3_magic)
 		if (packet->exception_number == CS_ETMV3_EXC_SMC ||
@@ -1373,10 +1442,11 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, u64 magic)
 	return false;
 }
 
-static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq)
+static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
+				    struct cs_etm_traceid_queue *tidq)
 {
-	struct cs_etm_packet *packet = etmq->packet;
-	struct cs_etm_packet *prev_packet = etmq->prev_packet;
+	struct cs_etm_packet *packet = tidq->packet;
+	struct cs_etm_packet *prev_packet = tidq->prev_packet;
 	u64 magic;
 	int ret;
 
@@ -1478,7 +1548,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq)
 			return ret;
 
 		/* The exception is for system call. */
-		if (cs_etm__is_syscall(etmq, magic))
+		if (cs_etm__is_syscall(etmq, tidq, magic))
 			packet->flags = PERF_IP_FLAG_BRANCH |
 					PERF_IP_FLAG_CALL |
 					PERF_IP_FLAG_SYSCALLRET;
@@ -1486,7 +1556,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq)
 		 * The exceptions are triggered by external signals from bus,
 		 * interrupt controller, debug module, PE reset or halt.
 		 */
-		else if (cs_etm__is_async_exception(etmq, magic))
+		else if (cs_etm__is_async_exception(tidq, magic))
 			packet->flags = PERF_IP_FLAG_BRANCH |
 					PERF_IP_FLAG_CALL |
 					PERF_IP_FLAG_ASYNC |
@@ -1495,7 +1565,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq)
 		 * Otherwise, exception is caused by trap, instruction &
 		 * data fault, or alignment errors.
 		 */
-		else if (cs_etm__is_sync_exception(etmq, magic))
+		else if (cs_etm__is_sync_exception(etmq, tidq, magic))
 			packet->flags = PERF_IP_FLAG_BRANCH |
 					PERF_IP_FLAG_CALL |
 					PERF_IP_FLAG_INTERRUPT;
@@ -1577,17 +1647,18 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq)
 	return ret;
 }
 
-static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq)
+static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
+					 struct cs_etm_traceid_queue *tidq)
 {
 	int ret;
 	struct cs_etm_packet_queue *packet_queue;
 
-	packet_queue = cs_etm__etmq_get_packet_queue(etmq);
+	packet_queue = &tidq->packet_queue;
 
 	/* Process each packet in this chunk */
 	while (1) {
Indentation?
That is a rebase gone wrong from a previous patchset. I fixed it now.
 		ret = cs_etm_decoder__get_packet(packet_queue,
-						 etmq->packet);
+						 tidq->packet);
 		if (ret <= 0)
 			/*
 			 * Stop processing this chunk on
@@ -1602,18 +1673,18 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq)
 		 * prior to switch() statement to use address
 		 * information before packets swapping.
 		 */
-		ret = cs_etm__set_sample_flags(etmq);
+		ret = cs_etm__set_sample_flags(etmq, tidq);
 		if (ret < 0)
 			break;
 
-		switch (etmq->packet->sample_type) {
+		switch (tidq->packet->sample_type) {
 		case CS_ETM_RANGE:
 			/*
 			 * If the packet contains an instruction
 			 * range, generate instruction sequence
 			 * events.
 			 */
-			cs_etm__sample(etmq);
+			cs_etm__sample(etmq, tidq);
 			break;
 		case CS_ETM_EXCEPTION:
 		case CS_ETM_EXCEPTION_RET:
@@ -1622,14 +1693,14 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq)
 			 * make sure the previous instruction
 			 * range packet to be handled properly.
 			 */
-			cs_etm__exception(etmq);
+			cs_etm__exception(tidq);
 			break;
 		case CS_ETM_DISCONTINUITY:
 			/*
 			 * Discontinuity in trace, flush
 			 * previous branch stack
 			 */
-			cs_etm__flush(etmq);
+			cs_etm__flush(etmq, tidq);
 			break;
 		case CS_ETM_EMPTY:
 			/*
@@ -1649,6 +1720,11 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq)
 static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 {
 	int err = 0;
+	struct cs_etm_traceid_queue *tidq;
+
+	tidq = cs_etm__etmq_get_traceid_queue(etmq, CS_ETM_PER_THREAD_TRACEID);
+	if (!tidq)
+		return -EINVAL;
I am confused by this. In the CPU-wide trace case every CPU has its own trace ID, and that trace ID is never CS_ETM_PER_THREAD_TRACEID; if we create the queue with trace id = CS_ETM_PER_THREAD_TRACEID, won't this queue be redundant and unusable for any CPU in the system?
Furthermore, this 'tidq' is always used by the decoding loop below; shouldn't 'tidq' be dynamically updated based on the latest tracing data generated from different CPUs?
Function cs_etm__run_decoder() is used only in cs_etm__process_timeless_queues(). For CPU-wide scenarios function cs_etm__process_queues() is called.
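[Editor's note: a minimal sketch of that dispatch, using the function names from this series; error handling is elided, so treat it as illustrative rather than the exact upstream code.]

	/* Per-thread (timeless) traces walk each queue with the decoder
	 * loop above; CPU-wide traces are ordered by timestamp instead.
	 */
	if (etm->timeless_decoding)
		err = cs_etm__process_timeless_queues(etm, -1);
	else
		err = cs_etm__process_queues(etm);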
 	/* Go through each buffer in the queue and decode them one by one */
 	while (1) {
@@ -1667,13 +1743,13 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 			 * an error occurs other than hoping the next one will
 			 * be better.
 			 */
-			err = cs_etm__process_decoder_queue(etmq);
+			err = cs_etm__process_traceid_queue(etmq, tidq);
 
 		} while (etmq->buf_len);
 
 		if (err == 0)
 			/* Flush any remaining branch stack entries */
-			err = cs_etm__end_block(etmq);
+			err = cs_etm__end_block(etmq, tidq);
 	}
 	return err;
 
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index b6d0e0d0c43f..0dfc79a678f8 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -137,6 +137,16 @@ struct cs_etm_packet {
 #define CS_ETM_PACKET_MAX_BUFFER 1024
 #define CS_ETM_PACKET_QUEUE_NR 1
 
+/*
+ * When working with per-thread scenarios the process under trace can
+ * be scheduled on any CPU and as such, more than one traceID may be
+ * associated with the same process.  Since a traceID of '0' is illegal
+ * as per the CoreSight architecture, use that specific value to
+ * identify the queue where all packets (with any traceID) are
+ * aggregated.
+ */
+#define CS_ETM_PER_THREAD_TRACEID 0
+
 struct cs_etm_packet_queue {
 	u32 packet_count;
 	u32 head;
@@ -173,7 +183,7 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session);
 int cs_etm__get_cpu(u8 trace_chan_id, int *cpu);
 struct cs_etm_packet_queue
-*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq);
+*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id);
 #else
 static inline int cs_etm__process_auxtrace_info(union perf_event *event __maybe_unused,
@@ -189,7 +199,8 @@ static inline int cs_etm__get_cpu(u8 trace_chan_id __maybe_unused,
 }
 
 static inline struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue(
-		struct cs_etm_queue *etmq __maybe_unused)
+		struct cs_etm_queue *etmq __maybe_unused,
+		u8 trace_chan_id __maybe_unused)
 {
 	return NULL;
 }
-- 
2.17.1
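[Editor's note: a short sketch of how the aggregate queue ends up being used once CPU-wide support lands later in the series. Patch 14 reworks the lookup, so the aliasing below illustrates the intent rather than quoting it verbatim.]

	/* In per-thread (timeless) mode, alias every incoming traceID to
	 * the single aggregate queue; 0 is architecturally illegal as a
	 * traceID, so CS_ETM_PER_THREAD_TRACEID cannot collide with a
	 * real one.
	 */
	if (etm->timeless_decoding)
		trace_chan_id = CS_ETM_PER_THREAD_TRACEID;

	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);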
On Tue, Apr 16, 2019 at 04:55:16PM -0600, Mathieu Poirier wrote:
[...]
@@ -1649,6 +1720,11 @@ static int cs_etm__process_decoder_queue(struct cs_etm_queue *etmq)
 static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 {
 	int err = 0;
+	struct cs_etm_traceid_queue *tidq;
+
+	tidq = cs_etm__etmq_get_traceid_queue(etmq, CS_ETM_PER_THREAD_TRACEID);
+	if (!tidq)
+		return -EINVAL;
I am confused by this. In the CPU-wide trace case every CPU has its own trace ID, and that trace ID is never CS_ETM_PER_THREAD_TRACEID; if we create the queue with trace id = CS_ETM_PER_THREAD_TRACEID, won't this queue be redundant and unusable for any CPU in the system?
Furthermore, this 'tidq' is always used by the decoding loop below; shouldn't 'tidq' be dynamically updated based on the latest tracing data generated from different CPUs?
Function cs_etm__run_decoder() is used only in cs_etm__process_timeless_queues(). For CPU-wide scenarios function cs_etm__process_queues() is called.
Ah, thanks for the explanation.
 	/* Go through each buffer in the queue and decode them one by one */
 	while (1) {
@@ -1667,13 +1743,13 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 			 * an error occurs other than hoping the next one will
 			 * be better.
 			 */
-			err = cs_etm__process_decoder_queue(etmq);
+			err = cs_etm__process_traceid_queue(etmq, tidq);
 
 		} while (etmq->buf_len);
 
 		if (err == 0)
 			/* Flush any remaining branch stack entries */
-			err = cs_etm__end_block(etmq);
+			err = cs_etm__end_block(etmq, tidq);
 	}
 
 	return err;
On Wed, Mar 06, 2019 at 03:59:33PM -0700, Mathieu Poirier wrote:
[...]
+static struct cs_etm_traceid_queue
+*cs_etm__etmq_get_traceid_queue(struct cs_etm_queue *etmq, u8 trace_chan_id)
+{
+	struct cs_etm_traceid_queue *tidq;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+
+	if (!etm->timeless_decoding)
+		return NULL;

Should be as below:

	if (etm->timeless_decoding)
		return NULL;

+	tidq = etmq->traceid_queues;
+
+	if (tidq)
+		return tidq;
+
+	tidq = calloc(CS_ETM_PACKET_QUEUE_NR, sizeof(*tidq));
+	if (!tidq)
+		return NULL;
+
+	if (cs_etm__init_traceid_queue(etmq, tidq, trace_chan_id))
+		goto out_free;
+
+	etmq->traceid_queues = tidq;
+
+	return etmq->traceid_queues;
+
+out_free:
+	free(tidq);
+
+	return NULL;
+}
[...]
On Wed, Apr 03, 2019 at 06:13:17PM +0800, Leo Yan wrote:
On Wed, Mar 06, 2019 at 03:59:33PM -0700, Mathieu Poirier wrote:
[...]
+static struct cs_etm_traceid_queue
+*cs_etm__etmq_get_traceid_queue(struct cs_etm_queue *etmq, u8 trace_chan_id)
+{
+	struct cs_etm_traceid_queue *tidq;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+
+	if (!etm->timeless_decoding)
+		return NULL;

Should be as below:

	if (etm->timeless_decoding)
		return NULL;
You are a shrewd reviewer :o)
At that stage in the patchset only per-thread mode (i.e. timeless decoding) is supported, hence returning an error if something else is found. This is to make sure per-thread scenarios keep working if a bisect is performed.
Later in 14/18 the code is fixed to accommodate both per-thread and CPU-wide scenarios.
Thanks for taking a look at this, Mathieu
+	tidq = etmq->traceid_queues;
+
+	if (tidq)
+		return tidq;
+
+	tidq = calloc(CS_ETM_PACKET_QUEUE_NR, sizeof(*tidq));
+	if (!tidq)
+		return NULL;
+
+	if (cs_etm__init_traceid_queue(etmq, tidq, trace_chan_id))
+		goto out_free;
+
+	etmq->traceid_queues = tidq;
+
+	return etmq->traceid_queues;
+
+out_free:
+	free(tidq);
+
+	return NULL;
+}
[...]
Nowadays the synthesize code is using the packet's cpu information, making cs_etm_queue::cpu useless. As such simply remove it.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- tools/perf/util/cs-etm.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 7d862d26c107..7b516d4dad28 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -79,7 +79,6 @@ struct cs_etm_queue { struct auxtrace_buffer *buffer; unsigned int queue_nr; pid_t pid, tid; - int cpu; u64 offset; const unsigned char *buf; size_t buf_len, buf_used; @@ -599,7 +598,6 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, queue->priv = etmq; etmq->etm = etm; etmq->queue_nr = queue_nr; - etmq->cpu = queue->cpu; etmq->tid = queue->tid; etmq->pid = -1; etmq->offset = 0; @@ -831,11 +829,8 @@ static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, etmq->thread = machine__find_thread(etm->machine, -1, etmq->tid);
-	if (etmq->thread) {
+	if (etmq->thread)
 		etmq->pid = etmq->thread->pid_;
-		if (queue->cpu == -1)
-			etmq->cpu = etmq->thread->cpu;
-	}
 }
static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
The thread field of structure cs_etm_queue is CPU dependent and as such needs to be part of the cs_etm_traceid_queue in order to support CPU-wide trace scenarios.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- tools/perf/util/cs-etm.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 7b516d4dad28..0dcbfe4a7932 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -65,6 +65,7 @@ struct cs_etm_traceid_queue { u64 period_instructions; size_t last_branch_pos; union perf_event *event_buf; + struct thread *thread; struct branch_stack *last_branch; struct branch_stack *last_branch_rb; struct cs_etm_packet *prev_packet; @@ -74,7 +75,6 @@ struct cs_etm_traceid_queue {
struct cs_etm_queue { struct cs_etm_auxtrace *etm; - struct thread *thread; struct cs_etm_decoder *decoder; struct auxtrace_buffer *buffer; unsigned int queue_nr; @@ -415,7 +415,7 @@ static void cs_etm__free_queue(void *priv) if (!etmq) return;
- thread__zput(etmq->thread); + thread__zput(etmq->traceid_queues->thread); cs_etm_decoder__free(etmq->decoder); zfree(&etmq->traceid_queues->event_buf); zfree(&etmq->traceid_queues->last_branch); @@ -503,7 +503,7 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, machine = etmq->etm->machine; cpumode = cs_etm__cpu_mode(etmq, address);
- thread = etmq->thread; + thread = etmq->traceid_queues->thread; if (!thread) { if (cpumode != PERF_RECORD_MISC_KERNEL) return 0; @@ -819,18 +819,21 @@ cs_etm__get_trace(struct cs_etm_queue *etmq) static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, struct auxtrace_queue *queue) { + struct cs_etm_traceid_queue *tidq; struct cs_etm_queue *etmq = queue->priv;
+ tidq = cs_etm__etmq_get_traceid_queue(etmq, CS_ETM_PER_THREAD_TRACEID); + /* CPU-wide tracing isn't supported yet */ if (queue->tid == -1) return;
- if ((!etmq->thread) && (etmq->tid != -1)) - etmq->thread = machine__find_thread(etm->machine, -1, + if ((!tidq->thread) && (etmq->tid != -1)) + tidq->thread = machine__find_thread(etm->machine, -1, etmq->tid);
- if (etmq->thread) - etmq->pid = etmq->thread->pid_; + if (tidq->thread) + etmq->pid = tidq->thread->pid_; }
static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
The tid/pid fields of structure cs_etm_queue are CPU dependent and as such need to be part of the cs_etm_traceid_queue in order to support CPU-wide trace scenarios.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- tools/perf/util/cs-etm.c | 44 ++++++++++++++++++++++++---------------- 1 file changed, 26 insertions(+), 18 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 0dcbfe4a7932..df128cba4f2a 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -62,6 +62,7 @@ struct cs_etm_auxtrace {
struct cs_etm_traceid_queue { u8 trace_chan_id; + pid_t pid, tid; u64 period_instructions; size_t last_branch_pos; union perf_event *event_buf; @@ -78,7 +79,6 @@ struct cs_etm_queue { struct cs_etm_decoder *decoder; struct auxtrace_buffer *buffer; unsigned int queue_nr; - pid_t pid, tid; u64 offset; const unsigned char *buf; size_t buf_len, buf_used; @@ -159,10 +159,14 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq, u8 trace_chan_id) { int rc = -ENOMEM; + struct auxtrace_queue *queue; struct cs_etm_auxtrace *etm = etmq->etm;
cs_etm__clear_packet_queue(&tidq->packet_queue);
+ queue = &etmq->etm->queues.queue_array[etmq->queue_nr]; + tidq->tid = queue->tid; + tidq->pid = -1; tidq->trace_chan_id = trace_chan_id;
tidq->packet = zalloc(sizeof(struct cs_etm_packet)); @@ -598,8 +602,6 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, queue->priv = etmq; etmq->etm = etm; etmq->queue_nr = queue_nr; - etmq->tid = queue->tid; - etmq->pid = -1; etmq->offset = 0;
out: @@ -817,23 +819,19 @@ cs_etm__get_trace(struct cs_etm_queue *etmq) }
static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, - struct auxtrace_queue *queue) + struct auxtrace_queue *queue, + struct cs_etm_traceid_queue *tidq) { - struct cs_etm_traceid_queue *tidq; - struct cs_etm_queue *etmq = queue->priv; - - tidq = cs_etm__etmq_get_traceid_queue(etmq, CS_ETM_PER_THREAD_TRACEID); - /* CPU-wide tracing isn't supported yet */ if (queue->tid == -1) return;
- if ((!tidq->thread) && (etmq->tid != -1)) + if ((!tidq->thread) && (tidq->tid != -1)) tidq->thread = machine__find_thread(etm->machine, -1, - etmq->tid); + tidq->tid);
if (tidq->thread) - etmq->pid = tidq->thread->pid_; + tidq->pid = tidq->thread->pid_; }
static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, @@ -850,8 +848,8 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, event->sample.header.size = sizeof(struct perf_event_header);
sample.ip = addr; - sample.pid = etmq->pid; - sample.tid = etmq->tid; + sample.pid = tidq->pid; + sample.tid = tidq->tid; sample.id = etmq->etm->instructions_id; sample.stream_id = etmq->etm->instructions_id; sample.period = period; @@ -909,8 +907,8 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, event->sample.header.size = sizeof(struct perf_event_header);
sample.ip = ip; - sample.pid = etmq->pid; - sample.tid = etmq->tid; + sample.pid = tidq->pid; + sample.tid = tidq->tid; sample.addr = cs_etm__first_executed_instr(tidq->packet); sample.id = etmq->etm->branches_id; sample.stream_id = etmq->etm->branches_id; @@ -1762,9 +1760,19 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm, for (i = 0; i < queues->nr_queues; i++) { struct auxtrace_queue *queue = &etm->queues.queue_array[i]; struct cs_etm_queue *etmq = queue->priv; + struct cs_etm_traceid_queue *tidq; + + if (!etmq) + continue; + + tidq = cs_etm__etmq_get_traceid_queue(etmq, + CS_ETM_PER_THREAD_TRACEID); + + if (!tidq) + continue;
- if (etmq && ((tid == -1) || (etmq->tid == tid))) { - cs_etm__set_pid_tid_cpu(etm, queue); + if ((tid == -1) || (tidq->tid == tid)) { + cs_etm__set_pid_tid_cpu(etm, queue, tidq); cs_etm__run_decoder(etmq); } }
In order to support CPU wide trace scenarios we need to use the memory callback API that provides a traceID, something that is available starting with version 0.11.0 of the openCSD library.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- tools/build/feature/test-libopencsd.c | 4 ++-- tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 1 + 2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/build/feature/test-libopencsd.c b/tools/build/feature/test-libopencsd.c index d68eb4fb40cc..2b0e02c38870 100644 --- a/tools/build/feature/test-libopencsd.c +++ b/tools/build/feature/test-libopencsd.c @@ -4,9 +4,9 @@ /* * Check OpenCSD library version is sufficient to provide required features */ -#define OCSD_MIN_VER ((0 << 16) | (10 << 8) | (0)) +#define OCSD_MIN_VER ((0 << 16) | (11 << 8) | (0)) #if !defined(OCSD_VER_NUM) || (OCSD_VER_NUM < OCSD_MIN_VER) -#error "OpenCSD >= 0.10.0 is required" +#error "OpenCSD >= 0.11.0 is required" #endif
int main(void) diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 275c72264820..4303d2d00d31 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -356,6 +356,7 @@ cs_etm_decoder__buffer_range(struct cs_etm_packet_queue *packet_queue, break; case OCSD_INSTR_ISB: case OCSD_INSTR_DSB_DMB: + case OCSD_INSTR_WFI_WFE: case OCSD_INSTR_OTHER: default: packet->last_instr_taken_branch = false;
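[Editor's note: for reference, the version macro in the feature test above packs one byte per field, so the comparison is a plain integer check.]

	/* v0.11.0 -> (0 << 16) | (11 << 8) | 0 = 0x000B00
	 * v0.10.0 -> (0 << 16) | (10 << 8) | 0 = 0x000A00
	 * 0x000A00 < 0x000B00, so a 0.10.x install trips the #error
	 * at build time.
	 */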
When working with CPU-wide traces, different traceIDs may be found in the same stream. As such we need to use the decoder callback that provides the traceID, in order to know the thread context being decoded.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 14 +++---- .../perf/util/cs-etm-decoder/cs-etm-decoder.h | 3 +- tools/perf/util/cs-etm.c | 41 +++++++++++++------ 3 files changed, 36 insertions(+), 22 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 4303d2d00d31..87264b79de0e 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -41,15 +41,14 @@ static u32 cs_etm_decoder__mem_access(const void *context, const ocsd_vaddr_t address, const ocsd_mem_space_acc_t mem_space __maybe_unused, + const u8 trace_chan_id, const u32 req_size, u8 *buffer) { struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context;
- return decoder->mem_access(decoder->data, - address, - req_size, - buffer); + return decoder->mem_access(decoder->data, trace_chan_id, + address, req_size, buffer); }
int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder, @@ -58,9 +57,10 @@ int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder, { decoder->mem_access = cb_func;
- if (ocsd_dt_add_callback_mem_acc(decoder->dcd_tree, start, end, - OCSD_MEM_SPACE_ANY, - cs_etm_decoder__mem_access, decoder)) + if (ocsd_dt_add_callback_trcid_mem_acc(decoder->dcd_tree, start, end, + OCSD_MEM_SPACE_ANY, + cs_etm_decoder__mem_access, + decoder)) return -1;
return 0; diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h index 6ae7ab4cf5fe..11f3391d06f2 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -19,8 +19,7 @@ struct cs_etm_packet_queue;
struct cs_etm_queue;
-typedef u32 (*cs_etm_mem_cb_type)(struct cs_etm_queue *, u64, - size_t, u8 *); +typedef u32 (*cs_etm_mem_cb_type)(struct cs_etm_queue *, u8, u64, size_t, u8 *);
struct cs_etmv3_trace_params { u32 reg_ctrl; diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index df128cba4f2a..70c43ac416c3 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -491,8 +491,8 @@ static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address) } }
-static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, - size_t size, u8 *buffer) +static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id, + u64 address, size_t size, u8 *buffer) { u8 cpumode; u64 offset; @@ -501,6 +501,8 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u64 address, struct machine *machine; struct addr_location al;
+ (void)trace_chan_id; + if (!etmq) return 0;
@@ -687,10 +689,12 @@ void cs_etm__reset_last_branch_rb(struct cs_etm_traceid_queue *tidq) }
static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq, - u64 addr) { + u8 trace_chan_id, u64 addr) +{ u8 instrBytes[2];
- cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes); + cs_etm__mem_access(etmq, trace_chan_id, addr, + ARRAY_SIZE(instrBytes), instrBytes); /* * T32 instruction size is indicated by bits[15:11] of the first * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111 @@ -719,6 +723,7 @@ u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet) }
static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq, + u64 trace_chan_id, const struct cs_etm_packet *packet, u64 offset) { @@ -726,7 +731,8 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq, u64 addr = packet->start_addr;
while (offset > 0) { - addr += cs_etm__t32_instr_size(etmq, addr); + addr += cs_etm__t32_instr_size(etmq, + trace_chan_id, addr); offset--; } return addr; @@ -1063,6 +1069,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet *tmp; int ret; + u8 trace_chan_id = tidq->trace_chan_id; u64 instrs_executed = tidq->packet->instr_count;
tidq->period_instructions += instrs_executed; @@ -1094,7 +1101,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, * executed, but PC has not advanced to next instruction) */ u64 offset = (instrs_executed - instrs_over - 1); - u64 addr = cs_etm__instr_addr(etmq, tidq->packet, offset); + u64 addr = cs_etm__instr_addr(etmq, trace_chan_id, + tidq->packet, offset);
ret = cs_etm__synth_instruction_sample( etmq, tidq, addr, etm->instructions_sample_period); @@ -1272,7 +1280,7 @@ static int cs_etm__get_data_block(struct cs_etm_queue *etmq) return etmq->buf_len; }
-static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, +static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id, struct cs_etm_packet *packet, u64 end_addr) { @@ -1295,7 +1303,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, * so below only read 2 bytes as instruction size for T32. */ addr = end_addr - 2; - cs_etm__mem_access(etmq, addr, sizeof(instr16), (u8 *)&instr16); + cs_etm__mem_access(etmq, trace_chan_id, addr, + sizeof(instr16), (u8 *)&instr16); if ((instr16 & 0xFF00) == 0xDF00) return true;
@@ -1310,7 +1319,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, * +---------+---------+-------------------------+ */ addr = end_addr - 4; - cs_etm__mem_access(etmq, addr, sizeof(instr32), (u8 *)&instr32); + cs_etm__mem_access(etmq, trace_chan_id, addr, + sizeof(instr32), (u8 *)&instr32); if ((instr32 & 0x0F000000) == 0x0F000000 && (instr32 & 0xF0000000) != 0xF0000000) return true; @@ -1326,7 +1336,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, * +-----------------------+---------+-----------+ */ addr = end_addr - 4; - cs_etm__mem_access(etmq, addr, sizeof(instr32), (u8 *)&instr32); + cs_etm__mem_access(etmq, trace_chan_id, addr, + sizeof(instr32), (u8 *)&instr32); if ((instr32 & 0xFFE0001F) == 0xd4000001) return true;
@@ -1342,6 +1353,7 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u64 magic) { + u8 trace_chan_id = tidq->trace_chan_id; struct cs_etm_packet *packet = tidq->packet; struct cs_etm_packet *prev_packet = tidq->prev_packet;
@@ -1356,7 +1368,7 @@ static bool cs_etm__is_syscall(struct cs_etm_queue *etmq, */ if (magic == __perf_cs_etmv4_magic) { if (packet->exception_number == CS_ETMV4_EXC_CALL && - cs_etm__is_svc_instr(etmq, prev_packet, + cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet, prev_packet->end_addr)) return true; } @@ -1394,6 +1406,7 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u64 magic) { + u8 trace_chan_id = tidq->trace_chan_id; struct cs_etm_packet *packet = tidq->packet; struct cs_etm_packet *prev_packet = tidq->prev_packet;
@@ -1419,7 +1432,7 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq, * (SMC, HVC) are taken as sync exceptions. */ if (packet->exception_number == CS_ETMV4_EXC_CALL && - !cs_etm__is_svc_instr(etmq, prev_packet, + !cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet, prev_packet->end_addr)) return true;
@@ -1443,6 +1456,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq, { struct cs_etm_packet *packet = tidq->packet; struct cs_etm_packet *prev_packet = tidq->prev_packet; + u8 trace_chan_id = tidq->trace_chan_id; u64 magic; int ret;
@@ -1523,7 +1537,8 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq, if (prev_packet->flags == (PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN | PERF_IP_FLAG_INTERRUPT) && - cs_etm__is_svc_instr(etmq, packet, packet->start_addr)) + cs_etm__is_svc_instr(etmq, trace_chan_id, + packet, packet->start_addr)) prev_packet->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN | PERF_IP_FLAG_SYSCALLRET;
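[Editor's note: to summarise the interface change above, the memory callback type now carries the traceID of the stream performing the access. This is the typedef from cs-etm-decoder.h as modified by this patch:]

	typedef u32 (*cs_etm_mem_cb_type)(struct cs_etm_queue *, u8, u64,
					  size_t, u8 *);

A caller such as cs_etm__t32_instr_size() therefore passes trace_chan_id along with the address, letting the front end resolve the access in the right thread's address space.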
When operating in CPU-wide trace mode with a source/sink topology of N:1, packets with multiple traceIDs will end up in the same cs_etm_queue. In order to properly decode packets, they need to be split into different queues, i.e. one queue per traceID.
As such, add support for multiple traceIDs per cs_etm_queue by adding a new cs_etm_traceid_queue every time a new traceID is discovered in the trace stream.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- tools/perf/util/cs-etm.c | 130 +++++++++++++++++++++++++++++++-------- 1 file changed, 106 insertions(+), 24 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 70c43ac416c3..f696b8dd6e0a 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -82,7 +82,9 @@ struct cs_etm_queue { u64 offset; const unsigned char *buf; size_t buf_len, buf_used; - struct cs_etm_traceid_queue *traceid_queues; + /* Conversion between traceID and index in traceid_queues array */ + struct intlist *traceid_queues_list; + struct cs_etm_traceid_queue **traceid_queues; };
static int cs_etm__update_queues(struct cs_etm_auxtrace *etm); @@ -210,29 +212,69 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq, static struct cs_etm_traceid_queue *cs_etm__etmq_get_traceid_queue(struct cs_etm_queue *etmq, u8 trace_chan_id) { - struct cs_etm_traceid_queue *tidq; + int idx; + struct int_node *inode; + struct intlist *traceid_queues_list; + struct cs_etm_traceid_queue *tidq, **traceid_queues; struct cs_etm_auxtrace *etm = etmq->etm;
- if (!etm->timeless_decoding) - return NULL; + if (etm->timeless_decoding) + trace_chan_id = CS_ETM_PER_THREAD_TRACEID;
- tidq = etmq->traceid_queues; + traceid_queues_list = etmq->traceid_queues_list;
- if (tidq) - return tidq; + /* + * Check if the traceid_queue exist for this traceID by looking + * in the queue list. + */ + inode = intlist__find(traceid_queues_list, trace_chan_id); + if (inode) { + idx = (int)(intptr_t)inode->priv; + return etmq->traceid_queues[idx]; + }
+ /* We couldn't find a traceid_queue for this traceID, allocate one */ tidq = calloc(CS_ETM_PACKET_QUEUE_NR, sizeof(*tidq)); if (!tidq) return NULL;
+ /* Get a valid index for the new traceid_queue */ + idx = intlist__nr_entries(traceid_queues_list); + /* Memory for the inode is free'ed in cs_etm_free_traceid_queues () */ + inode = intlist__findnew(traceid_queues_list, trace_chan_id); + if (!inode) + goto out_free; + + /* Associate this traceID with this index */ + inode->priv = (void *)(intptr_t)idx; + if (cs_etm__init_traceid_queue(etmq, tidq, trace_chan_id)) goto out_free;
- etmq->traceid_queues = tidq; + /* Grow the traceid_queues array by one unit */ + traceid_queues = etmq->traceid_queues; + traceid_queues = reallocarray(traceid_queues, + idx + CS_ETM_PACKET_QUEUE_NR, + sizeof(*traceid_queues)); + + /* + * On failure reallocarray() returns NULL and the original block of + * memory is left untouched. + */ + if (!traceid_queues) + goto out_free; + + traceid_queues[idx] = tidq; + etmq->traceid_queues = traceid_queues;
- return etmq->traceid_queues; + return etmq->traceid_queues[idx];
out_free: + /* + * Function intlist__remove() removes the inode from the list + * and delete the memory associated to it. + */ + intlist__remove(traceid_queues_list, inode); free(tidq);
return NULL; @@ -412,6 +454,44 @@ static int cs_etm__flush_events(struct perf_session *session, return cs_etm__process_timeless_queues(etm, -1); }
+static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq) +{ + int idx; + uintptr_t priv; + struct int_node *inode, *tmp; + struct cs_etm_traceid_queue *tidq; + struct intlist *traceid_queues_list = etmq->traceid_queues_list; + + intlist__for_each_entry_safe(inode, tmp, traceid_queues_list) { + priv = (uintptr_t)inode->priv; + idx = priv; + + /* Free this traceid_queue from the array */ + tidq = etmq->traceid_queues[idx]; + thread__zput(tidq->thread); + zfree(&tidq->event_buf); + zfree(&tidq->last_branch); + zfree(&tidq->last_branch_rb); + zfree(&tidq->prev_packet); + zfree(&tidq->packet); + zfree(&tidq); + + /* + * Function intlist__remove() removes the inode from the list + * and delete the memory associated to it. + */ + intlist__remove(traceid_queues_list, inode); + } + + /* Then the RB tree itself */ + intlist__delete(traceid_queues_list); + etmq->traceid_queues_list = NULL; + + /* finally free the traceid_queues array */ + free(etmq->traceid_queues); + etmq->traceid_queues = NULL; +} + static void cs_etm__free_queue(void *priv) { struct cs_etm_queue *etmq = priv; @@ -419,14 +499,8 @@ static void cs_etm__free_queue(void *priv) if (!etmq) return;
- thread__zput(etmq->traceid_queues->thread); cs_etm_decoder__free(etmq->decoder); - zfree(&etmq->traceid_queues->event_buf); - zfree(&etmq->traceid_queues->last_branch); - zfree(&etmq->traceid_queues->last_branch_rb); - zfree(&etmq->traceid_queues->prev_packet); - zfree(&etmq->traceid_queues->packet); - zfree(&etmq->traceid_queues); + cs_etm__free_traceid_queues(etmq); free(etmq); }
@@ -497,19 +571,21 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id, u8 cpumode; u64 offset; int len; - struct thread *thread; - struct machine *machine; - struct addr_location al; - - (void)trace_chan_id; + struct thread *thread; + struct machine *machine; + struct addr_location al; + struct cs_etm_traceid_queue *tidq;
if (!etmq) return 0;
machine = etmq->etm->machine; cpumode = cs_etm__cpu_mode(etmq, address); + tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id); + if (!tidq) + return 0;
- thread = etmq->traceid_queues->thread; + thread = tidq->thread; if (!thread) { if (cpumode != PERF_RECORD_MISC_KERNEL) return 0; @@ -545,6 +621,10 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm) if (!etmq) return NULL;
+ etmq->traceid_queues_list = intlist__new(NULL); + if (!etmq->traceid_queues_list) + goto out_free; + /* Use metadata to fill in trace parameters for trace decoder */ t_params = zalloc(sizeof(*t_params) * etm->num_cpu);
@@ -579,6 +659,7 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm) out_free_decoder: cs_etm_decoder__free(etmq->decoder); out_free: + intlist__delete(etmq->traceid_queues_list); free(etmq);
return NULL; @@ -1284,8 +1365,9 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id, struct cs_etm_packet *packet, u64 end_addr) { - u16 instr16; - u32 instr32; + /* Initialise to keep compiler happy */ + u16 instr16 = 0; + u32 instr32 = 0; u64 addr;
switch (packet->isa) {
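[Editor's note: a condensed view of the lookup scheme this patch introduces, pulled from the hunks above; allocation-failure paths are elided.]

	/* The intlist maps a traceID to an index in the traceid_queues
	 * array: hit -> return the existing queue; miss -> allocate a
	 * new tidq, record traceID->idx, grow the array with
	 * reallocarray().
	 */
	inode = intlist__find(etmq->traceid_queues_list, trace_chan_id);
	if (inode) {
		idx = (int)(intptr_t)inode->priv;
		return etmq->traceid_queues[idx];
	}
	/* otherwise fall through to the allocation path shown above */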
On Wed, Mar 06, 2019 at 03:59:39PM -0700, Mathieu Poirier wrote:
When operating in CPU-wide trace mode with a source/sink topology of N:1, packets with multiple traceIDs will end up in the same cs_etm_queue. In order to properly decode packets, they need to be split into different queues, i.e. one queue per traceID.
As such, add support for multiple traceIDs per cs_etm_queue by adding a new cs_etm_traceid_queue every time a new traceID is discovered in the trace stream.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
[...]
@@ -1284,8 +1365,9 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 				 struct cs_etm_packet *packet,
 				 u64 end_addr)
 {
-	u16 instr16;
-	u32 instr32;
+	/* Initialise to keep compiler happy */
+	u16 instr16 = 0;
+	u32 instr32 = 0;
Use one separate patch for this?
Except for this, the other changes in this patch look good to me.
 	u64 addr;
 
 	switch (packet->isa) {
-- 
2.17.1
On Wed, Apr 03, 2019 at 06:25:34PM +0800, Leo Yan wrote:
On Wed, Mar 06, 2019 at 03:59:39PM -0700, Mathieu Poirier wrote:
When operating in CPU-wide trace mode with a source/sink topology of N:1, packets with multiple traceIDs will end up in the same cs_etm_queue. In order to properly decode packets, they need to be split into different queues, i.e. one queue per traceID.
As such, add support for multiple traceIDs per cs_etm_queue by adding a new cs_etm_traceid_queue every time a new traceID is discovered in the trace stream.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
[...]
@@ -1284,8 +1365,9 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 				 struct cs_etm_packet *packet,
 				 u64 end_addr)
 {
-	u16 instr16;
-	u32 instr32;
+	/* Initialise to keep compiler happy */
+	u16 instr16 = 0;
+	u32 instr32 = 0;
Use one separate patch for this?
I thought about it but:
1) The patch would have consisted of only those changes. 2) The code was fine and the compiler was not complaining until I touched function cs_etm__mem_access(), hence putting all the changes in the same file. Let me know if you really think we need two patches.
Mathieu
Except this, other changes in this patch looks good to me.
 	u64 addr;
 
 	switch (packet->isa) {
-- 
2.17.1
On Tue, Apr 16, 2019 at 05:20:19PM -0600, Mathieu Poirier wrote:
On Wed, Apr 03, 2019 at 06:25:34PM +0800, Leo Yan wrote:
On Wed, Mar 06, 2019 at 03:59:39PM -0700, Mathieu Poirier wrote:
When operating in CPU-wide trace mode with a source/sink topology of N:1, packets with multiple traceIDs will end up in the same cs_etm_queue. In order to properly decode packets, they need to be split into different queues, i.e. one queue per traceID.
As such, add support for multiple traceIDs per cs_etm_queue by adding a new cs_etm_traceid_queue every time a new traceID is discovered in the trace stream.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
[...]
@@ -1284,8 +1365,9 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 				 struct cs_etm_packet *packet,
 				 u64 end_addr)
 {
-	u16 instr16;
-	u32 instr32;
+	/* Initialise to keep compiler happy */
+	u16 instr16 = 0;
+	u32 instr32 = 0;
Use one separate patch for this?
I thought about it but:
1) The patch would have consisted of only those changes.
2) The code was fine and the compiler was not complaining until I touched function cs_etm__mem_access(), hence putting all the changes in the same file. Let me know if you really think we need two patches.
Agree, this will not introduce any difficulty for code maintenance. No need to split into two patches in that case.
Thanks, Leo Yan
Link contextID packets received from the decoder with the perf tool thread mechanic so that we know the specifics of the process currently executing.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 20 ++++++++++++ tools/perf/util/cs-etm.c | 32 +++++++++++++++---- tools/perf/util/cs-etm.h | 10 ++++++ 3 files changed, 56 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 87264b79de0e..ce85e52f989c 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -402,6 +402,24 @@ cs_etm_decoder__buffer_exception_ret(struct cs_etm_packet_queue *queue, CS_ETM_EXCEPTION_RET); }
+static ocsd_datapath_resp_t +cs_etm_decoder__set_tid(struct cs_etm_queue *etmq, + const ocsd_generic_trace_elem *elem, + const uint8_t trace_chan_id) +{ + pid_t tid; + + /* Ignore PE_CONTEXT packets that don't have a valid contextID */ + if (!elem->context.ctxt_id_valid) + return OCSD_RESP_CONT; + + tid = elem->context.context_id; + if (cs_etm__etmq_set_tid(etmq, tid, trace_chan_id)) + return OCSD_RESP_FATAL_SYS_ERR; + + return OCSD_RESP_CONT; +} + static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( const void *context, const ocsd_trc_index_t indx __maybe_unused, @@ -440,6 +458,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( trace_chan_id); break; case OCSD_GEN_TRC_ELEM_PE_CONTEXT: + resp = cs_etm_decoder__set_tid(etmq, elem, trace_chan_id); + break; case OCSD_GEN_TRC_ELEM_ADDR_NACC: case OCSD_GEN_TRC_ELEM_TIMESTAMP: case OCSD_GEN_TRC_ELEM_CYCLE_COUNT: diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index f696b8dd6e0a..e97124f218fd 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -906,13 +906,8 @@ cs_etm__get_trace(struct cs_etm_queue *etmq) }
static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, - struct auxtrace_queue *queue, struct cs_etm_traceid_queue *tidq) { - /* CPU-wide tracing isn't supported yet */ - if (queue->tid == -1) - return; - if ((!tidq->thread) && (tidq->tid != -1)) tidq->thread = machine__find_thread(etm->machine, -1, tidq->tid); @@ -921,6 +916,31 @@ static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm, tidq->pid = tidq->thread->pid_; }
+int cs_etm__etmq_set_tid(struct cs_etm_queue *etmq, + pid_t tid, u8 trace_chan_id) +{ + int cpu, err = -EINVAL; + struct cs_etm_auxtrace *etm = etmq->etm; + struct cs_etm_traceid_queue *tidq; + + tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id); + if (!tidq) + return err; + + if (cs_etm__get_cpu(trace_chan_id, &cpu) < 0) + return err; + + err = machine__set_current_tid(etm->machine, cpu, tid, tid); + if (err) + return err; + + tidq->tid = tid; + thread__zput(tidq->thread); + + cs_etm__set_pid_tid_cpu(etm, tidq); + return 0; +} + static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u64 addr, u64 period) @@ -1869,7 +1889,7 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm, continue;
if ((tid == -1) || (tidq->tid == tid)) { - cs_etm__set_pid_tid_cpu(etm, queue, tidq); + cs_etm__set_pid_tid_cpu(etm, tidq); cs_etm__run_decoder(etmq); } } diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 0dfc79a678f8..5f32b271a50e 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -182,6 +182,8 @@ struct cs_etm_packet_queue { int cs_etm__process_auxtrace_info(union perf_event *event, struct perf_session *session); int cs_etm__get_cpu(u8 trace_chan_id, int *cpu); +int cs_etm__etmq_set_tid(struct cs_etm_queue *etmq, + pid_t tid, u8 trace_chan_id); struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id); #else @@ -198,6 +200,14 @@ static inline int cs_etm__get_cpu(u8 trace_chan_id __maybe_unused, return -1; }
+static inline int cs_etm__etmq_set_tid( + struct cs_etm_queue *etmq __maybe_unused, + pid_t tid __maybe_unused, + u8 trace_chan_id __maybe_unused) +{ + return -1; +} + static inline struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue( struct cs_etm_queue *etmq __maybe_unused, u8 trace_chan_id __maybe_unused)
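[Editor's note: the end-to-end flow wired up by this patch, as a sketch; the names come from the diff above.]

	/*
	 * OCSD_GEN_TRC_ELEM_PE_CONTEXT
	 *   -> cs_etm_decoder__set_tid()        contextID is used as the tid
	 *      -> cs_etm__etmq_set_tid()
	 *         -> machine__set_current_tid() record which tid runs on the
	 *                                       CPU behind this traceID
	 *         -> cs_etm__set_pid_tid_cpu()  resolve struct thread and pid
	 */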
On Wed, Mar 06, 2019 at 03:59:40PM -0700, Mathieu Poirier wrote:
Link contextID packets received from the decoder with the perf tool thread mechanic so that we know the specifics of the process currently executing.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 20 ++++++++++++ tools/perf/util/cs-etm.c | 32 +++++++++++++++---- tools/perf/util/cs-etm.h | 10 ++++++ 3 files changed, 56 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 87264b79de0e..ce85e52f989c 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -402,6 +402,24 @@ cs_etm_decoder__buffer_exception_ret(struct cs_etm_packet_queue *queue,
 				     CS_ETM_EXCEPTION_RET);
 }
 
+static ocsd_datapath_resp_t
+cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
+			const ocsd_generic_trace_elem *elem,
+			const uint8_t trace_chan_id)
+{
+	pid_t tid;
+
+	/* Ignore PE_CONTEXT packets that don't have a valid contextID */
+	if (!elem->context.ctxt_id_valid)
+		return OCSD_RESP_CONT;
Just curious how the trace discontinuity case is handled. If there is a trace discontinuity, will we receive a PE_CONTEXT packet at the beginning of the new trace block? Otherwise the decode might use the 'stale' context ID from before the discontinuity.
Hi
On Wed, 3 Apr 2019 at 11:31, Leo Yan leo.yan@linaro.org wrote:
On Wed, Mar 06, 2019 at 03:59:40PM -0700, Mathieu Poirier wrote:
Link contextID packets received from the decoder with the perf tool thread mechanic so that we know the specifics of the process currently executing.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 20 ++++++++++++ tools/perf/util/cs-etm.c | 32 +++++++++++++++---- tools/perf/util/cs-etm.h | 10 ++++++ 3 files changed, 56 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 87264b79de0e..ce85e52f989c 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -402,6 +402,24 @@ cs_etm_decoder__buffer_exception_ret(struct cs_etm_packet_queue *queue,
 				     CS_ETM_EXCEPTION_RET);
 }
 
+static ocsd_datapath_resp_t
+cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
+			const ocsd_generic_trace_elem *elem,
+			const uint8_t trace_chan_id)
+{
+	pid_t tid;
+
+	/* Ignore PE_CONTEXT packets that don't have a valid contextID */
+	if (!elem->context.ctxt_id_valid)
+		return OCSD_RESP_CONT;
Just curious how the trace discontinuity case is handled. If there is a trace discontinuity, will we receive a PE_CONTEXT packet at the beginning of the new trace block? Otherwise the decode might use the 'stale' context ID from before the discontinuity.
The ETM spec requires a context packet output after a discontinuity so this should be safe.
Mike
This patch deals with timestamp packets received from the decoding library in order to give the front end packet processing loop a handle on the time at which the instructions conveyed by range packets were executed.
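[Editor's note: a rough numeric illustration of the estimation performed in cs_etm_decoder__do_hard_timestamp() below; the values are invented, only the arithmetic mirrors the patch.]

	/* Timestamp elements arrive *after* the instruction ranges they
	 * cover, so the start time is estimated by backing out the number
	 * of instructions executed since the last reset.
	 */
	u64 elem_timestamp = 1000;	/* from the decoder */
	u64 instr_count = 250;		/* accumulated from range packets */
	u64 timestamp = elem_timestamp - instr_count;	/* 750 */
	u64 next_timestamp = elem_timestamp;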
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 112 +++++++++++++++++- tools/perf/util/cs-etm.c | 19 +++ tools/perf/util/cs-etm.h | 17 +++ 3 files changed, 144 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index ce85e52f989c..33e975c8d11b 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -269,6 +269,76 @@ cs_etm_decoder__create_etm_packet_printer(struct cs_etm_trace_params *t_params, trace_config); }
+static ocsd_datapath_resp_t +cs_etm_decoder__do_soft_timestamp(struct cs_etm_queue *etmq, + struct cs_etm_packet_queue *packet_queue, + const uint8_t trace_chan_id) +{ + /* No timestamp packet has been received, nothing to do */ + if (!packet_queue->timestamp) + return OCSD_RESP_CONT; + + packet_queue->timestamp = packet_queue->next_timestamp; + + /* Estimate the timestamp for the next range packet */ + packet_queue->next_timestamp += packet_queue->instr_count; + packet_queue->instr_count = 0; + + /* Tell the front end which traceid_queue needs attention */ + cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id); + + return OCSD_RESP_WAIT; +} + +static ocsd_datapath_resp_t +cs_etm_decoder__do_hard_timestamp(struct cs_etm_queue *etmq, + const ocsd_generic_trace_elem *elem, + const uint8_t trace_chan_id) +{ + struct cs_etm_packet_queue *packet_queue; + + /* First get the packet queue for this traceID */ + packet_queue = cs_etm__etmq_get_packet_queue(etmq, trace_chan_id); + if (!packet_queue) + return OCSD_RESP_FATAL_SYS_ERR; + + /* + * We've seen a timestamp packet before - simply record the new value. + * Function do_soft_timestamp() will report the value to the front end, + * hence asking the decoder to keep decoding rather than stopping. + */ + if (packet_queue->timestamp) { + packet_queue->next_timestamp = elem->timestamp; + return OCSD_RESP_CONT; + } + + /* + * This is the first timestamp we've seen since the beginning of traces + * or a discontinuity. Since timestamps packets are generated *after* + * range packets have been generated, we need to estimate the time at + * which instructions started by substracting the number of instructions + * executed to the timestamp. + */ + packet_queue->timestamp = elem->timestamp - + packet_queue->instr_count; + packet_queue->next_timestamp = elem->timestamp; + packet_queue->instr_count = 0; + + /* Tell the front end which traceid_queue needs attention */ + cs_etm__etmq_set_traceid_queue_timestamp(etmq, trace_chan_id); + + /* Halt processing until we are being told to proceed */ + return OCSD_RESP_WAIT; +} + +static void +cs_etm_decoder__reset_timestamp(struct cs_etm_packet_queue *packet_queue) +{ + packet_queue->timestamp = 0; + packet_queue->next_timestamp = 0; + packet_queue->instr_count = 0; +} + static ocsd_datapath_resp_t cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue, const u8 trace_chan_id, @@ -310,7 +380,8 @@ cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue, }
static ocsd_datapath_resp_t -cs_etm_decoder__buffer_range(struct cs_etm_packet_queue *packet_queue, +cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq, + struct cs_etm_packet_queue *packet_queue, const ocsd_generic_trace_elem *elem, const uint8_t trace_chan_id) { @@ -365,6 +436,23 @@ cs_etm_decoder__buffer_range(struct cs_etm_packet_queue *packet_queue,
packet->last_instr_size = elem->last_instr_sz;
+ /* per-thread scenario, no need to generate a timestamp */ + if (cs_etm__etmq_is_timeless(etmq)) + goto out; + + /* + * The packet queue is full and we haven't seen a timestamp (had we + * seen one the packet queue wouldn't be full). Let the front end + * deal with it. + */ + if (ret == OCSD_RESP_WAIT) + goto out; + + packet_queue->instr_count += elem->num_instr_range; + /* Tell the front end we have a new timestamp to process */ + ret = cs_etm_decoder__do_soft_timestamp(etmq, packet_queue, + trace_chan_id); +out: return ret; }
@@ -372,6 +460,11 @@ static ocsd_datapath_resp_t cs_etm_decoder__buffer_discontinuity(struct cs_etm_packet_queue *queue, const uint8_t trace_chan_id) { + /* + * Something happened and who knows when we'll get new traces so + * reset time statistics. + */ + cs_etm_decoder__reset_timestamp(queue); return cs_etm_decoder__buffer_packet(queue, trace_chan_id, CS_ETM_DISCONTINUITY); } @@ -404,6 +497,7 @@ cs_etm_decoder__buffer_exception_ret(struct cs_etm_packet_queue *queue,
static ocsd_datapath_resp_t cs_etm_decoder__set_tid(struct cs_etm_queue *etmq, + struct cs_etm_packet_queue *packet_queue, const ocsd_generic_trace_elem *elem, const uint8_t trace_chan_id) { @@ -417,6 +511,12 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq, if (cs_etm__etmq_set_tid(etmq, tid, trace_chan_id)) return OCSD_RESP_FATAL_SYS_ERR;
+ /* + * A timestamp is generated after a PE_CONTEXT element, so make sure + * to rely on the one that follows. + */ + cs_etm_decoder__reset_timestamp(packet_queue); + + return OCSD_RESP_CONT; }
@@ -446,7 +546,7 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( trace_chan_id); break; case OCSD_GEN_TRC_ELEM_INSTR_RANGE: - resp = cs_etm_decoder__buffer_range(packet_queue, elem, + resp = cs_etm_decoder__buffer_range(etmq, packet_queue, elem, trace_chan_id); break; case OCSD_GEN_TRC_ELEM_EXCEPTION: @@ -457,11 +557,15 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer( resp = cs_etm_decoder__buffer_exception_ret(packet_queue, trace_chan_id); break; + case OCSD_GEN_TRC_ELEM_TIMESTAMP: + resp = cs_etm_decoder__do_hard_timestamp(etmq, elem, + trace_chan_id); + break; case OCSD_GEN_TRC_ELEM_PE_CONTEXT: - resp = cs_etm_decoder__set_tid(etmq, elem, trace_chan_id); + resp = cs_etm_decoder__set_tid(etmq, packet_queue, + elem, trace_chan_id); break; case OCSD_GEN_TRC_ELEM_ADDR_NACC: - case OCSD_GEN_TRC_ELEM_TIMESTAMP: case OCSD_GEN_TRC_ELEM_CYCLE_COUNT: case OCSD_GEN_TRC_ELEM_ADDR_UNKNOWN: case OCSD_GEN_TRC_ELEM_EVENT: diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index e97124f218fd..79978f4428b0 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -79,6 +79,7 @@ struct cs_etm_queue { struct cs_etm_decoder *decoder; struct auxtrace_buffer *buffer; unsigned int queue_nr; + u8 pending_timestamp; u64 offset; const unsigned char *buf; size_t buf_len, buf_used; @@ -132,6 +133,19 @@ int cs_etm__get_cpu(u8 trace_chan_id, int *cpu) return 0; }
+void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq, + u8 trace_chan_id) +{ + /* + * When a timestamp packet is encountered the backend code + * is stopped so that the front end has time to process packets + * that were accumulated in the traceID queue. Since there can + * be more than one channel per cs_etm_queue, we need to specify + * which traceID queue needs servicing. + */ + etmq->pending_timestamp = trace_chan_id; +} + static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) { int i; @@ -941,6 +955,11 @@ int cs_etm__etmq_set_tid(struct cs_etm_queue *etmq, return 0; }
+bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq) +{ + return !!etmq->etm->timeless_decoding; +} + static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u64 addr, u64 period) diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 5f32b271a50e..1f6ec7babe70 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -151,6 +151,9 @@ struct cs_etm_packet_queue { u32 packet_count; u32 head; u32 tail; + u32 instr_count; + u64 timestamp; + u64 next_timestamp; struct cs_etm_packet packet_buffer[CS_ETM_PACKET_MAX_BUFFER]; };
@@ -184,6 +187,9 @@ int cs_etm__process_auxtrace_info(union perf_event *event, int cs_etm__get_cpu(u8 trace_chan_id, int *cpu); int cs_etm__etmq_set_tid(struct cs_etm_queue *etmq, pid_t tid, u8 trace_chan_id); +bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq); +void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq, + u8 trace_chan_id); struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id); #else @@ -208,6 +214,17 @@ static inline int cs_etm__etmq_set_tid( return -1; }
+static inline bool cs_etm__etmq_is_timeless( + struct cs_etm_queue *etmq __maybe_unused) +{ + /* What else to return? */ + return true; +} + +static inline void cs_etm__etmq_set_traceid_queue_timestamp( + struct cs_etm_queue *etmq __maybe_unused, + u8 trace_chan_id __maybe_unused) {} + static inline struct cs_etm_packet_queue *cs_etm__etmq_get_packet_queue( struct cs_etm_queue *etmq __maybe_unused, u8 trace_chan_id __maybe_unused)
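The new pending_timestamp field is the only signalling channel between the decoder callbacks and the front end: the decoder records which traceID channel saw the timestamp and returns OCSD_RESP_WAIT, and the consumer side (added by the next patch as cs_etm__etmq_get_timestamp()) reads and clears it. A rough sketch of that handshake, with illustrative names rather than the real types; note that trace ID 0 is free to serve as the "nothing pending" sentinel since it is a reserved value in the CoreSight formatter protocol:

/* Hedged sketch of the decoder/front-end handshake; names are illustrative. */
#include <stdint.h>
#include <stdio.h>

struct toy_etmq {
	uint8_t pending_timestamp; /* 0 = nothing pending, else a traceID */
};

/* Decoder side: flag the channel, then return OCSD_RESP_WAIT to pause. */
static void decoder_saw_timestamp(struct toy_etmq *etmq, uint8_t trace_chan_id)
{
	etmq->pending_timestamp = trace_chan_id;
}

/* Front-end side: learn which traceID queue needs draining. */
static uint8_t frontend_poll(struct toy_etmq *etmq)
{
	uint8_t chan = etmq->pending_timestamp;

	etmq->pending_timestamp = 0; /* acknowledge pending status */
	return chan;
}

int main(void)
{
	struct toy_etmq etmq = { 0 };

	decoder_saw_timestamp(&etmq, 0x10);
	printf("service traceID queue %#x\n", frontend_poll(&etmq));
	return 0;
}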
Add support for CPU-wide trace scenarios by correlating range packets with timestamp packets. That way range packets received on different ETMQ/traceID channels can be processed and synthesized in chronological order.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- tools/perf/util/cs-etm.c | 254 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 246 insertions(+), 8 deletions(-)
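The central trick in this patch is packing both the etm queue number and the trace channel into the single queue_nr value that the auxtrace heap API accepts. A quick round-trip sketch of that encoding (using the corrected macro parameter name, see below) shows it is lossless for 16-bit halves:

/* Hedged sketch: pack/unpack of (queue_nr, trace_chan_id), mirroring the macros. */
#include <assert.h>

#define TO_CS_QUEUE_NR(queue_nr, trace_chan_id) \
	(((queue_nr) << 16) | (trace_chan_id))
#define TO_QUEUE_NR(cs_queue_nr)	((cs_queue_nr) >> 16)
#define TO_TRACE_CHAN_ID(cs_queue_nr)	((cs_queue_nr) & 0x0000ffff)

int main(void)
{
	unsigned int cs_queue_nr = TO_CS_QUEUE_NR(3u, 0x42u);

	assert(TO_QUEUE_NR(cs_queue_nr) == 3u);
	assert(TO_TRACE_CHAN_ID(cs_queue_nr) == 0x42u);
	return 0;
}

This works because trace IDs are 7-bit values and queue numbers stay well below 2^16 in practice.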
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 79978f4428b0..8496190ad553 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -89,12 +89,26 @@ struct cs_etm_queue { };
static int cs_etm__update_queues(struct cs_etm_auxtrace *etm); +static int cs_etm__process_queues(struct cs_etm_auxtrace *etm); static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm, pid_t tid); +static int cs_etm__get_data_block(struct cs_etm_queue *etmq); +static int cs_etm__decode_data_block(struct cs_etm_queue *etmq);
/* PTMs ETMIDR [11:8] set to b0011 */ #define ETMIDR_PTM_VERSION 0x00000300
+/* + * A struct auxtrace_heap_item only has a queue_nr and a timestamp to + * work with. One option is to modify the auxtrace_heap_XYZ() API or to + * simply encode the etm queue number as the upper 16 bits and the channel + * as the lower 16 bits. + */ +#define TO_CS_QUEUE_NR(queue_nr, trace_chan_id) \ + (((queue_nr) << 16) | (trace_chan_id)) +#define TO_QUEUE_NR(cs_queue_nr) (cs_queue_nr >> 16) +#define TO_TRACE_CHAN_ID(cs_queue_nr) (cs_queue_nr & 0x0000ffff) + static u32 cs_etm__get_v7_protocol_version(u32 etmidr) { etmidr &= ETMIDR_PTM_VERSION; @@ -146,6 +160,29 @@ void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq, u8 trace_chan_id) { etmq->pending_timestamp = trace_chan_id; }
+static u64 cs_etm__etmq_get_timestamp(struct cs_etm_queue *etmq, + u8 *trace_chan_id) +{ + struct cs_etm_packet_queue *packet_queue; + + if (!etmq->pending_timestamp) + return 0; + + if (trace_chan_id) + *trace_chan_id = etmq->pending_timestamp; + + packet_queue = cs_etm__etmq_get_packet_queue(etmq, + etmq->pending_timestamp); + if (!packet_queue) + return 0; + + /* Acknowledge pending status */ + etmq->pending_timestamp = 0; + + /* See function cs_etm_decoder__do_{hard|soft}_timestamp() */ + return packet_queue->timestamp; +} + static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) { int i; @@ -170,6 +207,20 @@ static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) } }
+static void cs_etm__clear_all_packet_queues(struct cs_etm_queue *etmq) +{ + int idx; + struct int_node *inode; + struct cs_etm_traceid_queue *tidq; + struct intlist *traceid_queues_list = etmq->traceid_queues_list; + + intlist__for_each_entry(inode, traceid_queues_list) { + idx = (int)(intptr_t)inode->priv; + tidq = etmq->traceid_queues[idx]; + cs_etm__clear_packet_queue(&tidq->packet_queue); + } +} + static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u8 trace_chan_id) @@ -457,15 +508,15 @@ static int cs_etm__flush_events(struct perf_session *session, if (!tool->ordered_events) return -EINVAL;
- if (!etm->timeless_decoding) - return -EINVAL; - ret = cs_etm__update_queues(etm);
if (ret < 0) return ret;
- return cs_etm__process_timeless_queues(etm, -1); + if (etm->timeless_decoding) + return cs_etm__process_timeless_queues(etm, -1); + + return cs_etm__process_queues(etm); }
static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq) @@ -684,6 +735,9 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, unsigned int queue_nr) { int ret = 0; + unsigned int cs_queue_nr; + u8 trace_chan_id; + u64 timestamp; struct cs_etm_queue *etmq = queue->priv;
if (list_empty(&queue->head) || etmq) @@ -701,6 +755,67 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, etmq->queue_nr = queue_nr; etmq->offset = 0;
+ if (etm->timeless_decoding) + goto out; + + /* + * We are under a CPU-wide trace scenario. As such we need to know + * when the code that generated the traces started to execute so that + * it can be correlated with execution on other CPUs. So we get a + * handle on the beginning of traces and decode until we find a + * timestamp. The timestamp is then added to the auxtrace min heap + * in order to know which queue (of all the etmqs) to decode first. + */ + while (1) { + /* + * Fetch an aux_buffer from this etmq. Bail if no more + * blocks or an error has been encountered. + */ + ret = cs_etm__get_data_block(etmq); + if (ret <= 0) + goto out; + + /* + * Run decoder on the trace block. The decoder will stop when + * encountering a timestamp, a full packet queue or the end of + * trace for that block. + */ + ret = cs_etm__decode_data_block(etmq); + if (ret) + goto out; + + /* + * Function cs_etm_decoder__do_{hard|soft}_timestamp() does all + * the timestamp calculation for us. + */ + timestamp = cs_etm__etmq_get_timestamp(etmq, &trace_chan_id); + + /* We found a timestamp, no need to continue. */ + if (timestamp) + break; + + /* + * We didn't find a timestamp so empty all the traceid packet + * queues before looking for another timestamp packet, either + * in the current data block or a new one. Packets that were + * just decoded are useless since no timestamp has been + * associated with them. As such simply discard them. + */ + cs_etm__clear_all_packet_queues(etmq); + } + + /* + * We have a timestamp. Add it to the min heap to reflect when + * instructions conveyed by the range packets of this traceID queue + * started to execute. Once the same has been done for all the traceID + * queues of each etmq, decoding and rendering can start in + * chronological order. + * + * Note that packets decoded above are still in the traceID's packet + * queue and will be processed in cs_etm__process_queues(). + */ + cs_queue_nr = TO_CS_QUEUE_NR(queue_nr, trace_chan_id); + ret = auxtrace_heap__add(&etm->heap, cs_queue_nr, timestamp); out: return ret; } @@ -1849,6 +1964,28 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, return ret; }
+static void cs_etm__clear_all_traceid_queues(struct cs_etm_queue *etmq) +{ + int idx; + struct int_node *inode; + struct cs_etm_traceid_queue *tidq; + struct intlist *traceid_queues_list = etmq->traceid_queues_list; + + intlist__for_each_entry(inode, traceid_queues_list) { + idx = (int)(intptr_t)inode->priv; + tidq = etmq->traceid_queues[idx]; + + /* Ignore return value */ + cs_etm__process_traceid_queue(etmq, tidq); + + /* + * Generate an instruction sample with the remaining + * branchstack entries. + */ + cs_etm__flush(etmq, tidq); + } +} + static int cs_etm__run_decoder(struct cs_etm_queue *etmq) { int err = 0; @@ -1916,6 +2053,105 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm, return 0; }
+static int cs_etm__process_queues(struct cs_etm_auxtrace *etm) +{ + int ret = 0; + unsigned int cs_queue_nr, queue_nr; + u8 trace_chan_id; + u64 timestamp; + struct auxtrace_queue *queue; + struct cs_etm_queue *etmq; + struct cs_etm_traceid_queue *tidq; + + while (1) { + if (!etm->heap.heap_cnt) + goto out; + + /* Take the entry at the top of the min heap */ + cs_queue_nr = etm->heap.heap_array[0].queue_nr; + queue_nr = TO_QUEUE_NR(cs_queue_nr); + trace_chan_id = TO_TRACE_CHAN_ID(cs_queue_nr); + queue = &etm->queues.queue_array[queue_nr]; + etmq = queue->priv; + + /* + * Remove the top entry from the heap since we are about + * to process it. + */ + auxtrace_heap__pop(&etm->heap); + + tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id); + if (!tidq) { + /* + * No traceID queue has been allocated for this traceID, + * which means something somewhere went very wrong. There + * is no other choice than to simply exit. + */ + ret = -EINVAL; + goto out; + } + + /* + * Packets associated with this timestamp are already in + * the etmq's traceID queue, so process them. + */ + ret = cs_etm__process_traceid_queue(etmq, tidq); + if (ret < 0) + goto out; + + /* + * Packets for this timestamp have been processed, time to + * move on to the next timestamp, fetching a new auxtrace_buffer + * if need be. + */ +refetch: + ret = cs_etm__get_data_block(etmq); + if (ret < 0) + goto out; + + /* + * No more auxtrace_buffers to process in this etmq, simply + * move on to another entry in the auxtrace_heap. + */ + if (!ret) + continue; + + ret = cs_etm__decode_data_block(etmq); + if (ret) + goto out; + + timestamp = cs_etm__etmq_get_timestamp(etmq, &trace_chan_id); + + if (!timestamp) { + /* + * Function cs_etm__decode_data_block() returns when + * there are no more traces to decode in the current + * auxtrace_buffer OR when a timestamp has been + * encountered on any of the traceID queues. Since we + * did not get a timestamp, there are no more traces to + * process in this auxtrace_buffer. As such, empty and + * flush all traceID queues. + */ + cs_etm__clear_all_traceid_queues(etmq); + + /* Fetch another auxtrace_buffer for this etmq */ + goto refetch; + } + + /* + * Add to the min heap the timestamp for packets that have + * just been decoded. They will be processed and synthesized + * during the next call to cs_etm__process_traceid_queue() for + * this queue/traceID. + */ + cs_queue_nr = TO_CS_QUEUE_NR(queue_nr, trace_chan_id); + ret = auxtrace_heap__add(&etm->heap, cs_queue_nr, timestamp); + } + +out: + return ret; +} + static int cs_etm__process_itrace_start(struct cs_etm_auxtrace *etm, union perf_event *event) { @@ -1994,9 +2230,6 @@ static int cs_etm__process_event(struct perf_session *session, return -EINVAL; }
- if (!etm->timeless_decoding) - return -EINVAL; - if (sample->time && (sample->time != (u64) -1)) timestamp = sample->time; else @@ -2008,7 +2241,8 @@ static int cs_etm__process_event(struct perf_session *session, return err; }
- if (event->header.type == PERF_RECORD_EXIT) + if (etm->timeless_decoding && + event->header.type == PERF_RECORD_EXIT) return cs_etm__process_timeless_queues(etm, event->fork.tid);
@@ -2017,6 +2251,10 @@ static int cs_etm__process_event(struct perf_session *session, else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE) return cs_etm__process_switch_cpu_wide(etm, event);
+ if (!etm->timeless_decoding && + event->header.type == PERF_RECORD_AUX) + return cs_etm__process_queues(etm); + return 0; }
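To see why the min heap yields chronological order across CPUs, consider a much-simplified model of the cs_etm__process_queues() loop: each queue contributes the timestamp of its next batch of packets, and the globally smallest one is always serviced first. The toy "heap" below is a linear scan and the names are illustrative; the real code uses the auxtrace min heap:

/* Hedged toy model of chronological queue servicing; not the perf code. */
#include <stdint.h>
#include <stdio.h>

struct toy_stream {
	const char *name;
	const uint64_t *ts; /* timestamps still to process, ascending */
	int count;
};

/* Pick the stream with the smallest pending timestamp (linear-scan "heap"). */
static struct toy_stream *pick_earliest(struct toy_stream *s, int n)
{
	struct toy_stream *best = NULL;
	int i;

	for (i = 0; i < n; i++)
		if (s[i].count && (!best || s[i].ts[0] < best->ts[0]))
			best = &s[i];
	return best;
}

int main(void)
{
	const uint64_t cpu0_ts[] = { 100, 250, 400 };
	const uint64_t cpu1_ts[] = { 180, 220, 500 };
	struct toy_stream s[] = {
		{ "cpu0", cpu0_ts, 3 },
		{ "cpu1", cpu1_ts, 3 },
	};
	struct toy_stream *q;

	/* Prints 100 180 220 250 400 500 - interleaved chronologically. */
	while ((q = pick_earliest(s, 2))) {
		printf("%s: %llu\n", q->name, (unsigned long long)q->ts[0]);
		q->ts++; /* "process" the packets behind this timestamp */
		q->count--;
	}
	return 0;
}

The refetch/decode/re-add cycle in the patch plays the role of q->ts++ here: once a queue's packets are synthesized, its next timestamp is pushed back onto the heap.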
On Wed, Mar 06, 2019 at 03:59:25PM -0700, Mathieu Poirier wrote:
> [...]
I suggest updating HOWTO.md in the OpenCSD repository to reflect these changes for building perf.
[...]
Thanks, Leo Yan
Hi Leo,
On Wed, 3 Apr 2019 at 08:50, Leo Yan leo.yan@linaro.org wrote:
> On Wed, Mar 06, 2019 at 03:59:25PM -0700, Mathieu Poirier wrote:
> > [...]
>
> I suggest updating HOWTO.md in the OpenCSD repository to reflect these changes for building perf.
Mathieu has sent me a change for this - will appear in OpenCSD v0.11.2
Thanks
Mike
On Wed, Apr 03, 2019 at 09:32:36AM +0100, Mike Leach wrote:
> Hi Leo,
>
> On Wed, 3 Apr 2019 at 08:50, Leo Yan leo.yan@linaro.org wrote:
> > On Wed, Mar 06, 2019 at 03:59:25PM -0700, Mathieu Poirier wrote:
> > > [...]
> >
> > I suggest updating HOWTO.md in the OpenCSD repository to reflect these changes for building perf.
>
> Mathieu has sent me a change for this - will appear in OpenCSD v0.11.2
Good to know this, thanks!