This is an RFC patch to explore a solution to a problem we have in the CoreSight ETM/ETE PMU.
CoreSight ETM trace allows instruction-level tracing of Arm CPUs. The ETM generates the CPU execution trace and pumps it onto the CoreSight AMBA Trace Bus, where it is collected by a different CoreSight component (traditionally CoreSight TMC-ETR/ETB/ETF) called a "sink". It is important to note that there is no guarantee that every CPU has a dedicated sink. Multiple ETMs could therefore pump trace data into the same sink, so these sinks apply additional formatting to the trace data, allowing the user to decode it properly and attribute it to the corresponding ETM.
However, with the introduction of the Arm Trace Buffer Extension (TRBE), we now have a dedicated, architected per-CPU sink for collecting the trace. Since the TRBE is always per-CPU, it doesn't apply any formatting to the trace. Driver support for it is under review [1].
Now a system could have a per-CPU TRBE and one or more shared TMC-ETRs. A user could choose a specific sink for a perf session (e.g., a TMC-ETR), or the driver could automatically select the nearest sink for a given ETM. It is possible that, in a single perf session, some ETMs end up using TMC-ETR (e.g., if the TRBE is not usable on the CPU) while others use TRBE. Thus we now have "formatted" trace collected from TMC-ETR and "unformatted" trace collected from TRBE. However, we never get into a situation where a single event ends up using both TMC-ETR and TRBE. That is, any AUX buffer is guaranteed to be either RAW or FORMATTED, but not a mix of both.
For decoding, the perf tool needs to know the type of data in each AUX buffer so that it can set up the OpenCSD (the library for decoding CoreSight trace) decoder instance appropriately. Thus the perf.data file must contain hints for the tool to decode the data correctly.
Since this is a runtime variable, and the perf tool has no control over which sink gets used (in the case of automatic sink selection), we need the PMU driver to make this information available for each AUX record.
This patch is an attempt to solve the problem by adding an AUX flag to each AUX record to indicate the type of trace it contains. It could be defined as a PMU-specific flag, which each PMU could interpret in its own way (e.g., PERF_AUX_FLAG_PMU_FLAG_1), or it could be a dedicated flag for CoreSight in a "generic" form, PERF_AUX_FLAG_ALT_FMT (thanks to Mike Leach for the name).
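To illustrate the intent on the tool side, here is a minimal sketch, assuming the generic PERF_AUX_FLAG_ALT_FMT variant with the value 0x10 used in this RFC. The enum and function names (cs_trace_format, cs_aux_record_format, cs_trace_format_name) are hypothetical and are not part of perf or OpenCSD; the sketch only shows how a decoder front end might classify each AUX record before configuring a decoder instance.

```c
#include <assert.h>
#include <stdint.h>

/* Value proposed in this RFC; treat it as an assumption, not settled UAPI. */
#define PERF_AUX_FLAG_ALT_FMT	0x10

/* Hypothetical classification used only in this sketch. */
enum cs_trace_format {
	CS_TRACE_FORMATTED,	/* CoreSight formatted frames, e.g. from TMC-ETR */
	CS_TRACE_RAW,		/* unformatted ETM/ETE trace, e.g. from TRBE */
};

/* Classify one AUX record from the flags carried in that record. */
static enum cs_trace_format cs_aux_record_format(uint64_t aux_flags)
{
	return (aux_flags & PERF_AUX_FLAG_ALT_FMT) ? CS_TRACE_RAW
						   : CS_TRACE_FORMATTED;
}

/* Human-readable name, e.g. for debug output. */
static const char *cs_trace_format_name(enum cs_trace_format fmt)
{
	assert(fmt == CS_TRACE_FORMATTED || fmt == CS_TRACE_RAW);
	return fmt == CS_TRACE_RAW ? "raw" : "formatted";
}
```

Because the decision is made per record, a session that mixes TMC-ETR and TRBE buffers would not need any session-wide format setting.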
We are looking for suggestions on how best to solve this problem and are happy to explore other options if there is a preferred way of solving it.
[1] https://lkml.kernel.org/r/1610511498-4058-1-git-send-email-anshuman.khandual...
Suzuki K Poulose (1):
  perf: Handle multiple formatted AUX records

 drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
 include/linux/coresight.h                        | 1 +
 include/uapi/linux/perf_event.h                  | 1 +
 3 files changed, 4 insertions(+)
The CoreSight PMU supports an AUX buffer for ETM tracing. The trace generated by the ETM (associated with individual CPUs, like Intel PT) is captured by a separate IP (CoreSight TMC-ETR/ETF until now).
The TMC-ETR formats the raw ETM trace data, since it can collect traces from multiple ETMs, using the TraceID to indicate the source of a given trace packet.
The Arm Trace Buffer Extension is a new "sink" IP, attached to individual CPUs, and thus does not apply additional formatting the way TMC-ETR does.
Additionally, a system could have both TRBE *and* TMC-ETR for trace collection. For example, TMC-ETR could be used as a single trace buffer to collect data from multiple ETMs in order to correlate the traces from different CPUs. It is possible to have a perf session where some events end up collecting the trace in TMC-ETR while others collect it in TRBE. Thus we need a way to identify the type of trace for each AUX record.
This patch adds a new flag to indicate the trace format of a given record. It also includes changes that demonstrate how this can be used in the CoreSight PMU to solve the problem.
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
 include/linux/coresight.h                        | 1 +
 include/uapi/linux/perf_event.h                  | 1 +
 3 files changed, 4 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index e776a07b0852..81602bd8da59 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -429,6 +429,8 @@ static void etm_event_stop(struct perf_event *event, int mode)

 		size = sink_ops(sink)->update_buffer(sink, handle, event_data->snk_config);
+		if (!sink->formatted_trace)
+			perf_aux_output_flag(handle, PERF_AUX_FLAG_ALT_FMT);
 		perf_aux_output_end(handle, size);
 	}

diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index e019182521a1..45c173c391a4 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -241,6 +241,7 @@ struct coresight_device {
 	int nr_links;
 	bool has_conns_grp;
 	bool ect_enabled; /* true only if associated ect device is enabled */
+	bool formatted_trace; /* Trace is CoreSight formatted ? */
 };

 /*
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b15e3447cd9f..ea7dcc7b30f0 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1109,6 +1109,7 @@ enum perf_callchain_context {
 #define PERF_AUX_FLAG_OVERWRITE		0x02	/* snapshot from overwrite mode */
 #define PERF_AUX_FLAG_PARTIAL		0x04	/* record contains gaps */
 #define PERF_AUX_FLAG_COLLISION		0x08	/* sample collided with another */
+#define PERF_AUX_FLAG_ALT_FMT		0x10	/* this record is in alternate trace format */

 #define PERF_FLAG_FD_NO_GROUP		(1UL << 0)
 #define PERF_FLAG_FD_OUTPUT		(1UL << 1)
On Fri, Jan 22, 2021 at 03:18:29PM +0000, Suzuki K Poulose wrote:
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b15e3447cd9f..ea7dcc7b30f0 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1109,6 +1109,7 @@ enum perf_callchain_context {
 #define PERF_AUX_FLAG_OVERWRITE		0x02	/* snapshot from overwrite mode */
 #define PERF_AUX_FLAG_PARTIAL		0x04	/* record contains gaps */
 #define PERF_AUX_FLAG_COLLISION		0x08	/* sample collided with another */
+#define PERF_AUX_FLAG_ALT_FMT		0x10	/* this record is in alternate trace format */
Since we have a whole u64, do we want to reserve a whole nibble (or maybe even a byte) for a format type? Because with a single bit like this, we'll kick ourselves when we end up with the need for a 3rd format type.
Hi Peter
On 1/25/21 10:25 AM, Peter Zijlstra wrote:
On Fri, Jan 22, 2021 at 03:18:29PM +0000, Suzuki K Poulose wrote:
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b15e3447cd9f..ea7dcc7b30f0 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1109,6 +1109,7 @@ enum perf_callchain_context {
 #define PERF_AUX_FLAG_OVERWRITE		0x02	/* snapshot from overwrite mode */
 #define PERF_AUX_FLAG_PARTIAL		0x04	/* record contains gaps */
 #define PERF_AUX_FLAG_COLLISION		0x08	/* sample collided with another */
+#define PERF_AUX_FLAG_ALT_FMT		0x10	/* this record is in alternate trace format */
Since we have a whole u64, do we want to reserve a whole nibble (or maybe even a byte) for a format type? Because with a single bit like this, we'll kick ourselves when we end up with the need for a 3rd format type.
Sure, makes sense. We could do:
#define PERF_AUX_FLAG_PMU_FORMAT_TYPE_MASK 0xff00
Additionally, the values could be allocated by individual PMUs and interpreted by the corresponding counterpart. That way we don't have to worry about centralized allocation of the "TYPE" fields.
e.g.:

#define PERF_AUX_FLAG_CORESIGHT_FORMAT_CORESIGHT	0x0000
#define PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW		0x0100

#define PERF_AUX_FLAG_RANDOM_PMU_FORMAT_FMT1		0x0000
#define PERF_AUX_FLAG_RANDOM_PMU_FORMAT_FMT2		0x0100
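As a rough sketch of how such a field might be produced and consumed, assuming the mask and CoreSight values above (none of which are settled UAPI), the helpers below extract and set the format type without disturbing the other flag bits. The helper names (aux_flags_format_type, aux_flags_set_format_type, coresight_aux_is_raw) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Names and values as proposed in this thread; assumptions, not final UAPI. */
#define PERF_AUX_FLAG_PMU_FORMAT_TYPE_MASK		0xff00
#define PERF_AUX_FLAG_CORESIGHT_FORMAT_CORESIGHT	0x0000
#define PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW		0x0100

/* Extract the PMU-specific format type from an AUX record's flags. */
static inline uint64_t aux_flags_format_type(uint64_t flags)
{
	return flags & PERF_AUX_FLAG_PMU_FORMAT_TYPE_MASK;
}

/* Replace the format type while preserving all other flag bits. */
static inline uint64_t aux_flags_set_format_type(uint64_t flags, uint64_t type)
{
	/* The type must fit entirely within the reserved byte. */
	assert((type & ~(uint64_t)PERF_AUX_FLAG_PMU_FORMAT_TYPE_MASK) == 0);
	return (flags & ~(uint64_t)PERF_AUX_FLAG_PMU_FORMAT_TYPE_MASK) | type;
}

/* CoreSight-side interpretation: only meaningful for CoreSight AUX records. */
static inline int coresight_aux_is_raw(uint64_t flags)
{
	return aux_flags_format_type(flags) == PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW;
}
```

Since the same numeric value can mean different things for different PMUs, a consumer would need to know which PMU produced the record before interpreting the field.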
What do you think ?
Cheers
Suzuki
On Mon, Jan 25, 2021 at 10:45:06AM +0000, Suzuki K Poulose wrote:
On 1/25/21 10:25 AM, Peter Zijlstra wrote:
Since we have a whole u64, do we want to reserve a whole nibble (or maybe even a byte) for a format type? Because with a single bit like this, we'll kick ourselves when we end up with the need for a 3rd format type.
Sure, makes sense. We could do:
#define PERF_AUX_FLAG_PMU_FORMAT_TYPE_MASK 0xff00
Additionally, the values could be allocated by individual PMUs and interpreted by the corresponding counterpart. That way we don't have to worry about centralized allocation of the "TYPE" fields.
e.g.:

#define PERF_AUX_FLAG_CORESIGHT_FORMAT_CORESIGHT	0x0000
#define PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW		0x0100

#define PERF_AUX_FLAG_RANDOM_PMU_FORMAT_FMT1		0x0000
#define PERF_AUX_FLAG_RANDOM_PMU_FORMAT_FMT2		0x0100
What do you think ?
Sounds good to me.