This set adds support for CoreSight CPU-wide trace sessions. It borrows most of its code from the per-thread implementation, with the exception that range packets are processed and synthesised according to the time at which the trace they contain was executed.
This is done using the timestamp and contextID feature available on ETM4x tracers (ETM3x/PTM aren't addressed yet). Decoding between processors is done in chronological order using a min heap.
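To illustrate the chronological ordering, here is a minimal sketch of a min heap keyed on timestamp. It is not the code from this set: the names (struct heap_entry, struct min_heap, heap_push(), heap_pop(), MAX_QUEUES) are hypothetical and the fixed-size array is an assumption made to keep the example short. Each per-CPU queue contributes the timestamp of its next pending range, and the decoder always services the queue with the smallest one.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 8	/* hypothetical upper bound, for this sketch only */

struct heap_entry {
	uint64_t timestamp;	/* timestamp of the queue's next pending range */
	int cpu;		/* CPU (trace queue) the entry belongs to */
};

struct min_heap {
	struct heap_entry e[MAX_QUEUES];
	size_t nr;
};

static void swap_entries(struct heap_entry *a, struct heap_entry *b)
{
	struct heap_entry tmp = *a;

	*a = *b;
	*b = tmp;
}

/* Add an entry and sift it up. Returns -1 if the heap is full. */
static int heap_push(struct min_heap *h, uint64_t timestamp, int cpu)
{
	size_t i;

	if (h->nr == MAX_QUEUES)
		return -1;

	i = h->nr++;
	h->e[i].timestamp = timestamp;
	h->e[i].cpu = cpu;

	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (h->e[parent].timestamp <= h->e[i].timestamp)
			break;
		swap_entries(&h->e[parent], &h->e[i]);
		i = parent;
	}

	return 0;
}

/* Remove the entry with the smallest timestamp. Returns -1 if empty. */
static int heap_pop(struct min_heap *h, struct heap_entry *out)
{
	size_t i = 0;

	if (h->nr == 0)
		return -1;

	*out = h->e[0];
	h->e[0] = h->e[--h->nr];

	for (;;) {
		size_t left = 2 * i + 1, right = 2 * i + 2, min = i;

		if (left < h->nr && h->e[left].timestamp < h->e[min].timestamp)
			min = left;
		if (right < h->nr && h->e[right].timestamp < h->e[min].timestamp)
			min = right;
		if (min == i)
			break;
		swap_entries(&h->e[i], &h->e[min]);
		i = min;
	}

	return 0;
}
```

In a decode loop built on this sketch, the popped queue would be decoded up to its next timestamp and then pushed back with that new value, so ranges from different CPUs come out interleaved in time order.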
Of special interest is the way timestamp packets are used to account for temporal execution of traced instructions. Since a timestamp typically happens after range packets have been recorded, the timestamp from the previous range is used as the start time of the current range. When a timestamp for the previous range doesn't exist (i.e. start of trace or discontinuity), the start time is estimated.
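The start-time rule above can be sketched as follows. This is an illustration only, not the code from this set: struct range_clock, range_start_time() and range_clock_reset() are hypothetical names, and the estimate used when no previous timestamp exists (one timestamp tick per executed instruction, subtracted from the timestamp that follows the range) is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

struct range_clock {
	uint64_t prev_timestamp;	/* timestamp that followed the previous range */
	bool have_prev;			/* false at start of trace or after a discontinuity */
};

/*
 * Return the start time of a range that executed @instr_count instructions
 * and was followed by @timestamp, then remember @timestamp for the next range.
 */
static uint64_t range_start_time(struct range_clock *c, uint64_t timestamp,
				 uint64_t instr_count)
{
	uint64_t start;

	if (c->have_prev) {
		/* The previous range's timestamp marks where this one started. */
		start = c->prev_timestamp;
	} else {
		/*
		 * No previous timestamp: estimate the start time.
		 * Assumption: one tick per instruction, clamped at zero.
		 */
		start = timestamp > instr_count ? timestamp - instr_count : 0;
	}

	c->prev_timestamp = timestamp;
	c->have_prev = true;

	return start;
}

/* Call on a trace discontinuity so that the next start time is estimated. */
static void range_clock_reset(struct range_clock *c)
{
	c->have_prev = false;
}
```

For example, with a zero-initialised clock, a first range of 40 instructions followed by timestamp 1000 gets an estimated start time of 960, and the next range followed by timestamp 1100 starts at 1000.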
Open question:
--------------
At this time the implementation supports tracing a single CPU since the only HW we have exhibits an N:1 source/sink topology. The HW itself does support collecting traces from more than one source, but using the feature in this way could be very confusing and mislead users.
For example the following:
# perf record -e cs_etm/20070000.etr/ -C 2,3 application1
would end up tracing everything that is happening on CPUs 2 and 3 for as long as application1 is executing. Because the HW doesn't give us an interrupt when buffers are full, traces from one CPU could easily clobber traces from the other, giving the impression that nothing was executed on the latter.
So this would work:
# perf record -e cs_etm/20070000.etr/ -C 3 application1
I am open to discussion on the topic should someone think of something.
As with the cleanup set, this code has been uploaded here [1].

Thanks,
Mathieu

[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git perf-opencsd-master-cpu-wide-support
Mathieu Poirier (12):
  perf tools: Add defines for CONTEXTID configuration
  perf tools: Configure contextID tracing in CPU-wide mode
  perf tools: Configure timestsamp generation in CPU-wide mode
  perf tools: Configure SWITCH_EVENTS in CPU-wide mode
  perf tools: Add handling of itrace start events
  perf tools: Add handling of switch-CPU-wide events
  perf tools: Linking PE contextID with perf thread mechanic
  perf tools: Allocate decoder tree as needed
  perf tools: Make cs_etm__dump_event() work with CPU-wide scenarios
  perf tools: Add notion of time to the decoding code
  perf tools: Make function cs_etm_decoder__clear_buffer() public
  perf tools: Add support for CPU-wide trace scenarios
 include/linux/coresight-pmu.h                   |   2 +
 tools/include/linux/coresight-pmu.h             |   2 +
 tools/perf/arch/arm/util/cs-etm.c               | 174 ++++++++++--
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 140 +++++++++-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |   4 +-
 tools/perf/util/cs-etm.c                        | 334 ++++++++++++++++++++++--
 tools/perf/util/cs-etm.h                        |  17 ++
 7 files changed, 623 insertions(+), 50 deletions(-)