On Mon, 18 Jun 2018 at 16:43, Kim Phillips kim.phillips@arm.com wrote:
On Thu, 14 Jun 2018 19:02:45 -0500 Kim Phillips kim.phillips@arm.com wrote:
On Thu, 14 Jun 2018 14:26:45 -0600 Mathieu Poirier mathieu.poirier@linaro.org wrote:
This set adds support for CoreSight CPU-wide trace sessions. It borrows most of its code from the per-thread implementation, with the exception that range packets are processed and synthesised according to the time at which the trace they contain was executed.
...
For example the following:
# perf record -e cs_etm/20070000.etr/ -C 2,3 application1
Just to help my understanding:
So prior to this series we'd always have to use --per-thread, right?:
perf record -e cs_etm/20070000.etr/ --per-thread <workload>
Because otherwise we'd get the misleading 'failed to mmap with 12 (Cannot allocate memory)' error.
And based on the above, with this series, we now are able to have more than one CPU specified in the user-specified cpu mask, like so?:
perf record -e cs_etm/20070000.etr/ -C 2,3 application1
Good, because that returns the same unhelpful error message today.
But we want to get coresight to the same point where intel-pt is, i.e., we shouldn't have to specify either --per-thread or manual CPU masks, like so:
perf record -e cs_etm/20070000.etr/ <workload>
...which also returns the mmap failed error today (all this on Juno).
So does this patch series fix that, too, and on heterogeneous machines like Juno? How about record -a?
FWIW, I tested this series on Juno, and it still fails 'to mmap with 12 (Cannot allocate memory)', with the same style of invocation you provide above, i.e., 'perf record -e cs_etm/20070000.etr/ -C 2,3 application1', in addition to without the -C specification.
Am I testing it wrong?
The "Open question" section of the original cover letter clearly addresses that topic - the current implementation supports a single CPU. I decided to proceed this way because:
1) I didn't know if users would favour traces over CPUs.
2) CPU-wide multi-CPU support is much harder and is an extension of the CPU-wide single-CPU support.
3) With this patchset it is possible to review and test the algorithm that deals with temporal correlation of the code.
Based on the comments from Al and Robert, we will be proceeding with multi-CPU support, something I'm currently working on. The good news is that tackling that will likely implement the foundation for the complex configuration feature, also discussed on the CS mailing list a couple of months back.
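To picture the temporal correlation mentioned above, here is a rough sketch of ordering decoded ranges by the time they executed. It is only an illustration, not the code from the patchset: the Range record, its fields and merge_by_time() are hypothetical names, and it assumes one stream of ranges per CPU, each already sorted by timestamp.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Range(NamedTuple):
    """Hypothetical decoded range packet; not a perf data structure."""
    timestamp: int  # time at which the traced code executed
    cpu: int        # CPU whose trace stream produced the range
    start: int      # first address of the executed range
    end: int        # last address of the executed range


def merge_by_time(per_cpu: Iterable[Iterable[Range]]) -> Iterator[Range]:
    """Yield ranges from all streams in timestamp order.

    Assumes every per-CPU stream is already sorted by timestamp,
    so a k-way merge is enough and no global sort is needed.
    """
    return heapq.merge(*per_cpu, key=lambda r: r.timestamp)


# Made-up sample data for CPUs 2 and 3.
cpu2 = [Range(100, 2, 0x1000, 0x1010), Range(300, 2, 0x1010, 0x1040)]
cpu3 = [Range(200, 3, 0x2000, 0x2008), Range(400, 3, 0x2008, 0x2020)]

merged = list(merge_by_time([cpu2, cpu3]))
for r in merged:
    print(f"t={r.timestamp} cpu={r.cpu} {r.start:#x}-{r.end:#x}")
```

The merge only ever looks at the head of each stream, which is why the per-stream ordering assumption matters: a stream with out-of-order timestamps would have to be sorted or buffered first.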
Kim