On Mon, Dec 19, 2016 at 10:01 AM, Mike Leach mike.leach@linaro.org wrote:
OK - a bit of further follow up. Just re-read the PMU spec - it is not possible to chain event counts as I thought above - so counting predicted & mis-predicted branches as a single count isn't possible. So unless a given core implementation provides an implementation specific event that counts all branches, PMU probably can't help in this case.
Thanks Mike, I will keep this in mind. I will investigate how we can get perf to interrupt and collect the trace based on a PMU event.
Sebastian
Mike
On 19 December 2016 at 15:00, Mike Leach mike.leach@linaro.org wrote:
Hi Sebastian,
The short answer is no - there is nothing in the ETM hardware that does any counting of this nature, that could then trigger an interrupt to halt trace.
However, the ARM Core hardware PMU - associated with the Cortex core rather than trace, does have event counters. It is implementation defined which types of events are supported on a per core basis (i.e. A57 has a different subset than A53). Looking at the A57 PMU, there is no event that simply counts all executed branches, but there are counters for predicted & mis-predicted branches (speculatively executed - so this might over-count actual branches taken), So theoretically it might be possible to run both trace and the ARM h/w PMU counting branches, and interrupt when a counter hits a level - but I have no idea if this level of support / combining of events is currently possible in perf.
Regards
Mike.
On 16 December 2016 at 22:38, Sebastian Pop sebpop@gmail.com wrote:
Thanks Mike: with your explanations, I now can see how all my observations fit in.
Indeed, for AutoFDO we do not need to collect a full trace, and losing a few branch records is not a problem.
The main reason I was trying to get the largest trace was because I wanted to compare how large the ETM traces vs. Intel-PT traces, mostly to understand how efficient the encoding of the trace is.
I have seen that an ETM trace of about 1MB corresponds to about 10^5 branches. Is there a way to trigger the collection of traces based on an HW counter overflowing? I would like something like: "perf record -e branches -c 100000" in order to drain the ETM buffer every 10^5 executed branches.
Thanks, Sebastian
On Fri, Dec 16, 2016 at 5:02 AM, Mike Leach mike.leach@linaro.org wrote:
Hi Sebastian,
I think some of the variation you are seeing - especially related to the "AUX lost" messages - is down to the way that the ARM trace hardware operates when compared to IntelPT. The AUX lost messages occur fro teh ARM trace when the circular buffer allocated to the trace wraps. Now, the IntelPT system can generate an interrupt when a buffer is approaching a full state, but unfortunately the current ARM ETR hardware does not have this capability. So trace will run when the process is scheduled in, and be stopped when scheduled out. At this point the buffer will be examined to see if it has wrapped or not, and be copied out to the perf.data file.
Therefore, when you have a larger trace buffer, more valid trace is captured despite the buffer wrap and your totals are larger. i.e. if a process can generate 6MB of trace during the time it is scheduled in, then the 3MB buffer wraps once, the 1MB 5 times. However both these cases will be counted as a single instance of trace loss - once a buffer has wrapped we cannot determine how many times round. The trace protocol contains regular synchronisation packets so even though we may lose the "start of trace" on a wrapped buffer, we can synchronise as some point within the buffer.
The IntelPT is more deterministic as the process will be scheduled out by the interrupt that fires when the trace buffer is full.
For the case of AutoFDO, where the focus is coverage analysis, I do not believe that this is an issue. If we wanted to trace the exact execution path to a particular problem then we would have to consider other trace techniques to reduce the amount captured - filtering / triggering etc. The sheer amount of trace that can be generated vs buffer size is always a tricky compromise (though it is easier when the trace is streamed off chip to an external capture device where the buffer sizes are in the GB)
Regards
Mike
On 15 December 2016 at 23:11, Sebastian Pop sebpop@gmail.com wrote:
I have also seen a similar behavior on intel-pt when recording huge traces:
$ perf record -e intel_pt//u taskset -c 2 ./sort Bubble sorting array of 70000 elements 7462 ms [ perf record: Woken up 25808 times to write data ] Warning: AUX data lost 8 times out of 51593!
[ perf record: Captured and wrote 1616.765 MB perf.data ]
On Thu, Dec 15, 2016 at 4:03 PM, Sebastian Pop sebpop@gmail.com wrote:
Using "taskset -c" seems to help collecting longer traces: the largest trace I have seen with taskset is 77MB and the largest trace that I see when the process is allowed to be moved to different cpus is 28MB.
I also see warnings on dmesg: coresight-tmc 20070000.etr: timeout while waiting for completion of Manual Flush
Sebastian
++ seq 1 10
- for i in '$(seq 1 10)'
- /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread taskset -c 2 ./sort Bubble sorting array of 30000 elements 10201 ms [ perf record: Woken up 24 times to write data ] Warning: AUX data lost 4 times out of 80!
[ perf record: Captured and wrote 48.059 MB perf.data ]
- for i in '$(seq 1 10)'
- /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread taskset -c 2 ./sort Bubble sorting array of 30000 elements 10378 ms [ perf record: Woken up 24 times to write data ] Warning: AUX data lost 9 times out of 68!
[ perf record: Captured and wrote 52.353 MB perf.data ]
- for i in '$(seq 1 10)'
- /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread taskset -c 2 ./sort Bubble sorting array of 30000 elements 11220 ms [ perf record: Woken up 29 times to write data ] Warning: AUX data lost 3 times out of 113!
[ perf record: Captured and wrote 60.112 MB perf.data ]
- for i in '$(seq 1 10)'
- /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread taskset -c 2 ./sort Bubble sorting array of 30000 elements 10912 ms [ perf record: Woken up 28 times to write data ] Warning: AUX data lost 5 times out of 99!
[ perf record: Captured and wrote 57.794 MB perf.data ]
- for i in '$(seq 1 10)'
- /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread taskset -c 2 ./sort Bubble sorting array of 30000 elements 11706 ms [ perf record: Woken up 32 times to write data ] Warning: AUX data lost 2 times out of 125!
[ perf record: Captured and wrote 62.741 MB perf.data ]
- for i in '$(seq 1 10)'
- /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread taskset -c 2 ./sort Bubble sorting array of 30000 elements 11314 ms [ perf record: Woken up 30 times to write data ] Warning: AUX data lost 4 times out of 105!
[ perf record: Captured and wrote 59.480 MB perf.data ]
- for i in '$(seq 1 10)'
- /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread taskset -c 2 ./sort Bubble sorting array of 30000 elements 11737 ms [ perf record: Woken up 35 times to write data ] Warning: AUX data lost 15 times out of 71!
[ perf record: Captured and wrote 70.489 MB perf.data ]
- for i in '$(seq 1 10)'
- /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread taskset -c 2 ./sort Bubble sorting array of 30000 elements 10418 ms [ perf record: Woken up 26 times to write data ] Warning: AUX data lost 9 times out of 69!
[ perf record: Captured and wrote 51.939 MB perf.data ]
- for i in '$(seq 1 10)'
- /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread taskset -c 2 ./sort Bubble sorting array of 30000 elements 11825 ms [ perf record: Woken up 26 times to write data ] Warning: AUX data lost 25 times out of 31!
[ perf record: Captured and wrote 76.481 MB perf.data ]
- for i in '$(seq 1 10)'
- /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
--per-thread taskset -c 2 ./sort Bubble sorting array of 30000 elements 11824 ms [ perf record: Woken up 26 times to write data ] Warning: AUX data lost 25 times out of 31!
[ perf record: Captured and wrote 76.712 MB perf.data ]
On Thu, Dec 15, 2016 at 3:18 PM, Sebastian Pop sebpop@gmail.com wrote: > Mathieu, > with today's perf-opencsd-4.9 branch I get larger traces, > although I still see big variability and warnings about lost data. > Changing the default ETR arm,buffer-size to 3MB I get larger > traces, > and still variable length: > > 1MB ETR: > > ++ seq 1 10 > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 7895 ms > [ perf record: Woken up 3 times to write data ] > Warning: > AUX data lost 3 times out of 4! > > [ perf record: Captured and wrote 3.116 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 6946 ms > [ perf record: Woken up 9 times to write data ] > Warning: > AUX data lost 9 times out of 15! > > [ perf record: Captured and wrote 10.019 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 8245 ms > [ perf record: Woken up 7 times to write data ] > Warning: > AUX data lost 7 times out of 22! > > [ perf record: Captured and wrote 7.584 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 7181 ms > [ perf record: Woken up 5 times to write data ] > Warning: > AUX data lost 5 times out of 7! > > [ perf record: Captured and wrote 5.245 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 8354 ms > [ perf record: Woken up 9 times to write data ] > Warning: > AUX data lost 8 times out of 17! > > [ perf record: Captured and wrote 9.167 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 6458 ms > [ perf record: Woken up 5 times to write data ] > Warning: > AUX data lost 4 times out of 10! > > [ perf record: Captured and wrote 5.535 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 8247 ms > [ perf record: Woken up 6 times to write data ] > Warning: > AUX data lost 6 times out of 14! > > [ perf record: Captured and wrote 7.926 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 8495 ms > [ perf record: Woken up 11 times to write data ] > Warning: > AUX data lost 9 times out of 33! > > [ perf record: Captured and wrote 12.972 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 7446 ms > [ perf record: Woken up 13 times to write data ] > Warning: > AUX data lost 11 times out of 24! > > [ perf record: Captured and wrote 13.055 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 6652 ms > [ perf record: Woken up 5 times to write data ] > Warning: > AUX data lost 5 times out of 9! > > [ perf record: Captured and wrote 5.426 MB perf.data ] > > > > > With 3MB ETR: > > ++ seq 1 10 > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 8283 ms > [ perf record: Woken up 10 times to write data ] > Warning: > AUX data lost 8 times out of 10! > > [ perf record: Captured and wrote 24.446 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 7418 ms > [ perf record: Woken up 8 times to write data ] > Warning: > AUX data lost 4 times out of 11! > > [ perf record: Captured and wrote 16.099 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 7869 ms > [ perf record: Woken up 10 times to write data ] > Warning: > AUX data lost 7 times out of 14! > > [ perf record: Captured and wrote 21.717 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 9250 ms > [ perf record: Woken up 11 times to write data ] > Warning: > AUX data lost 7 times out of 14! > > [ perf record: Captured and wrote 23.431 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 7748 ms > [ perf record: Woken up 6 times to write data ] > Warning: > AUX data lost 5 times out of 7! > > [ perf record: Captured and wrote 15.176 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 8185 ms > [ perf record: Woken up 4 times to write data ] > Warning: > AUX data lost 4 times out of 5! > > [ perf record: Captured and wrote 12.007 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 8886 ms > [ perf record: Woken up 11 times to write data ] > Warning: > AUX data lost 6 times out of 20! > > [ perf record: Captured and wrote 23.402 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 8866 ms > [ perf record: Woken up 8 times to write data ] > Warning: > AUX data lost 6 times out of 12! > > [ perf record: Captured and wrote 19.989 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 8994 ms > [ perf record: Woken up 10 times to write data ] > Warning: > AUX data lost 8 times out of 10! > > [ perf record: Captured and wrote 24.038 MB perf.data ] > + for i in '$(seq 1 10)' > + /root/etm/OpenCSD/tools/perf/perf record -e > cs_etm/@20070000.etr/u > --per-thread ./sort > Bubble sorting array of 30000 elements > 8753 ms > [ perf record: Woken up 13 times to write data ] > Warning: > AUX data lost 8 times out of 34! > > [ perf record: Captured and wrote 27.545 MB perf.data ]
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
-- Mike Leach Principal Engineer, ARM Ltd. Blackburn Design Centre. UK
-- Mike Leach Principal Engineer, ARM Ltd. Blackburn Design Centre. UK
-- Mike Leach Principal Engineer, ARM Ltd. Blackburn Design Centre. UK