OK - a bit of further follow up. Just re-read the PMU spec - it is not possible to chain event counts as I thought above - so counting predicted & mis-predicted branches as a single count isn't possible. So unless a given core implementation provides an implementation specific event that counts all branches, PMU probably can't help in this case.

Mike

On 19 December 2016 at 15:00, Mike Leach <mike.leach@linaro.org> wrote:
Hi Sebastian,

The short answer is no - there is nothing in the ETM hardware that does any counting of this nature, that could then trigger an interrupt to halt trace.

However, the ARM Core hardware PMU - associated with the Cortex core rather than trace, does have event counters. It is implementation defined which types of events are supported on a per core basis (i.e. A57 has a different subset than A53). Looking at the A57 PMU, there is no event that simply counts all executed branches, but there are counters for predicted & mis-predicted branches (speculatively executed - so this might over-count actual branches taken), So theoretically it might be possible to run both trace and the ARM h/w PMU counting branches, and interrupt when a counter hits a level - but I have no idea if this level of support / combining of events is currently possible in perf.

Regards

Mike.

On 16 December 2016 at 22:38, Sebastian Pop <sebpop@gmail.com> wrote:
Thanks Mike: with your explanations, I now can see how all my
observations fit in.

Indeed, for AutoFDO we do not need to collect a full trace, and losing
a few branch records is not a problem.

The main reason I was trying to get the largest trace was because I
wanted to compare how large the ETM traces vs. Intel-PT traces, mostly
to understand how efficient the encoding of the trace is.

I have seen that an ETM trace of about 1MB corresponds to about 10^5 branches.
Is there a way to trigger the collection of traces based on an HW
counter overflowing?
I would like something like: "perf record -e branches -c 100000"
in order to drain the ETM buffer every 10^5 executed branches.

Thanks,
Sebastian

On Fri, Dec 16, 2016 at 5:02 AM, Mike Leach <mike.leach@linaro.org> wrote:
> Hi Sebastian,
>
> I think some of the variation you are seeing - especially related to the
> "AUX lost" messages - is down to the way that the ARM trace hardware
> operates when compared to IntelPT.
> The AUX lost messages occur fro teh ARM trace when the circular buffer
> allocated to the trace wraps. Now, the IntelPT system can generate an
> interrupt when a buffer is approaching a full state, but unfortunately the
> current ARM ETR hardware does not have this capability. So trace will run
> when the process is scheduled in, and be stopped when scheduled out. At this
> point the buffer will be examined to see if it has wrapped or not, and be
> copied out to the perf.data file.
>
> Therefore, when you have a larger trace buffer, more valid trace is captured
> despite the buffer wrap and your totals are larger. i.e. if a process can
> generate 6MB of trace during the time it is scheduled in, then the 3MB
> buffer wraps once, the 1MB 5 times. However both these cases will be counted
> as a single instance of trace loss - once a buffer has wrapped we cannot
> determine how many times round. The trace protocol contains regular
> synchronisation packets so even though we may lose the "start of trace" on a
> wrapped buffer, we can synchronise as some point within the buffer.
>
> The IntelPT is more deterministic as the process will be scheduled out by
> the interrupt that fires when the trace buffer is full.
>
> For the case of AutoFDO, where the focus is coverage analysis, I do not
> believe that this is an issue. If we wanted to trace the exact execution
> path to a particular problem then we would have to consider other trace
> techniques to reduce the amount captured - filtering / triggering etc. The
> sheer amount of trace that can be generated vs buffer size is always a
> tricky compromise (though it is easier when the trace is streamed off chip
> to an external capture device where the buffer sizes are in the GB)
>
> Regards
>
> Mike
>
>
> On 15 December 2016 at 23:11, Sebastian Pop <sebpop@gmail.com> wrote:
>>
>> I have also seen a similar behavior on intel-pt when recording huge
>> traces:
>>
>> $ perf record -e intel_pt//u taskset -c 2 ./sort
>> Bubble sorting array of 70000 elements
>> 7462 ms
>> [ perf record: Woken up 25808 times to write data ]
>> Warning:
>> AUX data lost 8 times out of 51593!
>>
>> [ perf record: Captured and wrote 1616.765 MB perf.data ]
>>
>>
>>
>> On Thu, Dec 15, 2016 at 4:03 PM, Sebastian Pop <sebpop@gmail.com> wrote:
>> > Using "taskset -c" seems to help collecting longer traces: the largest
>> > trace I have seen with taskset is 77MB and the largest trace that I
>> > see when the process is allowed to be moved to different cpus is 28MB.
>> >
>> > I also see warnings on dmesg:
>> > coresight-tmc 20070000.etr: timeout while waiting for completion of
>> > Manual Flush
>> >
>> > Sebastian
>> >
>> > ++ seq 1 10
>> > + for i in '$(seq 1 10)'
>> > + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> > --per-thread taskset -c 2 ./sort
>> > Bubble sorting array of 30000 elements
>> > 10201 ms
>> > [ perf record: Woken up 24 times to write data ]
>> > Warning:
>> > AUX data lost 4 times out of 80!
>> >
>> > [ perf record: Captured and wrote 48.059 MB perf.data ]
>> > + for i in '$(seq 1 10)'
>> > + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> > --per-thread taskset -c 2 ./sort
>> > Bubble sorting array of 30000 elements
>> > 10378 ms
>> > [ perf record: Woken up 24 times to write data ]
>> > Warning:
>> > AUX data lost 9 times out of 68!
>> >
>> > [ perf record: Captured and wrote 52.353 MB perf.data ]
>> > + for i in '$(seq 1 10)'
>> > + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> > --per-thread taskset -c 2 ./sort
>> > Bubble sorting array of 30000 elements
>> > 11220 ms
>> > [ perf record: Woken up 29 times to write data ]
>> > Warning:
>> > AUX data lost 3 times out of 113!
>> >
>> > [ perf record: Captured and wrote 60.112 MB perf.data ]
>> > + for i in '$(seq 1 10)'
>> > + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> > --per-thread taskset -c 2 ./sort
>> > Bubble sorting array of 30000 elements
>> > 10912 ms
>> > [ perf record: Woken up 28 times to write data ]
>> > Warning:
>> > AUX data lost 5 times out of 99!
>> >
>> > [ perf record: Captured and wrote 57.794 MB perf.data ]
>> > + for i in '$(seq 1 10)'
>> > + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> > --per-thread taskset -c 2 ./sort
>> > Bubble sorting array of 30000 elements
>> > 11706 ms
>> > [ perf record: Woken up 32 times to write data ]
>> > Warning:
>> > AUX data lost 2 times out of 125!
>> >
>> > [ perf record: Captured and wrote 62.741 MB perf.data ]
>> > + for i in '$(seq 1 10)'
>> > + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> > --per-thread taskset -c 2 ./sort
>> > Bubble sorting array of 30000 elements
>> > 11314 ms
>> > [ perf record: Woken up 30 times to write data ]
>> > Warning:
>> > AUX data lost 4 times out of 105!
>> >
>> > [ perf record: Captured and wrote 59.480 MB perf.data ]
>> > + for i in '$(seq 1 10)'
>> > + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> > --per-thread taskset -c 2 ./sort
>> > Bubble sorting array of 30000 elements
>> > 11737 ms
>> > [ perf record: Woken up 35 times to write data ]
>> > Warning:
>> > AUX data lost 15 times out of 71!
>> >
>> > [ perf record: Captured and wrote 70.489 MB perf.data ]
>> > + for i in '$(seq 1 10)'
>> > + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> > --per-thread taskset -c 2 ./sort
>> > Bubble sorting array of 30000 elements
>> > 10418 ms
>> > [ perf record: Woken up 26 times to write data ]
>> > Warning:
>> > AUX data lost 9 times out of 69!
>> >
>> > [ perf record: Captured and wrote 51.939 MB perf.data ]
>> > + for i in '$(seq 1 10)'
>> > + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> > --per-thread taskset -c 2 ./sort
>> > Bubble sorting array of 30000 elements
>> > 11825 ms
>> > [ perf record: Woken up 26 times to write data ]
>> > Warning:
>> > AUX data lost 25 times out of 31!
>> >
>> > [ perf record: Captured and wrote 76.481 MB perf.data ]
>> > + for i in '$(seq 1 10)'
>> > + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> > --per-thread taskset -c 2 ./sort
>> > Bubble sorting array of 30000 elements
>> > 11824 ms
>> > [ perf record: Woken up 26 times to write data ]
>> > Warning:
>> > AUX data lost 25 times out of 31!
>> >
>> > [ perf record: Captured and wrote 76.712 MB perf.data ]
>> >
>> > On Thu, Dec 15, 2016 at 3:18 PM, Sebastian Pop <sebpop@gmail.com> wrote:
>> >> Mathieu,
>> >> with today's perf-opencsd-4.9 branch I get larger traces,
>> >> although I still see big variability and warnings about lost data.
>> >> Changing the default ETR arm,buffer-size to 3MB I get larger traces,
>> >> and still variable length:
>> >>
>> >> 1MB ETR:
>> >>
>> >> ++ seq 1 10
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 7895 ms
>> >> [ perf record: Woken up 3 times to write data ]
>> >> Warning:
>> >> AUX data lost 3 times out of 4!
>> >>
>> >> [ perf record: Captured and wrote 3.116 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 6946 ms
>> >> [ perf record: Woken up 9 times to write data ]
>> >> Warning:
>> >> AUX data lost 9 times out of 15!
>> >>
>> >> [ perf record: Captured and wrote 10.019 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 8245 ms
>> >> [ perf record: Woken up 7 times to write data ]
>> >> Warning:
>> >> AUX data lost 7 times out of 22!
>> >>
>> >> [ perf record: Captured and wrote 7.584 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 7181 ms
>> >> [ perf record: Woken up 5 times to write data ]
>> >> Warning:
>> >> AUX data lost 5 times out of 7!
>> >>
>> >> [ perf record: Captured and wrote 5.245 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 8354 ms
>> >> [ perf record: Woken up 9 times to write data ]
>> >> Warning:
>> >> AUX data lost 8 times out of 17!
>> >>
>> >> [ perf record: Captured and wrote 9.167 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 6458 ms
>> >> [ perf record: Woken up 5 times to write data ]
>> >> Warning:
>> >> AUX data lost 4 times out of 10!
>> >>
>> >> [ perf record: Captured and wrote 5.535 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 8247 ms
>> >> [ perf record: Woken up 6 times to write data ]
>> >> Warning:
>> >> AUX data lost 6 times out of 14!
>> >>
>> >> [ perf record: Captured and wrote 7.926 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 8495 ms
>> >> [ perf record: Woken up 11 times to write data ]
>> >> Warning:
>> >> AUX data lost 9 times out of 33!
>> >>
>> >> [ perf record: Captured and wrote 12.972 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 7446 ms
>> >> [ perf record: Woken up 13 times to write data ]
>> >> Warning:
>> >> AUX data lost 11 times out of 24!
>> >>
>> >> [ perf record: Captured and wrote 13.055 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 6652 ms
>> >> [ perf record: Woken up 5 times to write data ]
>> >> Warning:
>> >> AUX data lost 5 times out of 9!
>> >>
>> >> [ perf record: Captured and wrote 5.426 MB perf.data ]
>> >>
>> >>
>> >>
>> >>
>> >> With 3MB ETR:
>> >>
>> >> ++ seq 1 10
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 8283 ms
>> >> [ perf record: Woken up 10 times to write data ]
>> >> Warning:
>> >> AUX data lost 8 times out of 10!
>> >>
>> >> [ perf record: Captured and wrote 24.446 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 7418 ms
>> >> [ perf record: Woken up 8 times to write data ]
>> >> Warning:
>> >> AUX data lost 4 times out of 11!
>> >>
>> >> [ perf record: Captured and wrote 16.099 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 7869 ms
>> >> [ perf record: Woken up 10 times to write data ]
>> >> Warning:
>> >> AUX data lost 7 times out of 14!
>> >>
>> >> [ perf record: Captured and wrote 21.717 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 9250 ms
>> >> [ perf record: Woken up 11 times to write data ]
>> >> Warning:
>> >> AUX data lost 7 times out of 14!
>> >>
>> >> [ perf record: Captured and wrote 23.431 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 7748 ms
>> >> [ perf record: Woken up 6 times to write data ]
>> >> Warning:
>> >> AUX data lost 5 times out of 7!
>> >>
>> >> [ perf record: Captured and wrote 15.176 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 8185 ms
>> >> [ perf record: Woken up 4 times to write data ]
>> >> Warning:
>> >> AUX data lost 4 times out of 5!
>> >>
>> >> [ perf record: Captured and wrote 12.007 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 8886 ms
>> >> [ perf record: Woken up 11 times to write data ]
>> >> Warning:
>> >> AUX data lost 6 times out of 20!
>> >>
>> >> [ perf record: Captured and wrote 23.402 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 8866 ms
>> >> [ perf record: Woken up 8 times to write data ]
>> >> Warning:
>> >> AUX data lost 6 times out of 12!
>> >>
>> >> [ perf record: Captured and wrote 19.989 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 8994 ms
>> >> [ perf record: Woken up 10 times to write data ]
>> >> Warning:
>> >> AUX data lost 8 times out of 10!
>> >>
>> >> [ perf record: Captured and wrote 24.038 MB perf.data ]
>> >> + for i in '$(seq 1 10)'
>> >> + /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u
>> >> --per-thread ./sort
>> >> Bubble sorting array of 30000 elements
>> >> 8753 ms
>> >> [ perf record: Woken up 13 times to write data ]
>> >> Warning:
>> >> AUX data lost 8 times out of 34!
>> >>
>> >> [ perf record: Captured and wrote 27.545 MB perf.data ]
>> _______________________________________________
>> CoreSight mailing list
>> CoreSight@lists.linaro.org
>> https://lists.linaro.org/mailman/listinfo/coresight
>
>
>
>
> --
> Mike Leach
> Principal Engineer, ARM Ltd.
> Blackburn Design Centre. UK



--
Mike Leach
Principal Engineer, ARM Ltd.
Blackburn Design Centre. UK



--
Mike Leach
Principal Engineer, ARM Ltd.
Blackburn Design Centre. UK