Hi,
Thanks, Leo, for the clarification. This is a flow problem between a producer and a consumer. You can handle it in one of three ways: reduce the amount of trace generated before the context switch (use filters to trace only the important sections of the code, shorten the time for which the process is scheduled, and so on); increase the size of the buffer (though I think that is not possible in your case); or implement flow control. I guess flow control should become possible once the CTIs are fully supported on platforms where a CTI is connected to the interrupt controller.
Is this use case being considered for the CTI drivers?

Kind Regards
Zied Guermazi

On Thu, 5 Mar 2020, 1:29 PM Leo Yan <leo.yan@linaro.org> wrote:
Hi Andrea,

On Thu, Mar 05, 2020 at 10:28:27AM +0000, Andrea Brunato wrote:
> Thank you Leo, this is really valuable!

You are welcome!

> I'm going to update my kernel version to the latest one - hopefully I'll manage to do this as soon as possible
>
> As I'm trying different configurations, I've got a very interesting result:
>
> $ taskset -c 0 perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u  ~/afdo/coremark/coremark/coremark.exe  0x3415 0x3415 0x66 0 7 1 2000  > run2.log
> [ perf record: Woken up 28 times to write data ]
> Warning:
> AUX data lost 25 times out of 27!
>
> [ perf record: Captured and wrote 3.323 MB perf.data ]
>
> While instead, when NOT pinning the program to a specific core
>
> $ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u  ~/afdo/coremark/coremark/coremark.exe  0x3415 0x3415 0x66 0 7 1 2000  > run2.log
> [ perf record: Woken up 4 times to write data ]
> Warning:
> AUX data lost 4 times out of 4!
>
> [ perf record: Captured and wrote 0.502 MB perf.data ]
>
> While the loss rate is still high, the number of times AUX data was collected is very different: 27 vs 4
> Also, the reported perf.data file is much larger when pinning the task to a specific core.
>
> Interestingly enough, when instead tracing a short-lived program such as `ls`, there is no difference in the perf.data reported.
>
> Is anybody aware of any specific part in the code base whose behavior may change according to the traced program being rescheduled to another core?
> Any idea/suggestion is highly appreciated
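
(To put a number on the loss rate in the two runs quoted above, here is a small sketch. `aux_loss_rate` is a made-up helper name, and it assumes the warning text looks exactly as in the output above.)

```shell
# Hypothetical helper: reads perf's messages on stdin and prints the AUX
# loss rate. Assumes the warning reads "AUX data lost N times out of M!".
aux_loss_rate() {
    sed -n 's/.*AUX data lost \([0-9][0-9]*\) times out of \([0-9][0-9]*\).*/\1 \2/p' |
        awk '$2 > 0 { printf "%d/%d lost (%.1f%%)\n", $1, $2, 100 * $1 / $2 }'
}

# Example: echo 'AUX data lost 25 times out of 27!' | aux_loss_rate
```

For the pinned run this gives 25/27 (92.6%), and for the unpinned run 4/4 (100.0%), so pinning changes how often data is collected much more than it changes the fraction lost.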

As far as I know, Arm CoreSight cannot raise interrupts on many
platforms, so the perf tool only captures trace data when the profiled
program is switched out from a CPU.

Your first command sets the CPU affinity to CPU0.  CPU0 is usually the
primary CPU for handling interrupts and many interrupt threads run on
it, so the profiled program has many chances to be scheduled out.  As a
result, perf captures trace data many times (27 times).
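
Whether CPU0 really takes most of the interrupts depends on the board.
One way to check is to total the per-CPU columns of /proc/interrupts.
This is only a rough sketch; `irq_totals` is a made-up name, and it
assumes the usual layout (a header line of CPU names, then one row per
interrupt with one count per CPU):

```shell
# Hypothetical helper: sums each CPU column of /proc/interrupts-style input.
irq_totals() {
    awk '
        NR == 1 { ncpu = NF; next }        # header row: CPU0 CPU1 ...
        NF > ncpu {                        # skip short rows such as "Err: 0"
            for (i = 1; i <= ncpu; i++)
                if ($(i + 1) ~ /^[0-9]+$/) # only add purely numeric counts
                    sum[i] += $(i + 1)
        }
        END { for (i = 1; i <= ncpu; i++) printf "CPU%d %d\n", i - 1, sum[i] }
    '
}

# Example: irq_totals < /proc/interrupts
```

A much larger total for CPU0 than for the other CPUs would be consistent
with the explanation above.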

Your second command doesn't use taskset.  In that case the Linux kernel
scheduler spreads tasks across different CPUs as far as possible, which
gives the profiled program more chance to occupy a CPU without being
scheduled out.  I think this is the main reason why, in the second
command, the perf tool captured trace data only 4 times.
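
If you want to compare this against how often the program is actually
switched out, the kernel exposes per-task counters in /proc/<pid>/status
(the `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` fields).
A minimal sketch that pulls them out follows; `ctxt_switches` is a
made-up name, and the counters have to be read while the program is
still running:

```shell
# Hypothetical helper: prints the context-switch counters found in
# /proc/<pid>/status-style input, followed by their sum.
ctxt_switches() {
    awk -F: '
        /^(non)?voluntary_ctxt_switches/ {
            gsub(/[ \t]/, "", $2)          # strip the padding before the count
            total += $2
            print $1, $2
        }
        END { print "total", total + 0 }
    '
}

# Example: ctxt_switches < /proc/$PID/status
```

Running it once with taskset and once without should show whether the
pinned run is switched out more often, as suggested above.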

Thanks,
Leo