Hi,
Thanks, Leo, for the clarifications. This is a flow problem between a producer and a consumer. You can handle it by reducing the amount of trace generated before the context switch (use filters to trace only the important sections of the code, reduce the time for which the process is scheduled, etc.), by increasing the size of the buffer (which I think is not possible in your case), or by implementing flow control. I guess the last option should become possible once the CTIs are fully supported on platforms where a CTI is connected to the interrupt controller. Is this use case being considered for the CTI drivers?
Kind regards,
Zied Guermazi
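To make the three options above concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions: the function name `lost_bytes`, its parameters and all the numbers are hypothetical and are not taken from perf or the CoreSight drivers. It assumes the trace source produces a fixed number of bytes per tick, the buffer is emptied at every context switch, and "flow control" means the consumer is also notified (for example by an interrupt) as soon as the buffer is full.

```python
def lost_bytes(ticks, rate, buffer_size, switch_interval, drain_when_full=False):
    """Bytes dropped by a toy trace buffer that is emptied at context switches.

    drain_when_full=True models flow control: the consumer is also
    notified as soon as the buffer is full and empties it immediately.
    All parameters are hypothetical units (ticks and bytes).
    """
    fill = 0   # bytes currently sitting in the buffer
    lost = 0   # bytes that could not be stored
    for tick in range(1, ticks + 1):
        stored = min(rate, buffer_size - fill)  # only what still fits
        fill += stored
        lost += rate - stored
        at_switch = tick % switch_interval == 0
        if at_switch or (drain_when_full and fill == buffer_size):
            fill = 0  # consumer reads the buffer out
    return lost


# Hypothetical numbers, chosen only to show the trend.
baseline = lost_bytes(1000, rate=10, buffer_size=100, switch_interval=50)
filtered = lost_bytes(1000, rate=2, buffer_size=100, switch_interval=50)
bigger = lost_bytes(1000, rate=10, buffer_size=500, switch_interval=50)
flow_ctl = lost_bytes(1000, rate=10, buffer_size=100, switch_interval=50,
                      drain_when_full=True)
print(baseline, filtered, bigger, flow_ctl)
```

In this model each of the three remedies removes the loss for a different reason: less data is produced, more data fits, or the buffer is read before it overflows. Real hardware and the real driver behave in more complicated ways.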
On Thu, 5 Mar 2020, 1:29 PM Leo Yan leo.yan@linaro.org wrote:
Hi Andrea,
On Thu, Mar 05, 2020 at 10:28:27AM +0000, Andrea Brunato wrote:
Thank you Leo, this is really valuable!
You are welcome!
I'm going to update my kernel version to the latest one - hopefully I'll
manage to do this as soon as possible
As I'm trying different configurations, I've got a very interesting
result:
$ taskset -c 0 perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u
~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
[ perf record: Woken up 28 times to write data ]
Warning: AUX data lost 25 times out of 27!
[ perf record: Captured and wrote 3.323 MB perf.data ]
Whereas, when NOT pinning the program to a specific core:
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u
~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
[ perf record: Woken up 4 times to write data ]
Warning: AUX data lost 4 times out of 4!
[ perf record: Captured and wrote 0.502 MB perf.data ]
While the rate of lost information is still high, the number of AUX data
records reported is very different: 27 vs 4
Also the reported perf.data file is way bigger when pinning the task to
a specific core.
Interestingly enough, when instead tracing a short-lived program such as
`ls`, there is no difference in the perf.data reported.
Is anybody aware of any specific part in the code base whose behavior
may change according to the traced program being rescheduled to another core?
Any idea/suggestion is highly appreciated
As far as I know, Arm CoreSight cannot generate interrupts on many platforms, so the perf tool captures trace data only when the profiled program is switched out from a CPU.
In your first command, the CPU affinity is set to CPU0. CPU0 is usually the primary CPU for handling interrupts, and many interrupt threads run on it, so the profiled program has many chances to be scheduled out. As a result, perf captures trace data many times (27 times).
Your second command doesn't use taskset. In this case, the Linux kernel scheduler spreads tasks across different CPUs as far as possible, which gives the profiled program more chance to occupy a CPU without being scheduled out. I think this is the main reason why perf captured trace data only 4 times with the second command.
Thanks, Leo
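The explanation above can be illustrated with a small toy simulation in Python. This is only a sketch, not how perf or the CoreSight driver is implemented: the function `simulate`, its parameters and the numbers are all hypothetical. It assumes the trace buffer is read out only when the profiled program is switched out, and that anything produced beyond the buffer size between two read-outs is dropped.

```python
def simulate(total_ticks, switch_interval, bytes_per_tick, buffer_size):
    """Toy model: the trace buffer is read out only at context switches.

    Returns (captures, truncated, saved):
      captures  - number of times the buffer was read out
      truncated - read-outs where more data was produced than fits
      saved     - total bytes that reached the output file
    """
    captures = 0
    truncated = 0
    saved = 0
    pending = 0  # bytes produced since the last read-out
    for tick in range(1, total_ticks + 1):
        pending += bytes_per_tick
        if tick % switch_interval == 0:  # program is switched out
            captures += 1
            if pending > buffer_size:
                truncated += 1           # part of the trace was dropped
            saved += min(pending, buffer_size)
            pending = 0
    return captures, truncated, saved


# Hypothetical numbers, chosen only to show the trend.
# Pinned to a busy CPU: the program is switched out often.
pinned = simulate(total_ticks=1000, switch_interval=40,
                  bytes_per_tick=10, buffer_size=128)
# Spread by the scheduler: the program is rarely switched out.
spread = simulate(total_ticks=1000, switch_interval=250,
                  bytes_per_tick=10, buffer_size=128)
print("pinned:", pinned)
print("spread:", spread)
```

In this model the run that is switched out more often has more read-outs and saves more data overall, even though almost every read-out is still truncated. That is the same trend as in the two perf runs, but the real numbers depend on the platform and kernel.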