Hi Andrea,
On Thu, Mar 05, 2020 at 10:28:27AM +0000, Andrea Brunato wrote:
Thank you Leo, this is really valuable!
You are welcome!
I'm going to update my kernel to the latest version. Hopefully I'll manage to do this as soon as possible.
As I'm trying different configurations, I've got a very interesting result:
$ taskset -c 0 perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
[ perf record: Woken up 28 times to write data ]
Warning: AUX data lost 25 times out of 27!
[ perf record: Captured and wrote 3.323 MB perf.data ]
In contrast, when NOT pinning the program to a specific core:
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
[ perf record: Woken up 4 times to write data ]
Warning: AUX data lost 4 times out of 4!
[ perf record: Captured and wrote 0.502 MB perf.data ]
While the loss rate is still high, the AUX data counts are very different: 25 lost out of 27 vs 4 lost out of 4. Also, the reported perf.data file is much bigger when pinning the task to a specific core.
Interestingly enough, when instead tracing a short-lived program such as `ls`, there is no difference in the perf.data reported.
Is anybody aware of any specific part of the code base whose behavior may change depending on whether the traced program is rescheduled to another core? Any idea/suggestion is highly appreciated.
As far as I know, Arm CoreSight cannot generate interrupts on many platforms, so the perf tool only captures trace data when the profiled program is switched out from a CPU.
In your first command you set the CPU affinity to CPU0. CPU0 is usually the primary CPU for handling interrupts, and many interrupt threads run on it. This gives the profiled program many chances to be scheduled out, so perf can capture trace data many times (27 times).
Your second command doesn't use taskset. In that case the Linux kernel scheduler spreads tasks across different CPUs as much as possible, which gives the profiled program more chance to occupy a CPU without being scheduled out. I think this is the main reason why perf only captured trace data 4 times with the second command.
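One way to check this guess is to compare how often the program gets context-switched in the two setups. Below is a rough sketch, not something I have run on your board. It assumes a Linux /proc filesystem, and the helper name ctxt_switches is made up for illustration. It only reports the counters of the given PID at the moment it is called.

```shell
# Hypothetical helper: print the context-switch counters that the kernel
# keeps for a task, read from /proc/<pid>/status (Linux only).
# Prints the voluntary_ctxt_switches and nonvoluntary_ctxt_switches lines.
ctxt_switches() {
    grep ctxt_switches "/proc/$1/status"
}
```

For example, start the workload in the background with and without "taskset -c 0", then call ctxt_switches "$!" shortly before it finishes and compare the two results. Alternatively, "perf stat -e context-switches,cpu-migrations" followed by the command should report similar counters for the whole run.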
Thanks,
Leo