Good afternoon,
When using perf to record a simple benchmark I see the following warning:
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log [ perf record: Woken up 6 times to write data ] Warning: AUX data lost 6 times out of 6!
[ perf record: Captured and wrote 0.378 MB perf.data ]
This is on DB845c (https://www.96boards.org/product/rb3-platform/), kernel 5.5.0-rc7.
It looks like perf is losing data and the reported perf.data file is very tiny, given that the application runs for several seconds. Does anybody have any idea about what it may be causing this?
I've tried adding --mmap-pages=32K,64K to the command for increasing the buffer size, but it's not actually solving the problem
Thanks, Andrea IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi Andrea,
On Tue, 3 Mar 2020 at 09:34, Andrea Brunato Andrea.Brunato@arm.com wrote:
Good afternoon,
When using perf to record a simple benchmark I see the following warning:
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
What is "sinkid=0xa6509eae" ?
[ perf record: Woken up 6 times to write data ] Warning: AUX data lost 6 times out of 6!
[ perf record: Captured and wrote 0.378 MB perf.data ]
This is on DB845c (https://www.96boards.org/product/rb3-platform/), kernel 5.5.0-rc7.
It looks like perf is losing data and the reported perf.data file is very tiny, given that the application runs for several seconds. Does anybody have any idea about what it may be causing this?
I've tried adding --mmap-pages=32K,64K to the command for increasing the buffer size, but it's not actually solving the problem
Thanks, Andrea IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. _______________________________________________ CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
Hi Andreas Is this a linux distribution or an android running on a linux kernel for android? Kind regards /Zied
On Tue, 3 Mar 2020, 7:23 PM Mathieu Poirier mathieu.poirier@linaro.org wrote:
Hi Andrea,
On Tue, 3 Mar 2020 at 09:34, Andrea Brunato Andrea.Brunato@arm.com wrote:
Good afternoon,
When using perf to record a simple benchmark I see the following warning:
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u
~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
What is "sinkid=0xa6509eae" ?
[ perf record: Woken up 6 times to write data ] Warning: AUX data lost 6 times out of 6!
[ perf record: Captured and wrote 0.378 MB perf.data ]
This is on DB845c (https://www.96boards.org/product/rb3-platform/),
kernel 5.5.0-rc7.
It looks like perf is losing data and the reported perf.data file is
very tiny, given that the application runs for several seconds.
Does anybody have any idea about what it may be causing this?
I've tried adding --mmap-pages=32K,64K to the command for increasing the
buffer size, but it's not actually solving the problem
Thanks, Andrea IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
Hi Andrea,
On Tue, Mar 03, 2020 at 11:23:29AM -0700, Mathieu Poirier wrote:
Hi Andrea,
On Tue, 3 Mar 2020 at 09:34, Andrea Brunato Andrea.Brunato@arm.com wrote:
Good afternoon,
When using perf to record a simple benchmark I see the following warning:
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
What is "sinkid=0xa6509eae" ?
Sink might be ETF/ETR/ETB, I think this is why Mathieu asked the meaning for 'sinkid', which we don't which sink you are using.
[ perf record: Woken up 6 times to write data ] Warning: AUX data lost 6 times out of 6!
[ perf record: Captured and wrote 0.378 MB perf.data ]
This is on DB845c (https://www.96boards.org/product/rb3-platform/), kernel 5.5.0-rc7.
It looks like perf is losing data and the reported perf.data file is very tiny, given that the application runs for several seconds. Does anybody have any idea about what it may be causing this?
Just give some info based on my understanding.
For the log "AUX data lost 6 times out of 6!", since you are using per-thread mode, so every time when the task switching out, the trace data in the sink will be read out and save into aux buffer. But usually the sink's buffer is not big (maximum to MBs), so it's easily to overwrite the hardware's cycle buffer and the previous trace data will be lost. For this reason, it reports "AUX data lost".
I've tried adding --mmap-pages=32K,64K to the command for increasing the buffer size, but it's not actually solving the problem
If it's the overflow in the hardware buffer in SRAM, "--mmap-pages" option doesn't work, since it only enlarges aux buffer page size but cannot change the SRAM size in CoreSight.
To allow the "--mmap-pages" to work with big buffer, the sink needs to directly output trace data to DDR, you also need to take care what's the buffer mode you are using (flat, sg, CATU).
Thanks, Leo
On Tue, 3 Mar 2020 at 18:54, Leo Yan leo.yan@linaro.org wrote:
Hi Andrea,
On Tue, Mar 03, 2020 at 11:23:29AM -0700, Mathieu Poirier wrote:
Hi Andrea,
On Tue, 3 Mar 2020 at 09:34, Andrea Brunato Andrea.Brunato@arm.com wrote:
Good afternoon,
When using perf to record a simple benchmark I see the following warning:
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
What is "sinkid=0xa6509eae" ?
Sink might be ETF/ETR/ETB, I think this is why Mathieu asked the meaning for 'sinkid', which we don't which sink you are using.
[ perf record: Woken up 6 times to write data ] Warning: AUX data lost 6 times out of 6!
[ perf record: Captured and wrote 0.378 MB perf.data ]
This is on DB845c (https://www.96boards.org/product/rb3-platform/), kernel 5.5.0-rc7.
It looks like perf is losing data and the reported perf.data file is very tiny, given that the application runs for several seconds. Does anybody have any idea about what it may be causing this?
Just give some info based on my understanding.
For the log "AUX data lost 6 times out of 6!", since you are using per-thread mode, so every time when the task switching out, the trace data in the sink will be read out and save into aux buffer. But usually the sink's buffer is not big (maximum to MBs), so it's easily to overwrite the hardware's cycle buffer and the previous trace data will be lost. For this reason, it reports "AUX data lost".
I've tried adding --mmap-pages=32K,64K to the command for increasing the buffer size, but it's not actually solving the problem
If it's the overflow in the hardware buffer in SRAM, "--mmap-pages" option doesn't work, since it only enlarges aux buffer page size but cannot change the SRAM size in CoreSight.
To allow the "--mmap-pages" to work with big buffer, the sink needs to directly output trace data to DDR, you also need to take care what's the buffer mode you are using (flat, sg, CATU).
Everything Leo said above is accurate.
Thanks, Leo
Hi, Is this a linux distribution or an android running on a linux kernel for android? I'm running a linux distribution, Debian Buster What is "sinkid=0xa6509eae" ? It's the sinkid corresponding to tmc_etr0. When casting the perf command using @tmc_etr0 unfortunately I'm displayed the following error:
$ perf record --per-thread -e cs_etm/@tmc_etr/u -- ls
failed to set sink "" on event cs_etm/@tmc_etr/u with 21 (Is a directory)
so I found myself forced to manually specify the identifier. I think this problem has been solved in the latest kernel versions but I still have to double check.
My CoreSight subsystem is the following:
$ ls /sys/bus/coresight/devices/
etm0 etm1 etm2 etm3 etm4 etm5 etm6 etm7 funnel0 funnel1 funnel2 funnel3 funnel4 replicator0 stm0 tmc_etf0 tmc_etr0
$ grep . /sys/devices/cs_etm/sinks/* /sys/devices/cs_etm/sinks/tmc_etf0:0x97979549 /sys/devices/cs_etm/sinks/tmc_etr0:0xa6509eae
To allow the "--mmap-pages" to work with big buffer, the sink needs to directly output trace data to DDR, you also need to take care what's the buffer mode you are using (flat, sg, CATU).
Thanks for the suggestion! Is there a way for me to enforce this? Is it possible to check out how much data I'm losing/overwriting? Do you know what is the best buffer mode for avoiding losing data? How can I switch between a buffer mode and another one? Is `--per-thread` the best way to trace a single-threaded workload?
Thanks everybody for the interest on this! Andrea ________________________________ From: Mathieu Poirier mathieu.poirier@linaro.org Sent: 04 March 2020 02:43 To: leo.yan@linaro.org leo.yan@linaro.org Cc: Andrea Brunato Andrea.Brunato@arm.com; coresight@lists.linaro.org coresight@lists.linaro.org Subject: Re: perf record cs_etm: data lost
On Tue, 3 Mar 2020 at 18:54, Leo Yan leo.yan@linaro.org wrote:
Hi Andrea,
On Tue, Mar 03, 2020 at 11:23:29AM -0700, Mathieu Poirier wrote:
Hi Andrea,
On Tue, 3 Mar 2020 at 09:34, Andrea Brunato Andrea.Brunato@arm.com wrote:
Good afternoon,
When using perf to record a simple benchmark I see the following warning:
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
What is "sinkid=0xa6509eae" ?
Sink might be ETF/ETR/ETB, I think this is why Mathieu asked the meaning for 'sinkid', which we don't which sink you are using.
[ perf record: Woken up 6 times to write data ] Warning: AUX data lost 6 times out of 6!
[ perf record: Captured and wrote 0.378 MB perf.data ]
This is on DB845c (https://www.96boards.org/product/rb3-platform/), kernel 5.5.0-rc7.
It looks like perf is losing data and the reported perf.data file is very tiny, given that the application runs for several seconds. Does anybody have any idea about what it may be causing this?
Just give some info based on my understanding.
For the log "AUX data lost 6 times out of 6!", since you are using per-thread mode, so every time when the task switching out, the trace data in the sink will be read out and save into aux buffer. But usually the sink's buffer is not big (maximum to MBs), so it's easily to overwrite the hardware's cycle buffer and the previous trace data will be lost. For this reason, it reports "AUX data lost".
I've tried adding --mmap-pages=32K,64K to the command for increasing the buffer size, but it's not actually solving the problem
If it's the overflow in the hardware buffer in SRAM, "--mmap-pages" option doesn't work, since it only enlarges aux buffer page size but cannot change the SRAM size in CoreSight.
To allow the "--mmap-pages" to work with big buffer, the sink needs to directly output trace data to DDR, you also need to take care what's the buffer mode you are using (flat, sg, CATU).
Everything Leo said above is accurate.
Thanks, Leo
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi Andrea,
On Wed, 4 Mar 2020 at 09:50, Andrea Brunato Andrea.Brunato@arm.com wrote:
Hi,
Is this a linux distribution or an android running on a linux kernel for android?
I'm running a linux distribution, Debian Buster
What is "sinkid=0xa6509eae" ?
It's the sinkid corresponding to tmc_etr0. When casting the perf command using @tmc_etr0 unfortunately I'm displayed the following error:
$ perf record --per-thread -e cs_etm/@tmc_etr/u -- ls
failed to set sink "" on event cs_etm/@tmc_etr/u with 21 (Is a directory)
The command above is wrong - you have missed the '0' off tmc_etr0. However I would expect the error to be "failed to set sink "tmc_etr" on event cs_etm/@tmc_etr/u with 2 (No such file or directory)" if this was the only problem.
When I use the same command on my DB410 I get:
linaro@linaro-developer:~/perf-tests$ ./perf record --per-thread -e cs_etm/@tmc_etr0/u -- ls perf perf-save-data.bash test_170428_1048.tgz test_170502_1550.tgz perf-200302_1356-cti-v10-lnk perf.data test_170428_1107.tgz test_170920_1514.tgz perf-4.11-rc1-arm64 perf.data.old test_170501_1307.tgz test_171101_1726.tgz perf-4.9-rc1-arm64 sort test_170501_1427.tgz test_190311_1240.tgz perf-5.0-rc6-cpu-wide sort_3k test_170501_1442.tgz trace-sysfs.bash perf-clean-data.bash sort_any test_170501_2033.tgz trace-sysfs.bash~ perf-gcc5.3-4.13-cs-etr-arm64 test_170307_1304.tgz test_170501_2116.tgz perf-gcc5.3-rel-ocsd-4-13-base test_170307_1308.tgz test_170501_2121.tgz [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.110 MB perf.data ]
Note - I have a local perf version built against the same tree as the kernel I am running. Ensure that you are doing the same otherwise odd things can happen.
so I found myself forced to manually specify the identifier. I think this problem has been solved in the latest kernel versions but I still have to double check.
My CoreSight subsystem is the following:
$ ls /sys/bus/coresight/devices/
etm0 etm1 etm2 etm3 etm4 etm5 etm6 etm7 funnel0 funnel1 funnel2 funnel3 funnel4 replicator0 stm0 tmc_etf0 tmc_etr0
$ grep . /sys/devices/cs_etm/sinks/* /sys/devices/cs_etm/sinks/tmc_etf0:0x97979549 /sys/devices/cs_etm/sinks/tmc_etr0:0xa6509eae
To allow the "--mmap-pages" to work with big buffer, the sink needs to directly output trace data to DDR, you also need to take care what's the buffer mode you are using (flat, sg, CATU).
Thanks for the suggestion! Is there a way for me to enforce this?
These are hardware dependent features. CATU is a CS-SoC600 hardware component that takes over the physical memory mapping from the ETR - which in SoC600 uses VA only. The ETR in CS-Soc400 had the mapping h/w build in that could in theory operate in Scatter-Gather (sg) or Contiguous (flat) memory mapping mode. However, some hardware implementations got the interconnect design wrong, and did not support certain read protocols that the SG mode in ETR requited, so the drivers were updated to have a device tree flag to indicate if sg was permitted. As I recall, sg was permitted in Juno, but others were limited to flat only.
Is it possible to check out how much data I'm losing/overwriting?
No, this is a hardware controlled circular buffer. The only information available to software is wrapped / not full.
Do you know what is the best buffer mode for avoiding losing data?
In general, CS-Soc400 based devices cannot halt the capture on full, as there is no available interrupt - hence buffers wrap and trace can be lost.
CS-SoC600 base devices could have this feature (which would match a similar feature that intel-pt has) assuming the hardware has the relevant designed connections. However, at present the drivers do not have the capability to use such an interrupt - as devices with the hardware have not been available to developers to date.
How can I switch between a buffer mode and another one?
See above - this is largely a hardware feature.
Is `--per-thread` the best way to trace a single-threaded workload?
Given the limited buffer size I would say probably yes. Using CPU wide would trace everything on the CPU(s) - including any other tasks running concurrently with that of interest.
Regards
Mike
Thanks everybody for the interest on this! Andrea ________________________________ From: Mathieu Poirier mathieu.poirier@linaro.org Sent: 04 March 2020 02:43 To: leo.yan@linaro.org leo.yan@linaro.org Cc: Andrea Brunato Andrea.Brunato@arm.com; coresight@lists.linaro.org coresight@lists.linaro.org Subject: Re: perf record cs_etm: data lost
On Tue, 3 Mar 2020 at 18:54, Leo Yan leo.yan@linaro.org wrote:
Hi Andrea,
On Tue, Mar 03, 2020 at 11:23:29AM -0700, Mathieu Poirier wrote:
Hi Andrea,
On Tue, 3 Mar 2020 at 09:34, Andrea Brunato Andrea.Brunato@arm.com wrote:
Good afternoon,
When using perf to record a simple benchmark I see the following warning:
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
What is "sinkid=0xa6509eae" ?
Sink might be ETF/ETR/ETB, I think this is why Mathieu asked the meaning for 'sinkid', which we don't which sink you are using.
[ perf record: Woken up 6 times to write data ] Warning: AUX data lost 6 times out of 6!
[ perf record: Captured and wrote 0.378 MB perf.data ]
This is on DB845c (https://www.96boards.org/product/rb3-platform/), kernel 5.5.0-rc7.
It looks like perf is losing data and the reported perf.data file is very tiny, given that the application runs for several seconds. Does anybody have any idea about what it may be causing this?
Just give some info based on my understanding.
For the log "AUX data lost 6 times out of 6!", since you are using per-thread mode, so every time when the task switching out, the trace data in the sink will be read out and save into aux buffer. But usually the sink's buffer is not big (maximum to MBs), so it's easily to overwrite the hardware's cycle buffer and the previous trace data will be lost. For this reason, it reports "AUX data lost".
I've tried adding --mmap-pages=32K,64K to the command for increasing the buffer size, but it's not actually solving the problem
If it's the overflow in the hardware buffer in SRAM, "--mmap-pages" option doesn't work, since it only enlarges aux buffer page size but cannot change the SRAM size in CoreSight.
To allow the "--mmap-pages" to work with big buffer, the sink needs to directly output trace data to DDR, you also need to take care what's the buffer mode you are using (flat, sg, CATU).
Everything Leo said above is accurate.
Thanks, Leo
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. _______________________________________________ CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
Thank you Mike - I'm getting a way better understanding of the whole system.
Sorry for sharing a wrong command, unfortunately on 5.5.0-rc7 I see the same error persisting when running:
$ perf record --per-thread -e cs_etm/@tmc_etr0/u -- ls failed to set sink "" on event cs_etm/@tmc_etr0/u with 21 (Is a directory)
I'm using perf built as an Arm binary against the same kernel source code while linked against openCSD.
When I use the same command on my DB410 I get:
linaro@linaro-developer:~/perf-tests$ ./perf record --per-thread -e cs_etm/@tmc_etr0/u -- ls perf perf-save-data.bash test_170428_1048.tgz test_170502_1550.tgz perf-200302_1356-cti-v10-lnk perf.data test_170428_1107.tgz test_170920_1514.tgz perf-4.11-rc1-arm64 perf.data.old test_170501_1307.tgz test_171101_1726.tgz perf-4.9-rc1-arm64 sort test_170501_1427.tgz test_190311_1240.tgz perf-5.0-rc6-cpu-wide sort_3k test_170501_1442.tgz trace-sysfs.bash perf-clean-data.bash sort_any test_170501_2033.tgz trace-sysfs.bash~ perf-gcc5.3-4.13-cs-etr-arm64 test_170307_1304.tgz test_170501_2116.tgz perf-gcc5.3-rel-ocsd-4-13-base test_170307_1308.tgz test_170501_2121.tgz [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.110 MB perf.data ]
This is interesting! Do you get any warning when tracing, with the same perf command, any single-threaded workload which takes several seconds to run? How big is your perf.data file?
When running `perf report -D -i perf.data` I'm able to see the trace data correctly:
0: I_ASYNC : Alignment Synchronisation. 12: I_TRACE_INFO : Trace Info. 17: I_TRACE_ON : Trace On. 18: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F24D80; Ctxt: AArch64,EL0, NS; 28: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE 29: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE 30: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE 32: I_ATOM_F6 : Atom format 6.; EEEEN 33: I_ATOM_F1 : Atom format 1.; E 34: I_EXCEPT : Exception.; Data Fault; Ret Addr Follows; 36: I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007F89F2832C; 45: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0xFFFFFFC000083400; Ctxt: AArch64,EL1, NS; 56: I_TRACE_ON : Trace On. 57: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F2832C; Ctxt: AArch64,EL0, NS; 68: I_ATOM_F3 : Atom format 3.; NEE 69: I_ATOM_F3 : Atom format 3.; NEN
Is there any reference about the format in which the trace is here represented? I would just like to know what I_ASYNC, I_TRACE_INFO, I_TRACE_ON, I_ADDR_CTXT_L_64IS0 ,I_ATOM_F3 and all the other labels exactly mean
Thanks, Andrea
________________________________ From: Mike Leach mike.leach@linaro.org Sent: 04 March 2020 11:14 To: Andrea Brunato Andrea.Brunato@arm.com Cc: Mathieu Poirier mathieu.poirier@linaro.org; leo.yan@linaro.org leo.yan@linaro.org; guermazi.zied@gmail.com guermazi.zied@gmail.com; coresight@lists.linaro.org coresight@lists.linaro.org Subject: Re: perf record cs_etm: data lost
Hi Andrea,
On Wed, 4 Mar 2020 at 09:50, Andrea Brunato Andrea.Brunato@arm.com wrote:
Hi,
Is this a linux distribution or an android running on a linux kernel for android?
I'm running a linux distribution, Debian Buster
What is "sinkid=0xa6509eae" ?
It's the sinkid corresponding to tmc_etr0. When casting the perf command using @tmc_etr0 unfortunately I'm displayed the following error:
$ perf record --per-thread -e cs_etm/@tmc_etr/u -- ls
failed to set sink "" on event cs_etm/@tmc_etr/u with 21 (Is a directory)
The command above is wrong - you have missed the '0' off tmc_etr0. However I would expect the error to be "failed to set sink "tmc_etr" on event cs_etm/@tmc_etr/u with 2 (No such file or directory)" if this was the only problem.
When I use the same command on my DB410 I get:
linaro@linaro-developer:~/perf-tests$ ./perf record --per-thread -e cs_etm/@tmc_etr0/u -- ls perf perf-save-data.bash test_170428_1048.tgz test_170502_1550.tgz perf-200302_1356-cti-v10-lnk perf.data test_170428_1107.tgz test_170920_1514.tgz perf-4.11-rc1-arm64 perf.data.old test_170501_1307.tgz test_171101_1726.tgz perf-4.9-rc1-arm64 sort test_170501_1427.tgz test_190311_1240.tgz perf-5.0-rc6-cpu-wide sort_3k test_170501_1442.tgz trace-sysfs.bash perf-clean-data.bash sort_any test_170501_2033.tgz trace-sysfs.bash~ perf-gcc5.3-4.13-cs-etr-arm64 test_170307_1304.tgz test_170501_2116.tgz perf-gcc5.3-rel-ocsd-4-13-base test_170307_1308.tgz test_170501_2121.tgz [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.110 MB perf.data ]
Note - I have a local perf version built against the same tree as the kernel I am running. Ensure that you are doing the same otherwise odd things can happen.
so I found myself forced to manually specify the identifier. I think this problem has been solved in the latest kernel versions but I still have to double check.
My CoreSight subsystem is the following:
$ ls /sys/bus/coresight/devices/
etm0 etm1 etm2 etm3 etm4 etm5 etm6 etm7 funnel0 funnel1 funnel2 funnel3 funnel4 replicator0 stm0 tmc_etf0 tmc_etr0
$ grep . /sys/devices/cs_etm/sinks/* /sys/devices/cs_etm/sinks/tmc_etf0:0x97979549 /sys/devices/cs_etm/sinks/tmc_etr0:0xa6509eae
To allow the "--mmap-pages" to work with big buffer, the sink needs to directly output trace data to DDR, you also need to take care what's the buffer mode you are using (flat, sg, CATU).
Thanks for the suggestion! Is there a way for me to enforce this?
These are hardware dependent features. CATU is a CS-SoC600 hardware component that takes over the physical memory mapping from the ETR - which in SoC600 uses VA only. The ETR in CS-Soc400 had the mapping h/w build in that could in theory operate in Scatter-Gather (sg) or Contiguous (flat) memory mapping mode. However, some hardware implementations got the interconnect design wrong, and did not support certain read protocols that the SG mode in ETR requited, so the drivers were updated to have a device tree flag to indicate if sg was permitted. As I recall, sg was permitted in Juno, but others were limited to flat only.
Is it possible to check out how much data I'm losing/overwriting?
No, this is a hardware controlled circular buffer. The only information available to software is wrapped / not full.
Do you know what is the best buffer mode for avoiding losing data?
In general, CS-Soc400 based devices cannot halt the capture on full, as there is no available interrupt - hence buffers wrap and trace can be lost.
CS-SoC600 base devices could have this feature (which would match a similar feature that intel-pt has) assuming the hardware has the relevant designed connections. However, at present the drivers do not have the capability to use such an interrupt - as devices with the hardware have not been available to developers to date.
How can I switch between a buffer mode and another one?
See above - this is largely a hardware feature.
Is `--per-thread` the best way to trace a single-threaded workload?
Given the limited buffer size I would say probably yes. Using CPU wide would trace everything on the CPU(s) - including any other tasks running concurrently with that of interest.
Regards
Mike
Thanks everybody for the interest on this! Andrea ________________________________ From: Mathieu Poirier mathieu.poirier@linaro.org Sent: 04 March 2020 02:43 To: leo.yan@linaro.org leo.yan@linaro.org Cc: Andrea Brunato Andrea.Brunato@arm.com; coresight@lists.linaro.org coresight@lists.linaro.org Subject: Re: perf record cs_etm: data lost
On Tue, 3 Mar 2020 at 18:54, Leo Yan leo.yan@linaro.org wrote:
Hi Andrea,
On Tue, Mar 03, 2020 at 11:23:29AM -0700, Mathieu Poirier wrote:
Hi Andrea,
On Tue, 3 Mar 2020 at 09:34, Andrea Brunato Andrea.Brunato@arm.com wrote:
Good afternoon,
When using perf to record a simple benchmark I see the following warning:
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
What is "sinkid=0xa6509eae" ?
Sink might be ETF/ETR/ETB, I think this is why Mathieu asked the meaning for 'sinkid', which we don't which sink you are using.
[ perf record: Woken up 6 times to write data ] Warning: AUX data lost 6 times out of 6!
[ perf record: Captured and wrote 0.378 MB perf.data ]
This is on DB845c (https://www.96boards.org/product/rb3-platform/), kernel 5.5.0-rc7.
It looks like perf is losing data and the reported perf.data file is very tiny, given that the application runs for several seconds. Does anybody have any idea about what it may be causing this?
Just give some info based on my understanding.
For the log "AUX data lost 6 times out of 6!", since you are using per-thread mode, so every time when the task switching out, the trace data in the sink will be read out and save into aux buffer. But usually the sink's buffer is not big (maximum to MBs), so it's easily to overwrite the hardware's cycle buffer and the previous trace data will be lost. For this reason, it reports "AUX data lost".
I've tried adding --mmap-pages=32K,64K to the command for increasing the buffer size, but it's not actually solving the problem
If it's the overflow in the hardware buffer in SRAM, "--mmap-pages" option doesn't work, since it only enlarges aux buffer page size but cannot change the SRAM size in CoreSight.
To allow the "--mmap-pages" to work with big buffer, the sink needs to directly output trace data to DDR, you also need to take care what's the buffer mode you are using (flat, sg, CATU).
Everything Leo said above is accurate.
Thanks, Leo
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. _______________________________________________ CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi allI am adding another question for my understanding. This may also help reducing the lost traces. When does perf pmu collects the traces from etf/etb buffer? Is it when the process gets scheduled (or unscheduled)?Kind RegardsZied Guermazi
Sent from Yahoo Mail on Android
On Wed, Mar 4, 2020 at 4:46 PM, Andrea BrunatoAndrea.Brunato@arm.com wrote: _______________________________________________ CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
Hi Andrea,
On Wed, Mar 04, 2020 at 03:45:59PM +0000, Andrea Brunato wrote:
Thank you Mike - I'm getting a way better understanding of the whole system.
Sorry for sharing a wrong command, unfortunately on 5.5.0-rc7 I see the same error persisting when running:
$ perf record --per-thread -e cs_etm/@tmc_etr0/u -- ls failed to set sink "" on event cs_etm/@tmc_etr0/u with 21 (Is a directory)
I'm using perf built as an Arm binary against the same kernel source code while linked against openCSD.
Just want to remind, please check if these two patches [1][2] have been merged into your local code base, otherwise, before we found the regression for inputting sink name in perf command.
Thanks, Leo
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
Thank you Leo, this is really valuable! I'm going to update my kernel version to the latest one - hopefully I'll manage to do this as soon as possible
As I'm trying different configurations, I've got a very interesting result:
$ taskset -c 0 perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log [ perf record: Woken up 28 times to write data ] Warning: AUX data lost 25 times out of 27!
[ perf record: Captured and wrote 3.323 MB perf.data ]
While instead, when NOT pinning the program to a specific core
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log [ perf record: Woken up 4 times to write data ] Warning: AUX data lost 4 times out of 4!
[ perf record: Captured and wrote 0.502 MB perf.data ]
While the information lost rate is still high, the `time` AUX data has been lost is very different: 27 vs 4 Also the reported perf.data file is way bigger when pinning the task to a specific core.
Interestingly enough, when instead tracing a short-lived program such as `ls`, there is no difference in the perf.data reported.
Is anybody aware of any specific part in the code base whose behavior may change according to the traced program being rescheduled to another core? Any idea/suggestion is highly appreciated
Thanks, Andrea
________________________________ From: Leo Yan leo.yan@linaro.org Sent: 05 March 2020 01:55 To: Andrea Brunato Andrea.Brunato@arm.com Cc: Mike Leach mike.leach@linaro.org; Mathieu Poirier mathieu.poirier@linaro.org; guermazi.zied@gmail.com guermazi.zied@gmail.com; coresight@lists.linaro.org coresight@lists.linaro.org Subject: Re: perf record cs_etm: data lost
Hi Andrea,
On Wed, Mar 04, 2020 at 03:45:59PM +0000, Andrea Brunato wrote:
Thank you Mike - I'm getting a way better understanding of the whole system.
Sorry for sharing a wrong command, unfortunately on 5.5.0-rc7 I see the same error persisting when running:
$ perf record --per-thread -e cs_etm/@tmc_etr0/u -- ls failed to set sink "" on event cs_etm/@tmc_etr0/u with 21 (Is a directory)
I'm using perf built as an Arm binary against the same kernel source code while linked against openCSD.
Just want to remind, please check if these two patches [1][2] have been merged into your local code base, otherwise, before we found the regression for inputting sink name in perf command.
Thanks, Leo
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi Andrea,
On Thu, Mar 05, 2020 at 10:28:27AM +0000, Andrea Brunato wrote:
Thank you Leo, this is really valuable!
You are welcome!
I'm going to update my kernel version to the latest one - hopefully I'll manage to do this as soon as possible
As I'm trying different configurations, I've got a very interesting result:
$ taskset -c 0 perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log [ perf record: Woken up 28 times to write data ] Warning: AUX data lost 25 times out of 27!
[ perf record: Captured and wrote 3.323 MB perf.data ]
While instead, when NOT pinning the program to a specific core
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log [ perf record: Woken up 4 times to write data ] Warning: AUX data lost 4 times out of 4!
[ perf record: Captured and wrote 0.502 MB perf.data ]
While the information lost rate is still high, the `time` AUX data has been lost is very different: 27 vs 4 Also the reported perf.data file is way bigger when pinning the task to a specific core.
Interestingly enough, when instead tracing a short-lived program such as `ls`, there is no difference in the perf.data reported.
Is anybody aware of any specific part in the code base whose behavior may change according to the traced program being rescheduled to another core? Any idea/suggestion is highly appreciated
As I know Arm CoreSight cannot produce interrupts on many platforms, so every time Perf tool only captures trace data when the profiled program is switched out from a CPU.
So when set the CPU affinity to CPU0 in your first command, usually, CPU0 is the primary CPU for handling interrupts and many interrupt threads run on it, thus this gives many chance for the profiled program to be scheduled out, and finally, you could see Perf can capture trace data for many times (27 times).
In your second command it doesn't use taskset. With this command, Linux kernel scheduler spreads tasks to different CPUs as possible, this gives more chance for the profiled program to occupy a CPU without scheduled out. I think this is the main reason why in the second command Perf tool only captured trace data for 4 times.
Thanks, Leo
Hi Thanks Leo for the clarifications. It is a flow problem betwen a producer and consumer here. You can handle it either by reducing the amount of traces generated before the context switch (use filters to only trace important sections of the code, reduce the time where the process is scheduled etc...) or increase the size of the buffer ( well, I think it is not possible in your case) or implement a flow control. And here I guess that this should be possible once the cti are fully supported on platforms where a cti is connected to the interrupt controller.... Is this use case considered for the cti drivers?
Kind Regards Zied Guermazi
On Thu, 5 Mar 2020, 1:29 PM Leo Yan leo.yan@linaro.org wrote:
Hi Andrea,
On Thu, Mar 05, 2020 at 10:28:27AM +0000, Andrea Brunato wrote:
Thank you Leo, this is really valuable!
You are welcome!
I'm going to update my kernel version to the latest one - hopefully I'll
manage to do this as soon as possible
As I'm trying different configurations, I've got a very interesting
result:
$ taskset -c 0 perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u
~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
[ perf record: Woken up 28 times to write data ] Warning: AUX data lost 25 times out of 27!
[ perf record: Captured and wrote 3.323 MB perf.data ]
While instead, when NOT pinning the program to a specific core
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u
~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
[ perf record: Woken up 4 times to write data ] Warning: AUX data lost 4 times out of 4!
[ perf record: Captured and wrote 0.502 MB perf.data ]
While the information lost rate is still high, the `time` AUX data has
been lost is very different: 27 vs 4
Also the reported perf.data file is way bigger when pinning the task to
a specific core.
Interestingly enough, when instead tracing a short-lived program such as
`ls`, there is no difference in the perf.data reported.
Is anybody aware of any specific part in the code base whose behavior
may change according to the traced program being rescheduled to another core?
Any idea/suggestion is highly appreciated
As I know Arm CoreSight cannot produce interrupts on many platforms, so every time Perf tool only captures trace data when the profiled program is switched out from a CPU.
So when set the CPU affinity to CPU0 in your first command, usually, CPU0 is the primary CPU for handling interrupts and many interrupt threads run on it, thus this gives many chance for the profiled program to be scheduled out, and finally, you could see Perf can capture trace data for many times (27 times).
In your second command it doesn't use taskset. With this command, Linux kernel scheduler spreads tasks to different CPUs as possible, this gives more chance for the profiled program to occupy a CPU without scheduled out. I think this is the main reason why in the second command Perf tool only captured trace data for 4 times.
Thanks, Leo
On Thu, 5 Mar 2020 at 03:28, Andrea Brunato Andrea.Brunato@arm.com wrote:
Thank you Leo, this is really valuable! I'm going to update my kernel version to the latest one - hopefully I'll manage to do this as soon as possible
As I'm trying different configurations, I've got a very interesting result:
$ taskset -c 0 perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
Once again examples include "sinkid=0xa6509eae", a syntax that was __never__ supported in any version of the code that was published. Since there is now way for me to know where that code comes from, the platform it is running on or the modifications that were done to it, I will not answer any question or follow email threads where that syntax is used.
Thanks, Mathieu
[ perf record: Woken up 28 times to write data ] Warning: AUX data lost 25 times out of 27!
[ perf record: Captured and wrote 3.323 MB perf.data ]
While instead, when NOT pinning the program to a specific core
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log [ perf record: Woken up 4 times to write data ] Warning: AUX data lost 4 times out of 4!
[ perf record: Captured and wrote 0.502 MB perf.data ]
While the information lost rate is still high, the `time` AUX data has been lost is very different: 27 vs 4 Also the reported perf.data file is way bigger when pinning the task to a specific core.
Interestingly enough, when instead tracing a short-lived program such as `ls`, there is no difference in the perf.data reported.
Is anybody aware of any specific part in the code base whose behavior may change according to the traced program being rescheduled to another core? Any idea/suggestion is highly appreciated
Thanks, Andrea
From: Leo Yan leo.yan@linaro.org Sent: 05 March 2020 01:55 To: Andrea Brunato Andrea.Brunato@arm.com Cc: Mike Leach mike.leach@linaro.org; Mathieu Poirier mathieu.poirier@linaro.org; guermazi.zied@gmail.com guermazi.zied@gmail.com; coresight@lists.linaro.org coresight@lists.linaro.org Subject: Re: perf record cs_etm: data lost
Hi Andrea,
On Wed, Mar 04, 2020 at 03:45:59PM +0000, Andrea Brunato wrote:
Thank you Mike - I'm getting a way better understanding of the whole system.
Sorry for sharing a wrong command, unfortunately on 5.5.0-rc7 I see the same error persisting when running:
$ perf record --per-thread -e cs_etm/@tmc_etr0/u -- ls failed to set sink "" on event cs_etm/@tmc_etr0/u with 21 (Is a directory)
I'm using perf built as an Arm binary against the same kernel source code while linked against openCSD.
Just want to remind, please check if these two patches [1][2] have been merged into your local code base, otherwise, before we found the regression for inputting sink name in perf command.
Thanks, Leo
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Thu, 5 Mar 2020 at 08:30, Mathieu Poirier mathieu.poirier@linaro.org wrote:
On Thu, 5 Mar 2020 at 03:28, Andrea Brunato Andrea.Brunato@arm.com wrote:
Thank you Leo, this is really valuable! I'm going to update my kernel version to the latest one - hopefully I'll manage to do this as soon as possible
As I'm trying different configurations, I've got a very interesting result:
$ taskset -c 0 perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log
Once again examples include "sinkid=0xa6509eae", a syntax that was __never__ supported in any version of the code that was published. Since there is now way for me to know where that code comes from, the platform it is running on or the modifications that were done to it, I will not answer any question or follow email threads where that syntax is used.
After going back to the email thread on this topic I realise that I missed the reply where you did answer my question on the "sinkid" parameter. Missing that information prompted the above rebuke, something I hereby apologise for.
The "sinkid" parameter is published to be conformant to the perf framework and avoid problems with automated kernel interface testers. That being said we always encourage people to use the documented syntax. If that returns an error something else is wrong and should be fixed first.
Regards, Mathieu
Thanks, Mathieu
[ perf record: Woken up 28 times to write data ] Warning: AUX data lost 25 times out of 27!
[ perf record: Captured and wrote 3.323 MB perf.data ]
While instead, when NOT pinning the program to a specific core
$ perf record --per-thread -e cs_etm/sinkid=0xa6509eae/u ~/afdo/coremark/coremark/coremark.exe 0x3415 0x3415 0x66 0 7 1 2000 > run2.log [ perf record: Woken up 4 times to write data ] Warning: AUX data lost 4 times out of 4!
[ perf record: Captured and wrote 0.502 MB perf.data ]
While the information lost rate is still high, the `time` AUX data has been lost is very different: 27 vs 4 Also the reported perf.data file is way bigger when pinning the task to a specific core.
Interestingly enough, when instead tracing a short-lived program such as `ls`, there is no difference in the perf.data reported.
Is anybody aware of any specific part in the code base whose behavior may change according to the traced program being rescheduled to another core? Any idea/suggestion is highly appreciated
Thanks, Andrea
From: Leo Yan leo.yan@linaro.org Sent: 05 March 2020 01:55 To: Andrea Brunato Andrea.Brunato@arm.com Cc: Mike Leach mike.leach@linaro.org; Mathieu Poirier mathieu.poirier@linaro.org; guermazi.zied@gmail.com guermazi.zied@gmail.com; coresight@lists.linaro.org coresight@lists.linaro.org Subject: Re: perf record cs_etm: data lost
Hi Andrea,
On Wed, Mar 04, 2020 at 03:45:59PM +0000, Andrea Brunato wrote:
Thank you Mike - I'm getting a way better understanding of the whole system.
Sorry for sharing a wrong command, unfortunately on 5.5.0-rc7 I see the same error persisting when running:
$ perf record --per-thread -e cs_etm/@tmc_etr0/u -- ls failed to set sink "" on event cs_etm/@tmc_etr0/u with 21 (Is a directory)
I'm using perf built as an Arm binary against the same kernel source code while linked against openCSD.
Just want to remind, please check if these two patches [1][2] have been merged into your local code base, otherwise, before we found the regression for inputting sink name in perf command.
Thanks, Leo
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.