Hi all,
[ + CoreSight Mailing List ]
On the mainline kernel with the latest commit 407ab579637c "Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm", the perf cs-etm testing fails.
I found the perf cs-etm test itself has a typo and there is a regression introduced by the perf tool in the recent mainline kernel, so I sent out two patches to fix these issues [1].
Even after applying these two fixes for the test script, I still observe the etm test failure in snapshot mode. I manually checked the perf data and confirmed it contains the raw trace data for the snapshot (so 'perf script -D' works well), but it fails to generate any branch samples, so the command 'perf script' does not output any samples.
It seems to me this issue is more likely related to a change in the driver or the perf tool's common code rather than the test itself. Before I proceed to use 'git bisect' to narrow down the snapshot mode failure, if you have any idea for fixing this issue, please let me know. Otherwise, I will dig into it and keep you posted.
Thanks, Leo
On Tue, Nov 10, 2020 at 02:52:43PM +0800, Leo Yan wrote:
Hi all,
[ + CoreSight Mailing List ]
On the mainline kernel with the latest commit 407ab579637c "Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm", the perf cs-etm testing fails.
I found the perf cs-etm test itself has a typo and there is a regression introduced by the perf tool in the recent mainline kernel, so I sent out two patches to fix these issues [1].
Even after applying these two fixes for the test script, I still observe the etm test failure in snapshot mode. I manually checked the perf data and confirmed it contains the raw trace data for the snapshot (so 'perf script -D' works well), but it fails to generate any branch samples, so the command 'perf script' does not output any samples.
In case you want to reproduce this issue with perf testing, please use the steps below:
- Apply the two patches on the mainline kernel:
$ b4 am 20201110063417.14467-1-leo.yan@linaro.org
$ git am ./20201110_leo_yan_perf_test_fix_a_typo_in_cs_etm_testing.mbx
- On the board:
# cd $linux/tools/perf
# make VF=1 CORESIGHT=1
# ./perf test list
...
72: Check Arm CoreSight trace data recording and synthesized samples
...
# ./perf test 72 -v
...
Recording trace with snapshot mode
Couldn't synthesize bpf events.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 8.014 MB /tmp/__perf_test.perf.data.HdqlK ]
Looking at perf.data file for dumping branch samples: aaa
test child finished with -1
---- end ----
Check Arm CoreSight trace data recording and synthesized samples: FAILED!
So you can see the test passes for traversing ETM with sinks and for system-wide mode, but fails with snapshot mode.
Thanks, Leo
On Tue, Nov 10, 2020 at 02:52:43PM +0800, Leo Yan wrote:
Hi all,
[ + CoreSight Mailing List ]
On the mainline kernel with the latest commit 407ab579637c "Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm", the perf cs-etm testing fails.
I found the perf cs-etm test itself has a typo and there is a regression introduced by the perf tool in the recent mainline kernel, so I sent out two patches to fix these issues [1].
Even after applying these two fixes for the test script, I still observe the etm test failure in snapshot mode. I manually checked the perf data and confirmed it contains the raw trace data for the snapshot (so 'perf script -D' works well), but it fails to generate any branch samples, so the command 'perf script' does not output any samples.
It seems to me this issue is more likely related to a change in the driver or the perf tool's common code rather than the test itself;
If the trace data is available then it is likely the kernel drivers worked properly. I would concentrate on the perf tools - with the amount of code churn in that subsystem many regressions have been introduced before.
Before I proceed to use 'git bisect' to narrow down the snapshot mode failure, if you have any idea for fixing this issue, please let me know. Otherwise, I will dig into it and keep you posted.
I'm afraid a bisect is your only option.
Thanks for looking into this, Mathieu
Thanks, Leo
Hi all,
On Tue, Nov 10, 2020 at 11:14:16AM -0700, Mathieu Poirier wrote:
On Tue, Nov 10, 2020 at 02:52:43PM +0800, Leo Yan wrote:
Hi all,
[ + CoreSight Mailing List ]
On the mainline kernel with the latest commit 407ab579637c "Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm", the perf cs-etm testing fails.
I found the perf cs-etm test itself has a typo and there is a regression introduced by the perf tool in the recent mainline kernel, so I sent out two patches to fix these issues [1].
Even after applying these two fixes for the test script, I still observe the etm test failure in snapshot mode. I manually checked the perf data and confirmed it contains the raw trace data for the snapshot (so 'perf script -D' works well), but it fails to generate any branch samples, so the command 'perf script' does not output any samples.
It seems to me this issue is more likely related to a change in the driver or the perf tool's common code rather than the test itself;
If the trace data is available then it is likely the kernel drivers worked properly. I would concentrate on the perf tools - with the amount of code churn in that subsystem many regressions have been introduced before.
Before I proceed to use 'git bisect' to narrow down the snapshot mode failure, if you have any idea for fixing this issue, please let me know. Otherwise, I will dig into it and keep you posted.
I'm afraid a bisect is your only option.
Firstly, sorry for the long reply; in this email I will try to explain what the problem is, but I am also seeking your suggestions for how to fix it.
Let me give a quick way to reproduce the issue:
# sh test_cs_etm_snapshot.sh   => the script is pasted at the end of this email
# perf script                  => it does not output any samples in snapshot mode
This raises a question: why can the tool not output samples for either kernel or user space programs?
### Failure for parsing Kernel symbols
To output the kernel's trace data, perf needs to read the symbols from the kernel's DSO. As far as I know, there are three ways to import the kernel's symbols:
- By default, the perf tool reads the runtime kernel symbols from /proc/kallsyms and /proc/kcore; to support these two proc nodes, we need to enable the kernel configurations:
  CONFIG_PROC_KCORE=y
  CONFIG_KALLSYMS=y
- Copy vmlinux into the current working directory when running the 'perf' command;
- Specify the vmlinux path with the option '-k'.
In my case, since I had missed enabling the KCORE config and did not copy or specify vmlinux, the kernel symbols failed to be parsed and no samples could be synthesized [1]. After enabling the KCORE option, the tool outputs the kernel symbols properly. Since there is no warning log to flag a failure to find the DSO or symbols, and I have encountered similar issues before, it usually takes me several hours or more to locate such a problem. So I am sharing the info here, and I also think it would be better to commit a patch to improve the verbose log.
### Failure for parsing user space symbols
After the kernel symbol parsing is fixed, the test case can pass. But the difficult part is the userspace symbol parsing; sometimes we do not even notice that it fails to synthesize samples for the userspace trace data.
The rationale is: the perf tool imports the userspace DSOs when handling PERF_RECORD_MMAP2 events; in theory, the PERF_RECORD_MMAP2 events should arrive ahead of the PERF_RECORD_AUX events, so that the perf tool can load the userspace DSOs and have the symbol tree ready before parsing any userspace address.
But I found the PERF_RECORD_MMAP2 and PERF_RECORD_AUX events are interleaved with each other in the perf.data file, and it is possible to process all the AUX trace data without the user space program's DSO, so we cannot see any samples from userspace. This is why I sent an RFC patch to 'force' the PERF_RECORD_MMAP2 events to be processed prior to PERF_RECORD_AUX [2]; but the change in the RFC patch is more of a workaround than a real fix. A better behaviour would be: if a PERF_RECORD_AUX event arrives while the DSO is not yet prepared, skip the small amount of trace data and exit from the perf cs-etm decoding loop; the perf tool can then continue to import DSOs from the subsequent PERF_RECORD_MMAP2 events, and afterwards it can decode the CoreSight trace data and generate samples.
This is a common issue for all AUX trace cases (like cs-etm, SPE, Intel PT, etc.). After looking into the Intel PT code, I think Intel PT uses the timestamp to correlate different events. The key point is that it pushes the first Intel PT sample with its timestamp into the auxtrace heap [3]; if any new event's timestamp is earlier than Intel PT's trace timestamp, Intel PT's decoding loop bails out [4] and the other events are processed first. Simply put, the events are handled in timestamp order.
I think this approach can be applied to Arm SPE: since Arm SPE uses the arch timer's counter as its timestamp, Arm SPE trace and other events can be correlated on the same timeline.
For CoreSight, I am not sure how to proceed; the arch timer counter is not mandatory for CoreSight, but given that some platforms do feed the arch timer counter to CoreSight (like the Juno board), should we move forward with this?
Otherwise, as Adrian Hunter suggested: "If you are processing data based on PERF_RECORD_AUX timestamp then you need to pay attention to the offset. PERF_RECORD_AUX gives you aux_offset/aux_size and auxtrace_buffer (which may contain data from several PERF_RECORD_AUX) gives you offset/size." So should we extract the aux_offset and aux_size from each PERF_RECORD_AUX event and, each time, decode no more trace data than aux_size for that event? This means we need to add additional checks on the buffer length in the functions cs_etm__process_queues() and cs_etm__run_decoder() to avoid processing any trace data not belonging to the event.
What do you think about this? Thanks for reading!
Leo
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
[2] https://lore.kernel.org/patchwork/patch/1340992/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
---8<---
test_cs_etm_snapshot.sh:
#!/bin/sh
arm_cs_etm_snapshot_test() {
	echo "Recording trace with snapshot mode"

	perf record -e cs_etm/@tmc_etr0/ -S \
		-- dd if=/dev/zero of=/dev/null &
	PERFPID=$!

	# Wait for perf program
	sleep 2

	# Send signal to snapshot trace data
	kill -USR2 $PERFPID

	# Stop perf program
	kill $PERFPID
	wait $PERFPID
}

arm_cs_etm_snapshot_test
exit 0
Hi Leo,
I've been looking into an issue with CoreSight decode of ETM data in perf where, for some reason on some targets, it appears that a hardware flush is not completing correctly. This results in a partial packet from one trace run appearing in the subsequent trace run. When this sequence appears as part of a single AUXTRACE buffer (i.e. more than one AUX event applies to the buffer), this can result in a decode error. The error occurs even with 'perf report -D', which should simply dump trace packets.
A separate historical problem was that where multiple trace run hw buffers appear concatenated in a single AUXTRACE buffer in perf.data, it was impossible for the decoder to determine the boundaries where a wrapped hw buffer followed a previous hw buffer. This was addressed by having the drivers insert barrier packets into the hw trace stream when this occurred. This allowed the decoder to decode a full AUXTRACE buffer correctly, irrespective of the number of hw buffers it contained. However this workaround is not effective for the limitation mentioned in the previous paragraph.
Both these issues are related to how the decode semantics in perf cs-etm are handling PERF_RECORD_AUX and PERF_RECORD_AUXTRACE.
On Wed, 9 Dec 2020 at 11:19, Leo Yan leo.yan@linaro.org wrote:
Hi all,
On Tue, Nov 10, 2020 at 11:14:16AM -0700, Mathieu Poirier wrote:
On Tue, Nov 10, 2020 at 02:52:43PM +0800, Leo Yan wrote:
Hi all,
[ + CoreSight Mailing List ]
On the mainline kernel with the latest commit 407ab579637c "Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm", the perf cs-etm testing fails.
I found the perf cs-etm test itself has a typo and there is a regression introduced by the perf tool in the recent mainline kernel, so I sent out two patches to fix these issues [1].
Even after applying these two fixes for the test script, I still observe the etm test failure in snapshot mode. I manually checked the perf data and confirmed it contains the raw trace data for the snapshot (so 'perf script -D' works well), but it fails to generate any branch samples, so the command 'perf script' does not output any samples.
It seems to me this issue is more likely related to a change in the driver or the perf tool's common code rather than the test itself;
If the trace data is available then it is likely the kernel drivers worked properly. I would concentrate on the perf tools - with the amount of code churn in that subsystem many regressions have been introduced before.
Before I proceed to use 'git bisect' to narrow down the snapshot mode failure, if you have any idea for fixing this issue, please let me know. Otherwise, I will dig into it and keep you posted.
I'm afraid a bisect is your only option.
Firstly, sorry for the long reply; in this email I will try to explain what the problem is, but I am also seeking your suggestions for how to fix it.
Let me give a quick way to reproduce the issue:
# sh test_cs_etm_snapshot.sh   => the script is pasted at the end of this email
# perf script                  => it does not output any samples in snapshot mode
This raises a question: why can the tool not output samples for either kernel or user space programs?
### Failure for parsing Kernel symbols
To output the kernel's trace data, perf needs to read the symbols from the kernel's DSO. As far as I know, there are three ways to import the kernel's symbols:
- By default, the perf tool reads the runtime kernel symbols from /proc/kallsyms and /proc/kcore; to support these two proc nodes, we need to enable the kernel configurations: CONFIG_PROC_KCORE=y CONFIG_KALLSYMS=y
- Copy vmlinux into the current working directory when running the 'perf' command;
- Specify the vmlinux path with the option '-k'.
In my case, since I had missed enabling the KCORE config and did not copy or specify vmlinux, the kernel symbols failed to be parsed and no samples could be synthesized [1]. After enabling the KCORE option, the tool outputs the kernel symbols properly. Since there is no warning log to flag a failure to find the DSO or symbols, and I have encountered similar issues before, it usually takes me several hours or more to locate such a problem. So I am sharing the info here, and I also think it would be better to commit a patch to improve the verbose log.
### Failure for parsing user space symbols
After the kernel symbol parsing is fixed, the test case can pass. But the difficult part is the userspace symbol parsing; sometimes we do not even notice that it fails to synthesize samples for the userspace trace data.
The rationale is: the perf tool imports the userspace DSOs when handling PERF_RECORD_MMAP2 events; in theory, the PERF_RECORD_MMAP2 events should arrive ahead of the PERF_RECORD_AUX events, so that the perf tool can load the userspace DSOs and have the symbol tree ready before parsing any userspace address.
But I found the PERF_RECORD_MMAP2 and PERF_RECORD_AUX events are interleaved with each other in the perf.data file, and it is possible to process all the AUX trace data without the user space program's DSO, so we cannot see any samples from userspace. This is why I sent an RFC patch to 'force' the PERF_RECORD_MMAP2 events to be processed prior to PERF_RECORD_AUX [2]; but the change in the RFC patch is more of a workaround than a real fix. A better behaviour would be: if a PERF_RECORD_AUX event arrives while the DSO is not yet prepared, skip the small amount of trace data and exit from the perf cs-etm decoding loop; the perf tool can then continue to import DSOs from the subsequent PERF_RECORD_MMAP2 events, and afterwards it can decode the CoreSight trace data and generate samples.
This is a common issue for all AUX trace cases (like cs-etm, SPE, Intel PT, etc.). After looking into the Intel PT code, I think Intel PT uses the timestamp to correlate different events. The key point is that it pushes the first Intel PT sample with its timestamp into the auxtrace heap [3]; if any new event's timestamp is earlier than Intel PT's trace timestamp, Intel PT's decoding loop bails out [4] and the other events are processed first. Simply put, the events are handled in timestamp order.
I think this approach can be applied to Arm SPE: since Arm SPE uses the arch timer's counter as its timestamp, Arm SPE trace and other events can be correlated on the same timeline.
For CoreSight, I am not sure how to proceed; the arch timer counter is not mandatory for CoreSight, but given that some platforms do feed the arch timer counter to CoreSight (like the Juno board), should we move forward with this?
Otherwise, as Adrian Hunter suggested: "If you are processing data based on PERF_RECORD_AUX timestamp then you need to pay attention to the offset. PERF_RECORD_AUX gives you aux_offset/aux_size and auxtrace_buffer (which may contain data from several PERF_RECORD_AUX) gives you offset/size." So should we extract the aux_offset and aux_size from each PERF_RECORD_AUX event and, each time, decode no more trace data than aux_size for that event?
Yes - I think this has to happen in order for accurate decode to occur.
This means we need to add additional checks on the buffer length in the functions cs_etm__process_queues() and cs_etm__run_decoder() to avoid processing any trace data not belonging to the event.
So, although the issues I am seeing are not directly related to perf snapshots, I do believe that both problems require a re-thinking of the handling of the perf records.
a) PERF_RECORD_AUX events appear to be directly related to PERF_RECORD_AUXTRACE buffers by index, offset and size. This should allow the correct parsing of a single PERF_RECORD_AUXTRACE buffer into a set of hw capture buffers according to the offset and size of the AUX records. This will allow us to restart the decoder appropriately at the start of each hw buffer.
b) Even if the arch timer is not running the coresight timestamp, or the coresight timestamp is not enabled, will it be correct to associate the arch timestamp in the AUX record with that in the MMAP2 records, and therefore have all the information needed for correct decode of the block of trace in the AUXTRACE record associated with the AUX record?
c) At present the handler in cs-etm for the AUX records returns immediately when dumping trace. This leads to errors in dumping the AUXTRACE record for the reasons given above. Again, this suggests that we need to split the AUXTRACE record up according to the size / offset parameters in the AUX records in the dump-trace case.
d) Correctly splitting the AUXTRACE buffer using AUX records will allow us to remove the barrier packets in the ETR / ETB drivers. This is at best a workaround for a problem that may have been shown to reside elsewhere.
In short I think that for CoreSight we need to correlate and associate the AUX and AUXTRACE records to correctly dump the trace packets, and add in the correlation of the correct MMAP2 events for full trace decode.
Regards
Mike
What do you think about this? Thanks for reading!
Leo
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
[2] https://lore.kernel.org/patchwork/patch/1340992/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
---8<---
test_cs_etm_snapshot.sh:
#!/bin/sh
arm_cs_etm_snapshot_test() {
	echo "Recording trace with snapshot mode"

	perf record -e cs_etm/@tmc_etr0/ -S \
		-- dd if=/dev/zero of=/dev/null &
	PERFPID=$!

	# Wait for perf program
	sleep 2

	# Send signal to snapshot trace data
	kill -USR2 $PERFPID

	# Stop perf program
	kill $PERFPID
	wait $PERFPID
}

arm_cs_etm_snapshot_test
exit 0
-----Original Message-----
From: CoreSight <coresight-bounces@lists.linaro.org> On Behalf Of Mike Leach
Sent: 09 December 2020 16:38
To: leo.yan@linaro.org
Cc: coresight@lists.linaro.org; James Clark <James.Clark@arm.com>
Subject: Re: Issue report: CoreSight snapshot mode testing failure
Hi Leo,
I've been looking into an issue with CoreSight decode of ETM data in perf where, for some reason on some targets, it appears that a hardware flush is not completing correctly. This results in a partial packet from one trace run appearing in the subsequent trace run. When this sequence appears as part of a single AUXTRACE buffer (i.e. more than one AUX event applies to the buffer), this can result in a decode error. The error occurs even with 'perf report -D', which should simply dump trace packets.
A separate historical problem was that where multiple trace run hw buffers appear concatenated in a single AUXTRACE buffer in perf.data, it was impossible for the decoder to determine the boundaries where a wrapped hw buffer followed a previous hw buffer. This was addressed by having the drivers insert barrier packets into the hw trace stream when this occurred. This allowed the decoder to decode a full AUXTRACE buffer correctly, irrespective of the number of hw buffers it contained. However this workaround is not effective for the limitation mentioned in the previous paragraph.
Both these issues are related to how the decode semantics in perf cs-etm are handling PERF_RECORD_AUX and PERF_RECORD_AUXTRACE.
On Wed, 9 Dec 2020 at 11:19, Leo Yan leo.yan@linaro.org wrote:
Hi all,
On Tue, Nov 10, 2020 at 11:14:16AM -0700, Mathieu Poirier wrote:
On Tue, Nov 10, 2020 at 02:52:43PM +0800, Leo Yan wrote:
Hi all,
[ + CoreSight Mailing List ]
On the mainline kernel with the latest commit 407ab579637c "Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm", the perf cs-etm testing fails.
I found the perf cs-etm test itself has a typo and there is a regression introduced by the perf tool in the recent mainline kernel, so I sent out two patches to fix these issues [1].
Even after applying these two fixes for the test script, I still observe the etm test failure in snapshot mode. I manually checked the perf data and confirmed it contains the raw trace data for the snapshot (so 'perf script -D' works well), but it fails to generate any branch samples, so the command 'perf script' does not output any samples.
It seems to me this issue is more likely related to a change in the driver or the perf tool's common code rather than the test itself;
If the trace data is available then it is likely the kernel drivers worked properly. I would concentrate on the perf tools - with the amount of code churn in that subsystem many regressions have been introduced before.
Before I proceed to use 'git bisect' to narrow down the snapshot mode failure, if you have any idea for fixing this issue, please let me know. Otherwise, I will dig into it and keep you posted.
I'm afraid a bisect is your only option.
Firstly, sorry for the long reply; in this email I will try to explain what the problem is, but I am also seeking your suggestions for how to fix it.
Let me give a quick way to reproduce the issue:
# sh test_cs_etm_snapshot.sh   => the script is pasted at the end of this email
# perf script                  => it does not output any samples in snapshot mode
This raises a question: why can the tool not output samples for either kernel or user space programs?
### Failure for parsing Kernel symbols
To output the kernel's trace data, perf needs to read the symbols from the kernel's DSO. As far as I know, there are three ways to import the kernel's symbols:
- By default, the perf tool reads the runtime kernel symbols from /proc/kallsyms and /proc/kcore; to support these two proc nodes, we need to enable the kernel configurations: CONFIG_PROC_KCORE=y CONFIG_KALLSYMS=y
- Copy vmlinux into the current working directory when running the 'perf' command;
- Specify the vmlinux path with the option '-k'.
In my case, since I had missed enabling the KCORE config and did not copy or specify vmlinux, the kernel symbols failed to be parsed and no samples could be synthesized [1]. After enabling the KCORE option, the tool outputs the kernel symbols properly. Since there is no warning log to flag a failure to find the DSO or symbols, and I have encountered similar issues before, it usually takes me several hours or more to locate such a problem. So I am sharing the info here, and I also think it would be better to commit a patch to improve the verbose log.
### Failure for parsing user space symbols
After the kernel symbol parsing is fixed, the test case can pass. But the difficult part is the userspace symbol parsing; sometimes we do not even notice that it fails to synthesize samples for the userspace trace data.
The rationale is: the perf tool imports the userspace DSOs when handling PERF_RECORD_MMAP2 events; in theory, the PERF_RECORD_MMAP2 events should arrive ahead of the PERF_RECORD_AUX events, so that the perf tool can load the userspace DSOs and have the symbol tree ready before parsing any userspace address.
But I found the PERF_RECORD_MMAP2 and PERF_RECORD_AUX events are interleaved with each other in the perf.data file, and it is possible to process all the AUX trace data without the user space program's DSO, so we cannot see any samples from userspace. This is why I sent an RFC patch to 'force' the PERF_RECORD_MMAP2 events to be processed prior to PERF_RECORD_AUX [2]; but the change in the RFC patch is more of a workaround than a real fix. A better behaviour would be: if a PERF_RECORD_AUX event arrives while the DSO is not yet prepared, skip the small amount of trace data and exit from the perf cs-etm decoding loop; the perf tool can then continue to import DSOs from the subsequent PERF_RECORD_MMAP2 events, and afterwards it can decode the CoreSight trace data and generate samples.
This is a common issue for all AUX trace cases (like cs-etm, SPE, Intel PT, etc.). After looking into the Intel PT code, I think Intel PT uses the timestamp to correlate different events. The key point is that it pushes the first Intel PT sample with its timestamp into the auxtrace heap [3]; if any new event's timestamp is earlier than Intel PT's trace timestamp, Intel PT's decoding loop bails out [4] and the other events are processed first. Simply put, the events are handled in timestamp order.
I think this approach can be applied to Arm SPE: since Arm SPE uses the arch timer's counter as its timestamp, Arm SPE trace and other events can be correlated on the same timeline.
For CoreSight, I am not sure how to proceed; the arch timer counter is not mandatory for CoreSight, but given that some platforms do feed the arch timer counter to CoreSight (like the Juno board), should we move forward with this?
Otherwise, as Adrian Hunter suggested: "If you are processing data based on PERF_RECORD_AUX timestamp then you need to pay attention to the offset. PERF_RECORD_AUX gives you aux_offset/aux_size and auxtrace_buffer (which may contain data from several PERF_RECORD_AUX) gives you offset/size." So should we extract the aux_offset and aux_size from each PERF_RECORD_AUX event and, each time, decode no more trace data than aux_size for that event?
Yes - I think this has to happen in order for accurate decode to occur.
This means we need to add additional checks on the buffer length in the functions cs_etm__process_queues() and cs_etm__run_decoder() to avoid processing any trace data not belonging to the event.
So, although the issues I am seeing are not directly related to perf snapshots, I do believe that both problems require a re-thinking of the handling of the perf records.
a) PERF_RECORD_AUX events appear to be directly related to PERF_RECORD_AUXTRACE buffers by index, offset and size. This should allow the correct parsing of a single PERF_RECORD_AUXTRACE buffer into a set of hw capture buffers according to the offset and size of the AUX records. This will allow us to restart the decoder appropriately at the start of each hw buffer.
b) Even if the arch timer is not running the coresight timestamp, or the coresight timestamp is not enabled, will it be correct to associate the arch timestamp in the AUX record with that in the MMAP2 records, and therefore have all the information needed for correct decode of the block of trace in the AUXTRACE record associated with the AUX record?
The timestamp in the AUX record is a kernel timestamp, so it is correct to associate it with the timestamps in the MMAP records. The only risk is that the AUX buffer segment may contain some trace from any time prior to when the AUX record was generated, including times from before the MMAP occurred. At worst, you could have a situation like this:
- start trace collection into a buffer
- client unloads abc.so (at an address)
- client loads xyz.so into the same address range, generating MMAP
- collect trace into an AUX segment, generating AUX
In the trace there may be references to a given address which may be from either abc.so or xyz.so, and the timestamps will not resolve this. This is quite unlikely, but it would be more likely if trace was being collected at a relatively slow rate, and/or with filtering. There may be similar issues with JIT, if you're using jitdump files to track when code is dynamically generated.
To do any better you'd need the timestamps embedded within the ETM stream itself, and convert those to kernel time, which we currently have no mechanism for.
Al
c) At present the handler in cs-etm for the AUX records returns immediately when dumping trace. This leads to errors in dumping the AUXTRACE record for the reasons given above. Again, this suggests that we need to split the AUXTRACE record up according to the size / offset parameters in the AUX records in the dump-trace case.
d) Correctly splitting the AUXTRACE buffer using AUX records will allow us to remove the barrier packets in the ETR / ETB drivers. This is at best a workaround for a problem that may have been shown to reside elsewhere.
In short I think that for CoreSight we need to correlate and associate the AUX and AUXTRACE records to correctly dump the trace packets, and add in the correlation of the correct MMAP2 events for full trace decode.
Regards
Mike
What do you think about this? Thanks for reading!
Leo
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/cs-etm.c#n695
[2] https://lore.kernel.org/patchwork/patch/1340992/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/intel-pt.c#n1240
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/intel-pt.c#n2329
---8<---
test_cs_etm_snapshot.sh:
#!/bin/sh
arm_cs_etm_snapshot_test() {
	echo "Recording trace with snapshot mode"

	perf record -e cs_etm/@tmc_etr0/ -S \
		-- dd if=/dev/zero of=/dev/null &
	PERFPID=$!

	# Wait for perf program
	sleep 2

	# Send signal to snapshot trace data
	kill -USR2 $PERFPID

	# Stop perf program
	kill $PERFPID
	wait $PERFPID
}

arm_cs_etm_snapshot_test
exit 0
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
_______________________________________________
CoreSight mailing list
CoreSight@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/coresight
Hi Al,
On Thu, Dec 10, 2020 at 11:23:20AM +0000, Al Grant wrote:
[...]
b) Even if the arch timer is not running the coresight timestamp, or the coresight timestamp is not enabled, will it be correct to associate the arch timestamp in the AUX record with that in the MMAP2 records, and therefore have all the information needed for correct decode of the block of trace in the AUXTRACE record associated with the AUX record?
The timestamp in the AUX record is a kernel timestamp, so it is correct to associate it with the timestamps in the MMAP records. The only risk is that the AUX buffer segment may contain some trace from any time prior to when the AUX record was generated, including times from before the MMAP occurred. At worst, you could have a situation like this:
- start trace collection into a buffer
- client unloads abc.so (at an address)
- client loads xyz.so into the same address range, generating MMAP
- collect trace into an AUX segment, generating AUX
In the trace there may be references to a given address which may be from either abc.so or xyz.so, and the timestamps will not resolve this. This is quite unlikely, but it would be more likely if trace was being collected at a relatively slow rate, and/or with filtering.
Yeah, this is a good example of why we need the timestamp to achieve a more accurate result.
There may be similar issues with JIT, if you're using jitdump files to track when code is dynamically generated.
For JIT, it's a different topic (and it is difficult to resolve). The perf framework provides the TEXT_POKE event for instruction alteration; for eBPF programs, the perf tool can use BTF for symbol parsing.
To do any better you'd need the timestamps embedded within the ETM stream itself, and convert those to kernel time, which we currently have no mechanism for.
Thanks for the explanation.
Leo
Hi Mike,
Thanks for the suggestions.
On Wed, Dec 09, 2020 at 04:38:24PM +0000, Mike Leach wrote:
[...]
Otherwise, as Adrian Hunter suggested: "If you are processing data based on PERF_RECORD_AUX timestamp then you need to pay attention to the offset. PERF_RECORD_AUX gives you aux_offset/aux_size and auxtrace_buffer (which may contain data from several PERF_RECORD_AUX) gives you offset/size.", should we extract aux_offset and aux_size from each PERF_RECORD_AUX event and, every time, decode no more than aux_size bytes of trace data for that event?
Yes - I think this has to happen in order for accurate decode to occur.
This means we need to add extra checks on the buffer length in the functions cs_etm__process_queues() and cs_etm__run_decoder() to avoid processing any trace data not belonging to the event.
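To make the length check concrete, here is a minimal sketch of clamping decode to the bytes covered by one AUX record. The helper name and its parameters are hypothetical, not the actual cs-etm code; it only shows the overlap arithmetic being discussed:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the offset/size of one AUXTRACE buffer and
 * the aux_offset/aux_size of one PERF_RECORD_AUX event, compute how many
 * bytes of the AUXTRACE buffer belong to that AUX record, so the decoder
 * only consumes trace generated for this event.
 */
static size_t aux_bytes_in_buffer(uint64_t buf_offset, size_t buf_size,
				  uint64_t aux_offset, size_t aux_size)
{
	uint64_t buf_end = buf_offset + buf_size;
	uint64_t aux_end = aux_offset + aux_size;
	uint64_t start, end;

	/* No overlap: this AUX record contributes nothing to the buffer. */
	if (aux_end <= buf_offset || aux_offset >= buf_end)
		return 0;

	/* Intersect the two ranges and decode only that many bytes. */
	start = aux_offset > buf_offset ? aux_offset : buf_offset;
	end = aux_end < buf_end ? aux_end : buf_end;
	return (size_t)(end - start);
}
```

The same intersection also tells us where in the AUXTRACE buffer to restart the decoder for the next AUX record.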
So, although the issues I am seeing are not directly related to perf snapshots, I do believe that both problems require a re-thinking of the handling of the perf records.
a) PERF_RECORD_AUX events appear to be directly related to PERF_RECORD_AUXTRACE buffers by index, offset and size. This should allow the correct parsing of a single PERF_RECORD_AUXTRACE buffer into a set of hw capture buffers according to the offset and size of the AUX records. This will allow us to restart the decoder appropriately at the start of each hw buffer.
b) Even if the arch timer is not running the coresight timestamp, or the coresight timestamp is not enabled, will it be correct to associate the arch timestamp in the AUX record with that in the MMAP2 records, and therefore have all the information needed for correct decode of the block of trace in the AUXTRACE record associated with the AUX record?
There are several timestamps involved here:
- The timestamp in the PERF_RECORD_AUX event;
- The timestamp in the PERF_RECORD_MMAP2 event;
- The timestamp contained in the CoreSight trace data.
The first two timestamps come from the kernel's system time; the third comes from the decoded CoreSight packets and only contains a raw counter value.
So at the current stage, we only know the timestamps for the PERF_RECORD_AUX and PERF_RECORD_MMAP2 events; if a PERF_RECORD_AUX event arrives ahead of a PERF_RECORD_MMAP2 event, we can simply handle the PERF_RECORD_AUX event and skip synthesizing samples due to the missing DSO. But this is inaccurate: the CoreSight trace data can generate many samples, so some samples might be earlier or later than the PERF_RECORD_MMAP2 event.
If we cannot convert CoreSight's timestamp to kernel time, this means we must drop any CoreSight trace data prior to the PERF_RECORD_MMAP2 events. But this is still better than the current implementation, which decodes the trace data in one go and has no chance to handle PERF_RECORD_MMAP2 events.
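Purely as an illustration of what such a conversion would need (the kernel exports no such parameters for CoreSight today, and every name below is hypothetical): given a mult/shift pair for the counter frequency plus one known pairing of a counter value with a kernel timestamp, the raw counter could be mapped to kernel time the same way the kernel scales clocksource cycles:

```c
#include <stdint.h>

/* Scale a raw counter delta to nanoseconds, clocksource-style. */
static uint64_t counter_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/*
 * Hypothetical conversion: ts_base is a raw CoreSight counter value that
 * was captured at kernel time kernel_ns_base; any later trace timestamp
 * ts can then be placed on the kernel timeline.
 */
static uint64_t coresight_ts_to_kernel_ns(uint64_t ts, uint64_t ts_base,
					  uint64_t kernel_ns_base,
					  uint32_t mult, uint32_t shift)
{
	return kernel_ns_base + counter_to_ns(ts - ts_base, mult, shift);
}
```

With something like this available, trace samples could be ordered against the MMAP2 events instead of being dropped.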
c) At present the handler in cs-etm for the AUX records returns immediately when dumping trace. This leads to errors in dumping the AUXTRACE record for the reasons given above. Again, this suggests that we need to split the AUXTRACE record up according to the size / offset parameters in the AUX records in the dump trace case.
d) Correctly splitting the AUXTRACE buffer using AUX records will allow us to remove the barrier packets in the ETR / ETB drivers. This is at best a workaround for a problem that may have been shown to reside elsewhere.
In short I think that for CoreSight we need to correlate and associate the AUX and AUXTRACE records to correctly dump the trace packets, and add in the correlation of the correct MMAP2 events for full trace decode.
Could you let me know the latest status of this work? If you have made progress, I am very glad to test and review the related changes. Otherwise, I will try to experiment with this (after we agree on the approach).
Thanks, Leo
Hi Leo,
On Thu, 10 Dec 2020 at 13:09, Leo Yan leo.yan@linaro.org wrote:
Hi Mike,
Thanks for the suggestions.
On Wed, Dec 09, 2020 at 04:38:24PM +0000, Mike Leach wrote:
[...]
Otherwise, as Adrian Hunter suggested: "If you are processing data based on PERF_RECORD_AUX timestamp then you need to pay attention to the offset. PERF_RECORD_AUX gives you aux_offset/aux_size and auxtrace_buffer (which may contain data from several PERF_RECORD_AUX) gives you offset/size.", should we extract aux_offset and aux_size from each PERF_RECORD_AUX event and, every time, decode no more than aux_size bytes of trace data for that event?
Yes - I think this has to happen in order for accurate decode to occur.
This means we need to add extra checks on the buffer length in the functions cs_etm__process_queues() and cs_etm__run_decoder() to avoid processing any trace data not belonging to the event.
So, although the issues I am seeing are not directly related to perf snapshots, I do believe that both problems require a re-thinking of the handling of the perf records.
a) PERF_RECORD_AUX events appear to be directly related to PERF_RECORD_AUXTRACE buffers by index, offset and size. This should allow the correct parsing of a single PERF_RECORD_AUXTRACE buffer into a set of hw capture buffers according to the offset and size of the AUX records. This will allow us to restart the decoder appropriately at the start of each hw buffer.
b) Even if the arch timer is not running the coresight timestamp, or the coresight timestamp is not enabled, will it be correct to associate the arch timestamp in the AUX record with that in the MMAP2 records, and therefore have all the information needed for correct decode of the block of trace in the AUXTRACE record associated with the AUX record?
There are several timestamps involved here:
- The timestamp in PERF_RECORD_AUX event;
- The timestamp in PERF_RECORD_MMAP2 event;
- The timestamp which is contained in the CoreSight trace data.
The first two timestamps come from the kernel's system time; the third comes from the decoded CoreSight packets and only contains a raw counter value.
So at the current stage, we only know the timestamps for the PERF_RECORD_AUX and PERF_RECORD_MMAP2 events; if a PERF_RECORD_AUX event arrives ahead of a PERF_RECORD_MMAP2 event, we can simply handle the PERF_RECORD_AUX event and skip synthesizing samples due to the missing DSO. But this is inaccurate: the CoreSight trace data can generate many samples, so some samples might be earlier or later than the PERF_RECORD_MMAP2 event.
My observation - on a very small sample set of the two files I was analysing looking for decode issues - was that there is a block of MMAP2 events at the top of the file, as the perf session starts up, followed by a bunch of AUX / AUXTRACE records. In these particular files there did not seem to be much mixing of AUX / MMAP2 in the sequence.
I also observe that the MMAP2 events are associated with PID / TID - which we can use if we are tracing with Context IDs enabled and the relevant kernel associations set. So this gives another possible method for associating trace with MMAP2 events.
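As a rough sketch of that idea (all structs and names below are hypothetical, not perf code): the Context ID seen in the trace can be matched against the PID recorded in the MMAP2 events, alongside the address range check, to pick the right DSO mapping:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical records: an MMAP2 event and a decoded trace range. */
struct mmap2_rec {
	uint32_t pid, tid;
	uint64_t start, len;
	const char *filename;
};

struct trace_range {
	uint32_t context_id;	/* PID traced via Context ID packets */
	uint64_t addr;		/* address referenced by the trace */
};

/*
 * Sketch: pick the MMAP2 record whose PID matches the Context ID seen
 * in the trace and whose address range covers the traced address.
 */
static const struct mmap2_rec *
lookup_dso(const struct mmap2_rec *maps, size_t n, const struct trace_range *r)
{
	for (size_t i = 0; i < n; i++) {
		if (maps[i].pid == r->context_id &&
		    r->addr >= maps[i].start &&
		    r->addr < maps[i].start + maps[i].len)
			return &maps[i];
	}
	return NULL;
}
```

This would disambiguate the abc.so / xyz.so case only when the two mappings belong to different processes; same-process reuse of an address range would still need timestamps.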
If we cannot convert CoreSight's timestamp to kernel time, this means we must drop any CoreSight trace data prior to the PERF_RECORD_MMAP2 events. But this is still better than the current implementation, which decodes the trace data in one go and has no chance to handle PERF_RECORD_MMAP2 events.
Agreed - it is difficult to associate later MMAP2 events with prior AUX events.
c) At present the handler in cs-etm for the AUX records returns immediately when dumping trace. This leads to errors in dumping the AUXTRACE record for the reasons given above. Again, this suggests that we need to split the AUXTRACE record up according to the size / offset parameters in the AUX records in the dump trace case.
d) Correctly splitting the AUXTRACE buffer using AUX records will allow us to remove the barrier packets in the ETR / ETB drivers. This is at best a workaround for a problem that may have been shown to reside elsewhere.
In short I think that for CoreSight we need to correlate and associate the AUX and AUXTRACE records to correctly dump the trace packets, and add in the correlation of the correct MMAP2 events for full trace decode.
Could you let me know the latest status of this work? If you have made progress, I am very glad to test and review the related changes. Otherwise, I will try to experiment with this (after we agree on the approach).
I have not actually attempted any code changes to fix this - at present I am not familiar enough with the details of the code - and I have some other driver / OpenCSD work underway that is taking up my time.
I did some detailed analysis to try to discover why some users are seeing decode issues - my focus was on the problems with splitting the AUXTRACE buffer correctly so that the OpenCSD decoder was reset at the correct time to avoid stale data breaking the decoder. The conclusions I reached seemed to be relevant to the issue you raised also.
Regards
Thanks, Leo