-----Original Message----- From: CoreSight coresight-bounces@lists.linaro.org On Behalf Of Mike Leach Sent: 09 December 2020 16:38 To: leo.yan@linaro.org Cc: coresight@lists.linaro.org; James Clark James.Clark@arm.com Subject: Re: Issue report: CoreSight snapshot mode testing failure
Hi Leo,
I've been looking into an issue with CoreSight decode of ETM data in perf, where, for some reason on some target, it appears that a hardware flush is not completing correctly. This results in a partial packet from one trace run appearing in the subsequent trace run. When this sequence appears as part of a single AUXTRACE buffer (e.g. > 1 AUX events apply to this buffer) then this can result in a decode error. This error will occur even on the perf report -D which should simply just dump trace packets.
A separate historical problem was that where multiple trace run hw buffers appear concatenated in a single AUXTRACE buffer in perf.data, it was impossible for the decoder to determine the boundaries where a wrapped hw buffer followed a previous hw buffer. This was addressed by having the drivers insert barrier packets into the hw trace stream when this occurred. This allowed the decoder to decode a full AUXTRACE buffer correctly, irrespective of the number of hw buffers it contained. However this workaround is not effective for the limitation mentioned in the previous paragraph.
Both these issues are related to how the decode semantics in perf cs-etm are handling PERF_RECORD_AUX and PERF_RECORD_AUXTRACE.
On Wed, 9 Dec 2020 at 11:19, Leo Yan leo.yan@linaro.org wrote:
Hi all,
On Tue, Nov 10, 2020 at 11:14:16AM -0700, Mathieu Poirier wrote:
On Tue, Nov 10, 2020 at 02:52:43PM +0800, Leo Yan wrote:
Hi all,
[ + CoreSight Mailing List ]
On the mainlne kernel with the latest commit 407ab579637c "Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm", the perf cs-etm testing fails.
I found the perf cs-etm testing itself has a typo issue and a regression introduced by Perf tool in the recent mainline kernel, so I sent out the two patches for fixing the related issues [1].
Even after applying these two fixings for the testing script, I still observe the etm testing failure with snapshot mode. I manully checked the perf data and confirmed the perf data contains the raw trace data for snapshot (so 'perf script -D' can work well), but it fails to generate any branch samples, so the command 'perf script' will not output any samples.
Seems to me, this issue is more likely related with driver or Perf tool's common code change but not the testing itself;
If the trace data is available then it is likely the kernel drivers worked properly. I would concentrate on the perf tools - with the amount of code churn in that subsystem many regressions have been
introduced before.
to use 'git bisect' to narrow down the snapshot mode failure, if you have any idea for fixing this issue, please let me know. Otherwise, I will dig into this issue and keep posted.
I'm afraid a bisect is your only option.
Firstly, sorry for this is a long replying, in this email I will try to explain what's the problem, but I am also seeking your suggestions for how to fix the issue.
Let me give out a quick way for reproducing the issue:
# sh test_cs_etm_snapshot.sh => it is pasted at the end of email # perf script => it cannot output any samples for snapshot mode
Here have a question: why tool cannot output samples for both kernel and user space programs?
### Failure for parsing Kernel symbols
For output the kernel's trace data, perf needs to read out the symbols from the kernel's DSO. As far as I know, there have three ways to import the kernel's symbols:
- By default, perf tool to read out the runtime kernel symbols from /proc/kallsyms and /proc/kcore; to support both of these two proc nodes, we need to enable kernel configurations: CONFIG_PROC_KCORE=y CONFIG_KALLSYMS=y
- Copy vmlinux in the 'pwd' folder when run 'perf' command;
- Specify the vmlinux path with option '-k'.
For my case, since I missed to enable configs for KCORE and doesn't copy or specify vmlinux, thus the kernel symbols are failed to parsed and cannot synthesize the samples [1]. After enabled KCORE option, then tool outputs the kernel symbols properly. Since there have no any warning log to remind the failure for finding DSO or symbols, and I encounered the similiar issues and usually it took me serveral hours or even more time to locate the issue. So share the info at here and also I think it's better to commit patch to improve the verbose log.
### Failure for parsing user space symbols
After the kernel symbols parsing is fixed, the test case can pass. But the difficult part is for the userspace symbol's parsing. Sometime, we even don't notice that it fails to synthesize samples for the userspace trace data.
The rationale is: perf tool imports the userspace's DSOs when handling events PERF_RECORD_MMAP2; in theory, we should expect the events PERF_RECORD_MMAP2 should arrive ahead event PERF_RECORD_AUX, so
that
perf tool can load the userspace DSOs and get ready the symbol tree before parsing any address in the userspace.
But I found the events PERF_RECORD_MMAP2 and PERF_RECORD_AUX
are
interleaved with each other in the perf.data file and it's possible to process all AUX trace data but without user space program's DSO, thus we cannot see any samples from userspace. This is why I sent RFC patch to 'force' the PERF_RECORD_MMAP2 events to be processed prior
to
PERF_RECORD_AUX [2]; but the change in the RFC patch is more likely a workaround rather than a real fixing. The reason is if there have PERF_RECORD_AUX event is coming and even the DSO is not prepared, it should skip a very small amount trace data and exit from the perf cs-etm decoding loop, and perf tool can continue to import DSOs based on the sequential PERF_RECORD_MMAP2 events, afterwards the perf tool can decode CoreSight trace data and generate samples.
This is a common issue for all AUX trace cases (like cs-etm, SPE, Intel-pt, etc), after looking into the code for Intel-PT, I think Intel-pt uses the timestamp to correlate different events. The key point is it pushes the first Intel-PT's sample with timestamp in the auxtrace heap [3], if any new event's timestamp is prior to Intel-PT's trace timestamp, Intel-pt's decoding loop will bail out [4] and process other events ahead. Simply to say, the events are handled in the order of the timestamp.
I think this approach can be applied on Arm SPE, since Arm SPE uses arch timer's counter as timestamp, Arm SPE trace and other events can be correlated with each other with the same timeline axis.
For CoreSight, I'm not sure how to proceed; the Arch timer counter is not mandatory for connecting CoreSight, but if some platform has used Arch timer counter for CoreSight (Like Juno board), should we move forward for this?
Otherwise, as Adrian Hunter suggested: "If you are processing data based on PERF_RECORD_AUX timestamp then you need to pay attention
to
the offset. PERF_RECORD_AUX gives you aux_offset/aux_size and auxtrace_buffer (which may contain data from several
PERF_RECORD_AUX)
gives you offset/size.", so should we extract the aux_offset and aux_size for PERF_RECORD_AUX event and every time only decode the trace data no more than aux_size for every PERF_RECORD_AUX event?
Yes - I think this has to happen in order for accurate decode to occur.
This means we need to add additional checking for buffer length in the functions cs_etm__process_queues() and cs_etm__run_decoder() to
avoid
processing any trace data not belonging to the event.
So, although the issues I am seeing are not directly related to perf snapshots, I do believe that both problems require a re-thinking of the handling of the perf records.
a) PERF_RECORD_AUX events appear to be directly related to PERF_RECORD_AUXTRACE buffers by index, offset and size. This should allow the correct parsing of a single PERF_RECORD_AUXTRACE buffer into a set of hw capture buffers according to the offset and size of the AUX records. This will allow us to restart the decoder appropriately at the start of each hw buffer.
b) Even if the arch timer is not running the coresight timestamp, or the coresight timestamp is not enabled, will it be correct to associate the arch timestamp in the AUX record with that in the MMAP2 records, and therefore have all the information needed for correct decode of the block of trace in the AUXTRACE record associated with the AUX record?
The timestamps in the AUX record is a kernel timestamp, so it is correct to associate it with the timestamps in the MMAP records. The only risk is that the AUX buffer segment may contain some trace from any time prior to when the AUX record was generated, including times from before the MMAP occurred. At worst, you could have a situation like this:
- start trace collection into a buffer - client unloads abc.so (at an address) - client loads xyz.so into the same address range, generating MMAP - collect trace into an AUX segment, generating AUX
In the trace there may be references to a given address which may be from either abc.so or xyz.so, and the timestamps will not resolve this. This is quite unlikely, but it would be more likely if trace was being collected at a relatively slow rate, and/or with filtering. There may be similar issues with JIT, if you're using jitdump files to track when code is dynamically generated.
To do any better you'd need the timestamps embedded within the ETM stream itself, and convert those to kernel time, which we currently have no mechanism for.
Al
c) At present the handler in cs-etm for the AUX records returns immediately when dumping trace. This leads to errors in dumping the AUXTRACE record for the reasons given above , Again this suggests that we need to split the AUXTRACE record up according to the size / offset parameters in the AUX records in the dump trace case.
d) Correctly splitting the AUXTRACE buffer using AUX records will allow us to remove the barrier packets in the ETR / ETB drivers. This is a best a workaround for a problem that my have been shown to reside elsewhere.
In short I think that for CoreSight we need to correlate and associate the AUX and AUXTRACE records to correctly dump the trace packets, and add in the correlation of the correct MMAP2 events for full trace decode.
Regards
Mike
How about you think for this? Thanks for reading it!
Leo
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre e/tools/perf/util/cs-etm.c#n695 [2] https://lore.kernel.org/patchwork/patch/1340992/ [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre e/tools/perf/util/intel-pt.c#n1240 [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre e/tools/perf/util/intel-pt.c#n2329
---8<---
test_cs_etm_snapshot.sh:
#!/bin/sh
arm_cs_etm_snapshot_test() { echo "Recording trace with snapshot mode"
perf record -e cs_etm/@tmc_etr0/ -S \ -- dd if=/dev/zero of=/dev/null & PERFPID=$! # Wait for perf program sleep 2 # Send signal to snapshot trace data kill -USR2 $PERFPID # Stop perf program kill $PERFPID wait $PERFPID
}
arm_cs_etm_snapshot_test exit 0
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK _______________________________________________ CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight