The email account " coresight(a)lists.linaro.org " is almost full.
The email account currently uses 91.68% (229.19 MB / 250 MB) of
its capacity.
You should remove some emails from the mailbox as soon as
possible in order to prevent the loss of any future email. Use
the Email Disk Usage tool at
https://lists.linaro.org:2096/?goto_app=Email_DiskUsage , or ask
the system administrator to upgrade your account to a larger
quota.
The system generated this warning on Friday, September 21, 2018
at 6:27:01 (UTC).
You can disable the “Quota :: MailboxWarning” notice type through
the cPanel interface:
https://lists.linaro.org:2083/?goto_app=ContactInfo_Change
Do not reply to this automated message.
Hello,
I've successfully written a program that generates trace via sysFS,
passes it to the decoder, and uses a callback function to utilize it
afterward. When testing it by tracing a small test program over and over
again I noticed that occasionally I get a CSD_ERR_BAD_DECODE_PKT-error
message. In the few instances I was able to reproduce it the produced
OCSD_GEN_TRC_ELEM_INSTR_RANGE packets were always correct and the error
always occurred at the end in combination with an I_INCOMPLETE_EOT
packet. Therefore, and because the error is very rare, I could just
disable the error logger and my program would work, but if possible I
would like to avoid that. Does anyone have an idea where the error might
come from, whether I am doing something wrong, or where I could look for
the cause?
I've saved the binary trace data and decoded it using trc_pkt lister,
and the output is almost the same, but the broken trace has a few
additional lines and is missing one OCSD_GEN_TRC_ELEM_EO_TRACE-element:
[equal trace data]
+ Idx:364; ID:10; [0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x80 ]; I_ASYNC : Alignment Synchronisation.
+ Idx:377; ID:10; [0x01 0x01 0x00 ]; I_TRACE_INFO : Trace Info.;
INFO=0x0 { CC.0 }
+ Idx:380; ID:10; [0x9d 0x70 0x6a 0x0d 0x10 0x00 0x80 0xff 0xff ];
I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFF8000100DD5C0;
[equal trace only having varying Idx]
+ Idx:506; ID:10; [0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x80 ]; I_ASYNC : Alignment Synchronisation.
+ Idx:519; ID:10; [0x01 0x01 0x00 ]; I_TRACE_INFO : Trace Info.;
INFO=0x0 { CC.0 }
+ Idx:522; ID:10; [0x9d 0x60 0x57 0xd1 0x10 ]; I_INCOMPLETE_EOT :
Incomplete packet at end of trace.[I_ADDR_L_64IS0]
+ DCD_ETMV4_0016 : 0x0019 (OCSD_ERR_BAD_DECODE_PKT) [Reserved or unknown
packet in decoder.]; TrcIdx=522; CS ID=10; Unknown packet type.Idx:522;
ID:10; OCSD_GEN_TRC_ELEM_EO_TRACE( [end-of-trace])
- Idx:471; ID:10; OCSD_GEN_TRC_ELEM_EO_TRACE( [end-of-trace])
[Equal EOT trace: ]
ID:10 END OF TRACE DATA
Idx:0; ID:12; OCSD_GEN_TRC_ELEM_EO_TRACE( [end-of-trace])
ID:12 END OF TRACE DATA
Idx:0; ID:14; OCSD_GEN_TRC_ELEM_EO_TRACE( [end-of-trace])
ID:14 END OF TRACE DATA
Idx:0; ID:16; OCSD_GEN_TRC_ELEM_EO_TRACE( [end-of-trace])
ID:16 END OF TRACE DATA
Idx:0; ID:17; OCSD_GEN_TRC_ELEM_EO_TRACE( [end-of-trace])
ID:17 END OF TRACE DATA
Idx:0; ID:18; OCSD_GEN_TRC_ELEM_EO_TRACE( [end-of-trace])
ID:18 END OF TRACE DATA
Idx:0; ID:1a; OCSD_GEN_TRC_ELEM_EO_TRACE( [end-of-trace])
ID:1a END OF TRACE DATA
Idx:0; ID:1c; OCSD_GEN_TRC_ELEM_EO_TRACE( [end-of-trace])
ID:1c END OF TRACE DATA
Trace Packet Lister : Trace buffer done, processed 5xx bytes.
I've attached the full decoded traces as well. I don't know if it
matters, but my trace very often contains a 0x0e Exceptions (apparently
syscalls) but not in this run.
Any insight on this would be appreciated.
Thanks and regards,
Dominik
Populate the auxtrace queues using AUX records rather than whole
auxtrace buffers.
This is similar to the auxtrace_queues__process_index() ->
auxtrace_queues__add_indexed_event() flow where
perf_session__peek_event() is used to read AUXTRACE events out of random
positions in the file based on the auxtrace index.
Now there is a second pass using perf_session__peek_events() which loops
over all PERF_RECORD_AUX events, finds the corresponding auxtrace
buffer, and adds a fragment of that buffer to the auxtrace queues,
rather than the whole buffer.
This is a completely new implementation compared to v1 and v2 of the
patchset where significant modifications were made to the decoding flow,
including saving all of the AUX records, and changing the point where
decoding started. These changes had effects on the outcome of decoding,
but with this implementation, the output of decoding is identical to
perf/core, except in the case where decoding failed due to the issue
that this change addresses. No changes to decoding were made, apart
from populating the auxtrace queues.
Because the decoder is already reset between each auxtrace buffer,
populating the queues with fragments of whole buffers based on AUX
records has the same affect as resetting the decoder on the AUX record
bounds during decode. But this approach only requires a change to one
part of the whole flow, rather than many parts.
There is still one TODO about how buffers are found. Currently there is
a plain loop as a search, but this could be improved with a binary
search or keeping a record of which buffers have already been added and
starting the search from the last used place. AUX records and buffers
will come in similar orders in the file so the latter would be simpler
and probably be good enough. But I'm looking for feedback about the
approach as a whole before ironing out all of the details.
Other issues that were fixed by the previous patchsets such as fixing
the case where the first aux record comes before the first MMAP record
can now be fixed as completely independent issues.
Testing
=======
Testing was done with the following script, to diff the decoding results
between the patched and un-patched versions of perf:
#!/bin/bash
set -ex
$1 script -i $3 $4 > split.script
$2 script -i $3 $4 > default.script
diff split.script default.script | head -n 20
And it was run like this, with various itrace options depending on the
quantity of synthesised events:
compare.sh ./perf-patched ./perf-default perf-per-cpu-2-threads.data --itrace=i100000ns
No changes in output were observed in the following scenarios:
* Simple per-cpu
perf record -e cs_etm/@tmc_etr0/u top
* Per-thread, single thread
perf record -e cs_etm/@tmc_etr0/u --per-thread ./threads_C
* Per-thread multiple threads (but only one thread collected data):
perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
* Per-thread multiple threads (both threads collected data):
perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
* Per-cpu explicit threads:
perf record -e cs_etm/@tmc_etr0/u --pid 853,854
* System-wide (per-cpu):
perf record -e cs_etm/@tmc_etr0/u -a
* No data collected (no aux buffers)
Can happen with any command when run for a short period
* Containing truncated records
Can happen with any command
* Containing aux records with 0 size
Can happen with any command
Snapshot mode still needs to be tested.
Signed-off-by: James Clark <james.clark(a)arm.com>
---
tools/perf/util/cs-etm.c | 151 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 148 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 64536a6ed10a..5d6c03488187 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2679,6 +2679,153 @@ static u64 *cs_etm__create_meta_blk(u64 *buff_in, int *buff_in_offset,
return metadata;
}
+/**
+ * Puts a fragment of an auxtrace buffer into the auxtrace queues based
+ * on the bounds of aux_event, if it matches with the buffer that's at
+ * file_offset.
+ *
+ * Normally, whole auxtrace buffers would be added to the queue. But we
+ * want to reset the decoder for every PERF_RECORD_AUX event, and the decoder
+ * is reset across each buffer, so splitting the buffers up in advance has
+ * the same effect.
+ */
+static int cs_etm__queue_aux_fragment(struct perf_session *session,
+ off_t file_offset, size_t sz,
+ struct perf_record_aux *aux_event,
+ struct perf_sample *sample)
+{
+ int err;
+ char buf[PERF_SAMPLE_MAX_SIZE];
+ union perf_event *auxtrace_event_union;
+ struct perf_record_auxtrace *auxtrace_event;
+ union perf_event auxtrace_fragment;
+ bool matchesCpuPid;
+ struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
+ struct cs_etm_auxtrace,
+ auxtrace);
+
+ /*
+ * There should be a PERF_RECORD_AUXTRACE event at the file_offset that we got
+ * from looping through the auxtrace index.
+ */
+ err = perf_session__peek_event(session, file_offset, buf,
+ PERF_SAMPLE_MAX_SIZE, &auxtrace_event_union, NULL);
+ if (err)
+ return err;
+ auxtrace_event = &auxtrace_event_union->auxtrace;
+ if (auxtrace_event->header.type != PERF_RECORD_AUXTRACE)
+ return -EINVAL;
+
+ if (auxtrace_event->header.size < sizeof(struct perf_record_auxtrace) ||
+ auxtrace_event->header.size != sz) {
+ return -EINVAL;
+ }
+
+ /*
+ * In per-thread mode, CPU is set to -1, but TID will be set instead.
+ * See auxtrace_mmap_params__set_idx()
+ */
+ if (auxtrace_event->cpu == (__u32) -1)
+ matchesCpuPid = auxtrace_event->tid == sample->tid;
+ else
+ matchesCpuPid = auxtrace_event->cpu == sample->cpu;
+
+ if (matchesCpuPid &&
+ aux_event->aux_offset >= auxtrace_event->offset &&
+ aux_event->aux_offset + aux_event->aux_size <=
+ auxtrace_event->offset + auxtrace_event->size) {
+ /*
+ * If this AUX event was inside this buffer somewhere, create a new auxtrace event
+ * based on the sizes of the aux event, and queue that fragment.
+ */
+ auxtrace_fragment.auxtrace = *auxtrace_event;
+ auxtrace_fragment.auxtrace.size = aux_event->aux_size;
+ auxtrace_fragment.auxtrace.offset = aux_event->aux_offset;
+ file_offset += aux_event->aux_offset - auxtrace_event->offset +
+ auxtrace_event->header.size;
+ return auxtrace_queues__add_event(&etm->queues,
+ session,
+ &auxtrace_fragment,
+ file_offset,
+ NULL);
+ }
+
+ /* Wasn't inside this buffer, but there were no parse errors. 1 == 'not found' */
+ return 1;
+}
+
+static int cs_etm__queue_aux_records_cb(struct perf_session *session,
+ union perf_event *event, u64 offset __maybe_unused,
+ void *data __maybe_unused)
+{
+ struct perf_sample sample;
+ int ret;
+ struct auxtrace_index_entry *ent;
+ struct auxtrace_index *auxtrace_index;
+ struct evsel *evsel;
+ size_t i;
+ struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
+ struct cs_etm_auxtrace,
+ auxtrace);
+
+ /* Don't care about any other events, we're only queuing buffers for AUX events */
+ if (event->header.type != PERF_RECORD_AUX)
+ return 0;
+
+ if (event->header.size < sizeof(struct perf_record_aux))
+ return -EINVAL;
+
+ /* Truncated Aux records can have 0 size and shouldn't result in anything being queued. */
+ if (!event->aux.aux_size)
+ return 0;
+
+ /*
+ * Parse the sample, we need the sample_id_all data that comes after the event so that the
+ * CPU or PID can be matched to an AUXTRACE buffer's CPU or PID.
+ */
+ evsel = evlist__event2evsel(session->evlist, event);
+ evsel__parse_sample(evsel, event, &sample);
+
+ /*
+ * Loop throuch the auxtrace index to find the buffer that matches up with this
+ * aux event.
+ * TODO: binary search?
+ */
+ list_for_each_entry(auxtrace_index, &session->auxtrace_index, list) {
+ for (i = 0; i < auxtrace_index->nr; i++) {
+ ent = &auxtrace_index->entries[i];
+ ret = cs_etm__queue_aux_fragment(session, ent->file_offset,
+ ent->sz, &event->aux, &sample);
+ if (!ret) {
+ etm->data_queued = true;
+ return 0;
+ } else if (ret < 0) {
+ /*
+ * Anything other than 1 is an error. Positive values are 'not
+ * found' we just want to go onto the next one in that case.
+ */
+ return ret;
+ }
+ }
+ }
+
+ /*
+ * We would get here if there are no entries in the index (either no auxtrace buffers or
+ * no index). Fail silently as there is the possibility of queueing them in
+ * cs_etm__process_auxtrace_event() if etm->data_queued is still false;
+ *
+ * In that scenario, buffers will not be split by AUX records.
+ */
+ return 0;
+}
+
+static int cs_etm__queue_aux_records(struct perf_session *session)
+{
+ return perf_session__peek_events(session, session->header.data_offset,
+ session->header.data_size,
+ cs_etm__queue_aux_records_cb, NULL);
+}
+
int cs_etm__process_auxtrace_info(union perf_event *event,
struct perf_session *session)
{
@@ -2879,12 +3026,10 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
if (err)
goto err_delete_thread;
- err = auxtrace_queues__process_index(&etm->queues, session);
+ err = cs_etm__queue_aux_records(session);
if (err)
goto err_delete_thread;
- etm->data_queued = etm->queues.populated;
-
return 0;
err_delete_thread:
--
2.28.0