Hello Mathieu,
Thank you very much for your input. Some comments below.
-----Original Message----- From: Mathieu Poirier mathieu.poirier@linaro.org Sent: Monday, May 20, 2019 9:21 PM To: Wojciech Żmuda wzmuda@n7space.com Cc: coresight@lists.linaro.org; Michal Mosdorf mmosdorf@n7space.com; Michał Kurowski mkurowski@n7space.com; leo.yan@linaro.org; Al.Grant@arm.com Subject: Re: [RFC PATCH 1/1] perf cs-etm: Set time value for samples
On Mon, 20 May 2019 at 12:06, Mathieu Poirier mathieu.poirier@linaro.org wrote:
Hi Wojciech,
On Fri, May 10, 2019 at 01:54:25PM +0000, Wojciech Żmuda wrote:
From: Wojciech Zmuda wzmuda@n7space.com
Populate the time field in branches and instruction samples generated from RANGE, EXCEPTION and EXCEPTION_RET packets. The time value is based on TIMESTAMP packets that may appear in the trace.
The time value may be accessed through perf-script built-in script engine in the sample dict 'time' field or printed in 'perf script -F time' output (the latter only in CPU-wide mode at the moment).
Signed-off-by: Wojciech Zmuda wzmuda@n7space.com
tools/perf/util/cs-etm.c | 36 ++++++++++++++++++++++++++++++++++++ tools/perf/util/cs-etm.h | 1 + 2 files changed, 37 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8496190ad553..838726748cd7 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -204,6 +204,7 @@ static void cs_etm__clear_packet_queue(struct
cs_etm_packet_queue *queue)
queue->packet_buffer[i].exception_number = UINT32_MAX; queue->packet_buffer[i].trace_chan_id = UINT8_MAX; queue->packet_buffer[i].cpu = INT_MIN;
queue->packet_buffer[i].timestamp = 0; }
}
@@ -1098,6 +1099,7 @@ static int
cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
sample.flags = tidq->prev_packet->flags; sample.insn_len = 1; sample.cpumode = event->sample.header.misc;
sample.time = tidq->packet->timestamp; if (etm->synth_opts.last_branch) { cs_etm__copy_last_branch_rb(etmq, tidq); @@ -1157,6
+1159,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue
*etmq,
sample.cpu = tidq->packet->cpu; sample.flags = tidq->prev_packet->flags; sample.cpumode = event->sample.header.misc;
sample.time = tidq->packet->timestamp; /* * perf report cannot handle events without a branch stack @@
-1894,6 +1897,37 @@ static int cs_etm__decode_data_block(struct
cs_etm_queue *etmq)
return ret;
}
+static void +cs_etm__timestamp_traceid_queue_packet(struct cs_etm_traceid_queue +*tidq) {
/**
* At this point, packet queue's timestamp and next_timestamp
fields
* represent start and stop timestamp value for range packets
buffered
* in the queue. Since timestamp packets come once for every N
range packets,
* we cannot directly assign timestamp values to specific range
packets.
* Some estimation is done to tell at what approximate timestamp
* has execution of each instruction range started.
*/
if (!tidq->prev_packet->timestamp) {
/* First range packet in this queue */
tidq->packet->timestamp = tidq->packet_queue.timestamp;
} else {
/*
* Similarly to what is done in
cs_etm_decoder__do_{soft|hard}_timestamp(),
* adjust timestamp value by the number of executed
instructions,
* so that packets representing ranges of instructions
executed later,
* have higher timestamp value than those executed
earlier, in line with
* number of instruction executed in the previous range.
* Currently, the assumption is that execution time of
each instruction
* equals 1 timestamp cycle.
*
* TODO find better approximation.
*/
tidq->packet->timestamp = tidq->prev_packet->timestamp +
tidq->prev_packet->instr_count;
}
+}
This will work but as you said in the cover letter, only in CPU-wide trace scenarios.
Actually, the limitation I mentioned there was in what perf-script can print. The time field can be printed only if the PERF_SAMPLE_TIME flag is set. The current code removes this flag in per-thread mode. I'd like to tie this flag to the ETM configuration instead and set it if timestamps are included in the stream.
The question here is, if splitting the 'timeless' concept from the per-thread tracing does not have further implications. What especially confuses me, is the cs_etm__is_timeless_decoding() function, that turns off timeless decoding if it sees samples with this flag set. This introduces some kind of circular dependency between PERF_SAMPLE_TIME and timeless decoding mode.
To me the first thing to do his confine time management to cs-etm-decoder.c so that the front end (i.e cs-etm.c) only has to use time. I also think each range packet should be timestamp'ed as they arrive. To do so two things need to happen:
- Fix function cs_etm__do_soft_timestamp(). On line 282 [1] the
timestamp of the packet is equal to the timestamp of the queue. Line 287 and below should be moved to cs_etm_decoder__buffer_range(), exactly where cs_etm__do_soft_timestamp() currently is. The new function cs_etm__do_soft_timestamp() should be moved to line 436. That way we timestamp all packets and we keep the current code to trigger front end attention in CPU-wide scenarios.
The above addresses the case where we've already seen a timestamp.
- Fix function cs_etm_decoder__do_hard_timestamp() to timestamp all
range packet in the packet_queue when the first timestamp is
encountered.
I think I get your point. Let me send out V2 I prepared with this advice.
That should cover all basis. To answer your question from the cover letter, timeless_decoding == per-thread.
I forgot... This might also be of interest to you: https://lkml.org/lkml/2019/5/20/494
I need to take some time to understand this, but seems like a good inspiration to how to get rid of this coarse approximation we use, that one instruction is one cycle. IPC is much better metric than the adjusted timestamp. Thanks for letting me know about this set, I'll process it and try to use this idea in my patch set.
Patch V2 is coming.
Thanks, Wojciech
Mathieu
[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git/tree/tools /perf/util/cs-etm-decoder/cs-etm-decoder.c?h=5.1-rc3-cpu-wide-v3#n282
static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq) { @@ -1926,6 +1960,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
switch (tidq->packet->sample_type) { case CS_ETM_RANGE:
- cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the packet contains an instruction * range, generate instruction
sequence @@ -1935,6 +1970,7 @@ static int
cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
break; case CS_ETM_EXCEPTION: case CS_ETM_EXCEPTION_RET:
- cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the exception packet is coming, * make sure the previous instruction
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 1f6ec7babe70..a14b0220eaa7 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -122,6 +122,7 @@ struct cs_etm_packet { enum cs_etm_isa isa; u64 start_addr; u64 end_addr;
u64 timestamp; u32 instr_count; u32 last_instr_type; u32 last_instr_subtype;
-- 2.11.0