From: Wojciech Zmuda wzmuda@n7space.com
Hello,
After our discussion in [1] I came up with this proposal solution on how to timestamp perf samples, so we can tell in trace post-processing, how long does it take to execute traced blocks of code. I'll be grateful if you find some time to take a look at it. The patch itself is 37 lines of additions, so it shouldn't consume a lot of your time. I also have some ideas to further improve this patch I'm not quite sure about, so I'd like to ask you for your input.
The simplest verification is visibility of the time field in 'perf script' output: root@zynq:~# perf script --ns -F comm,cpu,event,time | head Frame deformatter: Found 4 FSYNCS singlethread-at [001] 4572.697372400: branches:u: singlethread-at [001] 4572.697372404: branches:u: singlethread-at [001] 4572.697372408: branches:u: singlethread-at [001] 4572.697372412: branches:u: singlethread-at [001] 4572.697372416: branches:u:
The step above works only if trace has been collected in CPU-wide mode. I have an idea on how to improve this, but I'd like to get some more information first. I elaborated in the last paragraph.
Another way to access these values is using the script engine, which I used for validation of sanity of timestamp values. Using python script I checked if timestamps in subsequent samples are monotonically increasing and it seems they are. The only exception are discontinuities in trace. From my understanding, we can't timestamp discontinuities reasonably, since after decoder synchronizes back after trace loss, it needs to wait for another timestamp packet.
from __future__ import print_function import os import sys
sys.path.append(os.environ['PERF_EXEC_PATH'] + \ '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
from perf_trace_context import *
prev_sample = 0 cnt = 0;
def process_event(s): global prev_sample global cnt
sample = s['sample']
if not prev_sample: prev_sample = sample; cnt = cnt+1 return
if sample['time'] < prev_sample['time']: print('Sample %d has higher timestamp than sample %d' % (cnt-1, cnt)) cnt = cnt+1
By subtracting timestamps of two samples, we can approximately tell, how long did it take (in timestamp units) to execute instructions between these two points. Now, (basing on Al's response in [1]) knowing the timestamp generator clock frequency, we can tell how long did it take in proper time units. The frequency is known (we have it in DT) and can be saved to and retrieved from TSGEN register. This is not done at this moment (the register is empty), but the idea is, if it was, it might then be saved in perf.data headers (similarly as TRCIDR are saved now), and later programmatically retrieved to calculate timestamp differences into actual time. I didn't experiment with it yet - I'd like to learn your opinion on it.
I've also tried to access the time field with 'perf script -F time' and while it's possible for traces recorded in CPU-wide mode, it's not possible in per-thread mode due to PERF_SAMPLE_TIME not being set in that mode (cs_etm__synth_events()). I tried to analyze that code, but I can't understand relation between this flag and ETM's timeless_decoding property. It looks to me that timeless decoding was equal to per-thread mode, where timestamps were not really utilized, while in CPU-wide mode they were used to inter-queue correlation. Would it be a good direction to have a separate property of TSGEN being on/off and to set PERF_SAMPLE_TIME based on this, rather than on etm->timeless_decoding? I see an obstacle in the cs_etm__is_timeless_decoding() function, that sets etm->timeless_decoding basing on events having this SAMPLE_TIME flag or not - could the criterion here be the tracing mode (CPU-wide/per-thread) instead?
Thank you, Wojciech
[1] https://lists.linaro.org/pipermail/coresight/2019-May/002577.html
Wojciech Zmuda (1): perf cs-etm: Set time value for samples
tools/perf/util/cs-etm.c | 36 ++++++++++++++++++++++++++++++++++++ tools/perf/util/cs-etm.h | 1 + 2 files changed, 37 insertions(+)
From: Wojciech Zmuda wzmuda@n7space.com
Populate the time field in branches and instruction samples generated from RANGE, EXCEPTION and EXCEPTION_RET packets. The time value is based on TIMESTAMP packets that may appear in the trace.
The time value may be accessed through perf-script built-in script engine in the sample dict 'time' field or printed in 'perf script -F time' output (the latter only in CPU-wide mode at the moment).
Signed-off-by: Wojciech Zmuda wzmuda@n7space.com --- tools/perf/util/cs-etm.c | 36 ++++++++++++++++++++++++++++++++++++ tools/perf/util/cs-etm.h | 1 + 2 files changed, 37 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8496190ad553..838726748cd7 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -204,6 +204,7 @@ static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) queue->packet_buffer[i].exception_number = UINT32_MAX; queue->packet_buffer[i].trace_chan_id = UINT8_MAX; queue->packet_buffer[i].cpu = INT_MIN; + queue->packet_buffer[i].timestamp = 0; } }
@@ -1098,6 +1099,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, sample.flags = tidq->prev_packet->flags; sample.insn_len = 1; sample.cpumode = event->sample.header.misc; + sample.time = tidq->packet->timestamp;
if (etm->synth_opts.last_branch) { cs_etm__copy_last_branch_rb(etmq, tidq); @@ -1157,6 +1159,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, sample.cpu = tidq->packet->cpu; sample.flags = tidq->prev_packet->flags; sample.cpumode = event->sample.header.misc; + sample.time = tidq->packet->timestamp;
/* * perf report cannot handle events without a branch stack @@ -1894,6 +1897,37 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq) return ret; }
+static void +cs_etm__timestamp_traceid_queue_packet(struct cs_etm_traceid_queue *tidq) +{ + /** + * At this point, packet queue's timestamp and next_timestamp fields + * represent start and stop timestamp value for range packets buffered + * in the queue. Since timestamp packets come once for every N range packets, + * we cannot directly assign timestamp values to specific range packets. + * Some estimation is done to tell at what approximate timestamp + * has execution of each instruction range started. + */ + if (!tidq->prev_packet->timestamp) { + /* First range packet in this queue */ + tidq->packet->timestamp = tidq->packet_queue.timestamp; + } else { + /* + * Similarly to what is done in cs_etm_decoder__do_{soft|hard}_timestamp(), + * adjust timestamp value by the number of executed instructions, + * so that packets representing ranges of instructions executed later, + * have higher timestamp value than those executed earlier, in line with + * number of instruction executed in the previous range. + * Currently, the assumption is that execution time of each instruction + * equals 1 timestamp cycle. + * + * TODO find better approximation. + */ + tidq->packet->timestamp = tidq->prev_packet->timestamp + + tidq->prev_packet->instr_count; + } +} + static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq) { @@ -1926,6 +1960,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
switch (tidq->packet->sample_type) { case CS_ETM_RANGE: + cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the packet contains an instruction * range, generate instruction sequence @@ -1935,6 +1970,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, break; case CS_ETM_EXCEPTION: case CS_ETM_EXCEPTION_RET: + cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the exception packet is coming, * make sure the previous instruction diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 1f6ec7babe70..a14b0220eaa7 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -122,6 +122,7 @@ struct cs_etm_packet { enum cs_etm_isa isa; u64 start_addr; u64 end_addr; + u64 timestamp; u32 instr_count; u32 last_instr_type; u32 last_instr_subtype;
Hi Wojciech,
On Fri, May 10, 2019 at 01:54:25PM +0000, Wojciech Żmuda wrote:
From: Wojciech Zmuda wzmuda@n7space.com
Populate the time field in branches and instruction samples generated from RANGE, EXCEPTION and EXCEPTION_RET packets. The time value is based on TIMESTAMP packets that may appear in the trace.
The time value may be accessed through perf-script built-in script engine in the sample dict 'time' field or printed in 'perf script -F time' output (the latter only in CPU-wide mode at the moment).
Signed-off-by: Wojciech Zmuda wzmuda@n7space.com
tools/perf/util/cs-etm.c | 36 ++++++++++++++++++++++++++++++++++++ tools/perf/util/cs-etm.h | 1 + 2 files changed, 37 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8496190ad553..838726748cd7 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -204,6 +204,7 @@ static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) queue->packet_buffer[i].exception_number = UINT32_MAX; queue->packet_buffer[i].trace_chan_id = UINT8_MAX; queue->packet_buffer[i].cpu = INT_MIN;
}queue->packet_buffer[i].timestamp = 0;
} @@ -1098,6 +1099,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, sample.flags = tidq->prev_packet->flags; sample.insn_len = 1; sample.cpumode = event->sample.header.misc;
- sample.time = tidq->packet->timestamp;
if (etm->synth_opts.last_branch) { cs_etm__copy_last_branch_rb(etmq, tidq); @@ -1157,6 +1159,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, sample.cpu = tidq->packet->cpu; sample.flags = tidq->prev_packet->flags; sample.cpumode = event->sample.header.misc;
- sample.time = tidq->packet->timestamp;
/* * perf report cannot handle events without a branch stack @@ -1894,6 +1897,37 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq) return ret; } +static void +cs_etm__timestamp_traceid_queue_packet(struct cs_etm_traceid_queue *tidq) +{
- /**
* At this point, packet queue's timestamp and next_timestamp fields
* represent start and stop timestamp value for range packets buffered
* in the queue. Since timestamp packets come once for every N range packets,
* we cannot directly assign timestamp values to specific range packets.
* Some estimation is done to tell at what approximate timestamp
* has execution of each instruction range started.
*/
- if (!tidq->prev_packet->timestamp) {
/* First range packet in this queue */
tidq->packet->timestamp = tidq->packet_queue.timestamp;
- } else {
/*
* Similarly to what is done in cs_etm_decoder__do_{soft|hard}_timestamp(),
* adjust timestamp value by the number of executed instructions,
* so that packets representing ranges of instructions executed later,
* have higher timestamp value than those executed earlier, in line with
* number of instruction executed in the previous range.
* Currently, the assumption is that execution time of each instruction
* equals 1 timestamp cycle.
*
* TODO find better approximation.
*/
tidq->packet->timestamp = tidq->prev_packet->timestamp +
tidq->prev_packet->instr_count;
- }
+}
This will work but as you said in the cover letter, only in CPU-wide trace scenarios.
To me the first thing to do his confine time management to cs-etm-decoder.c so that the front end (i.e cs-etm.c) only has to use time. I also think each range packet should be timestamp'ed as they arrive. To do so two things need to happen:
1) Fix function cs_etm__do_soft_timestamp(). On line 282 [1] the timestamp of the packet is equal to the timestamp of the queue. Line 287 and below should be moved to cs_etm_decoder__buffer_range(), exactly where cs_etm__do_soft_timestamp() currently is. The new function cs_etm__do_soft_timestamp() should be moved to line 436. That way we timestamp all packets and we keep the current code to trigger front end attention in CPU-wide scenarios.
The above addresses the case where we've already seen a timestamp.
2) Fix function cs_etm_decoder__do_hard_timestamp() to timestamp all range packet in the packet_queue when the first timestamp is encountered.
That should cover all basis. To answer your question from the cover letter, timeless_decoding == per-thread.
Mathieu
[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git/tree/tools/perf/...
static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq) { @@ -1926,6 +1960,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, switch (tidq->packet->sample_type) { case CS_ETM_RANGE:
cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the packet contains an instruction * range, generate instruction sequence
@@ -1935,6 +1970,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, break; case CS_ETM_EXCEPTION: case CS_ETM_EXCEPTION_RET:
cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the exception packet is coming, * make sure the previous instruction
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 1f6ec7babe70..a14b0220eaa7 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -122,6 +122,7 @@ struct cs_etm_packet { enum cs_etm_isa isa; u64 start_addr; u64 end_addr;
- u64 timestamp; u32 instr_count; u32 last_instr_type; u32 last_instr_subtype;
-- 2.11.0
On Mon, 20 May 2019 at 12:06, Mathieu Poirier mathieu.poirier@linaro.org wrote:
Hi Wojciech,
On Fri, May 10, 2019 at 01:54:25PM +0000, Wojciech Żmuda wrote:
From: Wojciech Zmuda wzmuda@n7space.com
Populate the time field in branches and instruction samples generated from RANGE, EXCEPTION and EXCEPTION_RET packets. The time value is based on TIMESTAMP packets that may appear in the trace.
The time value may be accessed through perf-script built-in script engine in the sample dict 'time' field or printed in 'perf script -F time' output (the latter only in CPU-wide mode at the moment).
Signed-off-by: Wojciech Zmuda wzmuda@n7space.com
tools/perf/util/cs-etm.c | 36 ++++++++++++++++++++++++++++++++++++ tools/perf/util/cs-etm.h | 1 + 2 files changed, 37 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8496190ad553..838726748cd7 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -204,6 +204,7 @@ static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) queue->packet_buffer[i].exception_number = UINT32_MAX; queue->packet_buffer[i].trace_chan_id = UINT8_MAX; queue->packet_buffer[i].cpu = INT_MIN;
queue->packet_buffer[i].timestamp = 0; }
}
@@ -1098,6 +1099,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, sample.flags = tidq->prev_packet->flags; sample.insn_len = 1; sample.cpumode = event->sample.header.misc;
sample.time = tidq->packet->timestamp; if (etm->synth_opts.last_branch) { cs_etm__copy_last_branch_rb(etmq, tidq);
@@ -1157,6 +1159,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, sample.cpu = tidq->packet->cpu; sample.flags = tidq->prev_packet->flags; sample.cpumode = event->sample.header.misc;
sample.time = tidq->packet->timestamp; /* * perf report cannot handle events without a branch stack
@@ -1894,6 +1897,37 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq) return ret; }
+static void +cs_etm__timestamp_traceid_queue_packet(struct cs_etm_traceid_queue *tidq) +{
/**
* At this point, packet queue's timestamp and next_timestamp fields
* represent start and stop timestamp value for range packets buffered
* in the queue. Since timestamp packets come once for every N range packets,
* we cannot directly assign timestamp values to specific range packets.
* Some estimation is done to tell at what approximate timestamp
* has execution of each instruction range started.
*/
if (!tidq->prev_packet->timestamp) {
/* First range packet in this queue */
tidq->packet->timestamp = tidq->packet_queue.timestamp;
} else {
/*
* Similarly to what is done in cs_etm_decoder__do_{soft|hard}_timestamp(),
* adjust timestamp value by the number of executed instructions,
* so that packets representing ranges of instructions executed later,
* have higher timestamp value than those executed earlier, in line with
* number of instruction executed in the previous range.
* Currently, the assumption is that execution time of each instruction
* equals 1 timestamp cycle.
*
* TODO find better approximation.
*/
tidq->packet->timestamp = tidq->prev_packet->timestamp +
tidq->prev_packet->instr_count;
}
+}
This will work but as you said in the cover letter, only in CPU-wide trace scenarios.
To me the first thing to do his confine time management to cs-etm-decoder.c so that the front end (i.e cs-etm.c) only has to use time. I also think each range packet should be timestamp'ed as they arrive. To do so two things need to happen:
- Fix function cs_etm__do_soft_timestamp(). On line 282 [1] the timestamp of
the packet is equal to the timestamp of the queue. Line 287 and below should be moved to cs_etm_decoder__buffer_range(), exactly where cs_etm__do_soft_timestamp() currently is. The new function cs_etm__do_soft_timestamp() should be moved to line 436. That way we timestamp all packets and we keep the current code to trigger front end attention in CPU-wide scenarios.
The above addresses the case where we've already seen a timestamp.
- Fix function cs_etm_decoder__do_hard_timestamp() to timestamp all range
packet in the packet_queue when the first timestamp is encountered.
That should cover all basis. To answer your question from the cover letter, timeless_decoding == per-thread.
I forgot... This might also be of interest to you: https://lkml.org/lkml/2019/5/20/494
Mathieu
[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git/tree/tools/perf/...
static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq) { @@ -1926,6 +1960,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
switch (tidq->packet->sample_type) { case CS_ETM_RANGE:
cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the packet contains an instruction * range, generate instruction sequence
@@ -1935,6 +1970,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, break; case CS_ETM_EXCEPTION: case CS_ETM_EXCEPTION_RET:
cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the exception packet is coming, * make sure the previous instruction
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 1f6ec7babe70..a14b0220eaa7 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -122,6 +122,7 @@ struct cs_etm_packet { enum cs_etm_isa isa; u64 start_addr; u64 end_addr;
u64 timestamp; u32 instr_count; u32 last_instr_type; u32 last_instr_subtype;
-- 2.11.0
Hello Mathieu,
Thank you very much for your input. Some comments below.
-----Original Message----- From: Mathieu Poirier mathieu.poirier@linaro.org Sent: Monday, May 20, 2019 9:21 PM To: Wojciech Żmuda wzmuda@n7space.com Cc: coresight@lists.linaro.org; Michal Mosdorf mmosdorf@n7space.com; Michał Kurowski mkurowski@n7space.com; leo.yan@linaro.org; Al.Grant@arm.com Subject: Re: [RFC PATCH 1/1] perf cs-etm: Set time value for samples
On Mon, 20 May 2019 at 12:06, Mathieu Poirier mathieu.poirier@linaro.org wrote:
Hi Wojciech,
On Fri, May 10, 2019 at 01:54:25PM +0000, Wojciech Żmuda wrote:
From: Wojciech Zmuda wzmuda@n7space.com
Populate the time field in branches and instruction samples generated from RANGE, EXCEPTION and EXCEPTION_RET packets. The time value is based on TIMESTAMP packets that may appear in the trace.
The time value may be accessed through perf-script built-in script engine in the sample dict 'time' field or printed in 'perf script -F time' output (the latter only in CPU-wide mode at the moment).
Signed-off-by: Wojciech Zmuda wzmuda@n7space.com
tools/perf/util/cs-etm.c | 36 ++++++++++++++++++++++++++++++++++++ tools/perf/util/cs-etm.h | 1 + 2 files changed, 37 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8496190ad553..838726748cd7 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -204,6 +204,7 @@ static void cs_etm__clear_packet_queue(struct
cs_etm_packet_queue *queue)
queue->packet_buffer[i].exception_number = UINT32_MAX; queue->packet_buffer[i].trace_chan_id = UINT8_MAX; queue->packet_buffer[i].cpu = INT_MIN;
queue->packet_buffer[i].timestamp = 0; }
}
@@ -1098,6 +1099,7 @@ static int
cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
sample.flags = tidq->prev_packet->flags; sample.insn_len = 1; sample.cpumode = event->sample.header.misc;
sample.time = tidq->packet->timestamp; if (etm->synth_opts.last_branch) { cs_etm__copy_last_branch_rb(etmq, tidq); @@ -1157,6
+1159,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue
*etmq,
sample.cpu = tidq->packet->cpu; sample.flags = tidq->prev_packet->flags; sample.cpumode = event->sample.header.misc;
sample.time = tidq->packet->timestamp; /* * perf report cannot handle events without a branch stack @@
-1894,6 +1897,37 @@ static int cs_etm__decode_data_block(struct
cs_etm_queue *etmq)
return ret;
}
+static void +cs_etm__timestamp_traceid_queue_packet(struct cs_etm_traceid_queue +*tidq) {
/**
* At this point, packet queue's timestamp and next_timestamp
fields
* represent start and stop timestamp value for range packets
buffered
* in the queue. Since timestamp packets come once for every N
range packets,
* we cannot directly assign timestamp values to specific range
packets.
* Some estimation is done to tell at what approximate timestamp
* has execution of each instruction range started.
*/
if (!tidq->prev_packet->timestamp) {
/* First range packet in this queue */
tidq->packet->timestamp = tidq->packet_queue.timestamp;
} else {
/*
* Similarly to what is done in
cs_etm_decoder__do_{soft|hard}_timestamp(),
* adjust timestamp value by the number of executed
instructions,
* so that packets representing ranges of instructions
executed later,
* have higher timestamp value than those executed
earlier, in line with
* number of instruction executed in the previous range.
* Currently, the assumption is that execution time of
each instruction
* equals 1 timestamp cycle.
*
* TODO find better approximation.
*/
tidq->packet->timestamp = tidq->prev_packet->timestamp +
tidq->prev_packet->instr_count;
}
+}
This will work but as you said in the cover letter, only in CPU-wide trace scenarios.
Actually, the limitation I mentioned there was in what perf-script can print. The time field can be printed only if the PERF_SAMPLE_TIME flag is set. The current code removes this flag in per-thread mode. I'd like to tie this flag to the ETM configuration instead and set it if timestamps are included in the stream.
The question here is, if splitting the 'timeless' concept from the per-thread tracing does not have further implications. What especially confuses me, is the cs_etm__is_timeless_decoding() function, that turns off timeless decoding if it sees samples with this flag set. This introduces some kind of circular dependency between PERF_SAMPLE_TIME and timeless decoding mode.
To me the first thing to do his confine time management to cs-etm-decoder.c so that the front end (i.e cs-etm.c) only has to use time. I also think each range packet should be timestamp'ed as they arrive. To do so two things need to happen:
- Fix function cs_etm__do_soft_timestamp(). On line 282 [1] the
timestamp of the packet is equal to the timestamp of the queue. Line 287 and below should be moved to cs_etm_decoder__buffer_range(), exactly where cs_etm__do_soft_timestamp() currently is. The new function cs_etm__do_soft_timestamp() should be moved to line 436. That way we timestamp all packets and we keep the current code to trigger front end attention in CPU-wide scenarios.
The above addresses the case where we've already seen a timestamp.
- Fix function cs_etm_decoder__do_hard_timestamp() to timestamp all
range packet in the packet_queue when the first timestamp is
encountered.
I think I get your point. Let me send out V2 I prepared with this advice.
That should cover all basis. To answer your question from the cover letter, timeless_decoding == per-thread.
I forgot... This might also be of interest to you: https://lkml.org/lkml/2019/5/20/494
I need to take some time to understand this, but seems like a good inspiration to how to get rid of this coarse approximation we use, that one instruction is one cycle. IPC is much better metric than the adjusted timestamp. Thanks for letting me know about this set, I'll process it and try to use this idea in my patch set.
Patch V2 is coming.
Thanks, Wojciech
Mathieu
[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git/tree/tools /perf/util/cs-etm-decoder/cs-etm-decoder.c?h=5.1-rc3-cpu-wide-v3#n282
static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq) { @@ -1926,6 +1960,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
switch (tidq->packet->sample_type) { case CS_ETM_RANGE:
- cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the packet contains an instruction * range, generate instruction
sequence @@ -1935,6 +1970,7 @@ static int
cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
break; case CS_ETM_EXCEPTION: case CS_ETM_EXCEPTION_RET:
- cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the exception packet is coming, * make sure the previous instruction
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 1f6ec7babe70..a14b0220eaa7 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -122,6 +122,7 @@ struct cs_etm_packet { enum cs_etm_isa isa; u64 start_addr; u64 end_addr;
u64 timestamp; u32 instr_count; u32 last_instr_type; u32 last_instr_subtype;
-- 2.11.0
Hi Wojciech,
On Fri, May 10, 2019 at 01:54:25PM +0000, Wojciech Żmuda wrote:
From: Wojciech Zmuda wzmuda@n7space.com
Populate the time field in branches and instruction samples generated from RANGE, EXCEPTION and EXCEPTION_RET packets. The time value is based on TIMESTAMP packets that may appear in the trace.
The time value may be accessed through perf-script built-in script engine in the sample dict 'time' field or printed in 'perf script -F time' output (the latter only in CPU-wide mode at the moment).
Signed-off-by: Wojciech Zmuda wzmuda@n7space.com
tools/perf/util/cs-etm.c | 36 ++++++++++++++++++++++++++++++++++++ tools/perf/util/cs-etm.h | 1 + 2 files changed, 37 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8496190ad553..838726748cd7 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -204,6 +204,7 @@ static void cs_etm__clear_packet_queue(struct cs_etm_packet_queue *queue) queue->packet_buffer[i].exception_number = UINT32_MAX; queue->packet_buffer[i].trace_chan_id = UINT8_MAX; queue->packet_buffer[i].cpu = INT_MIN;
}queue->packet_buffer[i].timestamp = 0;
} @@ -1098,6 +1099,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, sample.flags = tidq->prev_packet->flags; sample.insn_len = 1; sample.cpumode = event->sample.header.misc;
- sample.time = tidq->packet->timestamp;
if (etm->synth_opts.last_branch) { cs_etm__copy_last_branch_rb(etmq, tidq); @@ -1157,6 +1159,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, sample.cpu = tidq->packet->cpu; sample.flags = tidq->prev_packet->flags; sample.cpumode = event->sample.header.misc;
- sample.time = tidq->packet->timestamp;
/* * perf report cannot handle events without a branch stack @@ -1894,6 +1897,37 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq) return ret; } +static void +cs_etm__timestamp_traceid_queue_packet(struct cs_etm_traceid_queue *tidq) +{
- /**
* At this point, packet queue's timestamp and next_timestamp fields
* represent start and stop timestamp value for range packets buffered
* in the queue. Since timestamp packets come once for every N range packets,
* we cannot directly assign timestamp values to specific range packets.
* Some estimation is done to tell at what approximate timestamp
* has execution of each instruction range started.
*/
- if (!tidq->prev_packet->timestamp) {
/* First range packet in this queue */
tidq->packet->timestamp = tidq->packet_queue.timestamp;
- } else {
/*
* Similarly to what is done in cs_etm_decoder__do_{soft|hard}_timestamp(),
* adjust timestamp value by the number of executed instructions,
* so that packets representing ranges of instructions executed later,
* have higher timestamp value than those executed earlier, in line with
* number of instruction executed in the previous range.
* Currently, the assumption is that execution time of each instruction
* equals 1 timestamp cycle.
*
* TODO find better approximation.
*/
tidq->packet->timestamp = tidq->prev_packet->timestamp +
tidq->prev_packet->instr_count;
- }
+}
static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq) { @@ -1926,6 +1960,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, switch (tidq->packet->sample_type) { case CS_ETM_RANGE:
cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the packet contains an instruction * range, generate instruction sequence
@@ -1935,6 +1970,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, break; case CS_ETM_EXCEPTION: case CS_ETM_EXCEPTION_RET:
cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the exception packet is coming, * make sure the previous instruction
Please note for handling the branch packet for discontinuity in trace, if the trace is discontinuity then the function cs_etm__flush() will be invoked to generate the last branch sample before discontinuity; but this patch misses to set timestamp for the last branch sample.
Thanks, Leo Yan
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 1f6ec7babe70..a14b0220eaa7 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -122,6 +122,7 @@ struct cs_etm_packet { enum cs_etm_isa isa; u64 start_addr; u64 end_addr;
- u64 timestamp; u32 instr_count; u32 last_instr_type; u32 last_instr_subtype;
-- 2.11.0
Hi Leo,
-----Original Message----- From: Leo Yan leo.yan@linaro.org Sent: Thursday, May 23, 2019 11:37 AM To: Wojciech Żmuda wzmuda@n7space.com Cc: coresight@lists.linaro.org; Michal Mosdorf mmosdorf@n7space.com; Michał Kurowski mkurowski@n7space.com; mathieu.poirier@linaro.org; Al.Grant@arm.com Subject: Re: [RFC PATCH 1/1] perf cs-etm: Set time value for samples
Hi Wojciech,
On Fri, May 10, 2019 at 01:54:25PM +0000, Wojciech Żmuda wrote:
From: Wojciech Zmuda wzmuda@n7space.com
Populate the time field in branches and instruction samples generated from RANGE, EXCEPTION and EXCEPTION_RET packets. The time value is based on TIMESTAMP packets that may appear in the trace.
The time value may be accessed through perf-script built-in script engine in the sample dict 'time' field or printed in 'perf script -F time' output (the latter only in CPU-wide mode at the moment).
Signed-off-by: Wojciech Zmuda wzmuda@n7space.com
tools/perf/util/cs-etm.c | 36 ++++++++++++++++++++++++++++++++++++ tools/perf/util/cs-etm.h | 1 + 2 files changed, 37 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8496190ad553..838726748cd7 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -204,6 +204,7 @@ static void cs_etm__clear_packet_queue(struct
cs_etm_packet_queue *queue)
queue->packet_buffer[i].exception_number = UINT32_MAX; queue->packet_buffer[i].trace_chan_id = UINT8_MAX; queue->packet_buffer[i].cpu = INT_MIN;
}queue->packet_buffer[i].timestamp = 0;
}
@@ -1098,6 +1099,7 @@ static int cs_etm__synth_instruction_sample(struct
cs_etm_queue *etmq,
sample.flags = tidq->prev_packet->flags; sample.insn_len = 1; sample.cpumode = event->sample.header.misc;
sample.time = tidq->packet->timestamp;
if (etm->synth_opts.last_branch) { cs_etm__copy_last_branch_rb(etmq, tidq); @@ -1157,6 +1159,7 @@
static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, sample.cpu = tidq->packet->cpu; sample.flags = tidq->prev_packet->flags; sample.cpumode = event->sample.header.misc;
sample.time = tidq->packet->timestamp;
/*
- perf report cannot handle events without a branch stack @@
-1894,6 +1897,37 @@ static int cs_etm__decode_data_block(struct
cs_etm_queue *etmq)
return ret; }
+static void +cs_etm__timestamp_traceid_queue_packet(struct cs_etm_traceid_queue +*tidq) {
- /**
* At this point, packet queue's timestamp and next_timestamp fields
* represent start and stop timestamp value for range packets
buffered
* in the queue. Since timestamp packets come once for every N range
packets,
* we cannot directly assign timestamp values to specific range
packets.
* Some estimation is done to tell at what approximate timestamp
* has execution of each instruction range started.
*/
- if (!tidq->prev_packet->timestamp) {
/* First range packet in this queue */
tidq->packet->timestamp = tidq->packet_queue.timestamp;
- } else {
/*
* Similarly to what is done in
cs_etm_decoder__do_{soft|hard}_timestamp(),
* adjust timestamp value by the number of executed
instructions,
* so that packets representing ranges of instructions
executed later,
* have higher timestamp value than those executed earlier, in
line with
* number of instruction executed in the previous range.
* Currently, the assumption is that execution time of each
instruction
* equals 1 timestamp cycle.
*
* TODO find better approximation.
*/
tidq->packet->timestamp = tidq->prev_packet->timestamp +
tidq->prev_packet->instr_count;
- }
+}
static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq) { @@ -
1926,6 +1960,7 @@
static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
switch (tidq->packet->sample_type) { case CS_ETM_RANGE:
cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the packet contains an instruction * range, generate instruction sequence @@ -1935,6
+1970,7 @@
static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq, break; case CS_ETM_EXCEPTION: case CS_ETM_EXCEPTION_RET:
cs_etm__timestamp_traceid_queue_packet(tidq); /* * If the exception packet is coming, * make sure the previous instruction
Please note for handling the branch packet for discontinuity in trace, if the trace is discontinuity then the function cs_etm__flush() will be invoked to generate the last branch sample before discontinuity; but this patch misses to set timestamp for the last branch sample.
Thank you for your input. I've just sent V2 that addresses this issue. I've verified in gdb, that the branch sample created out of a discontinuity packet is indeed timestamped, as is the previous one (with the same time value, but I think this is expected, as it's not the real branching event and we don't adjust queue timestamp for it).
Best regards, Wojciech
Thanks, Leo Yan
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 1f6ec7babe70..a14b0220eaa7 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -122,6 +122,7 @@ struct cs_etm_packet { enum cs_etm_isa isa; u64 start_addr; u64 end_addr;
- u64 timestamp; u32 instr_count; u32 last_instr_type; u32 last_instr_subtype;
-- 2.11.0
On Fri, 10 May 2019 at 07:54, Wojciech Żmuda wzmuda@n7space.com wrote:
From: Wojciech Zmuda wzmuda@n7space.com
Hello,
After our discussion in [1] I came up with this proposal solution on how to timestamp perf samples, so we can tell in trace post-processing, how long does it take to execute traced blocks of code. I'll be grateful if you find some time to take a look at it. The patch itself is 37 lines of additions, so it shouldn't consume a lot of your time. I also have some ideas to further improve this patch I'm not quite sure about, so I'd like to ask you for your input.
The simplest verification is visibility of the time field in 'perf script' output: root@zynq:~# perf script --ns -F comm,cpu,event,time | head Frame deformatter: Found 4 FSYNCS singlethread-at [001] 4572.697372400: branches:u: singlethread-at [001] 4572.697372404: branches:u: singlethread-at [001] 4572.697372408: branches:u: singlethread-at [001] 4572.697372412: branches:u: singlethread-at [001] 4572.697372416: branches:u:
The step above works only if trace has been collected in CPU-wide mode. I have an idea on how to improve this, but I'd like to get some more information first. I elaborated in the last paragraph.
Another way to access these values is using the script engine, which I used for validation of sanity of timestamp values. Using python script I checked if timestamps in subsequent samples are monotonically increasing and it seems they are. The only exception are discontinuities in trace. From my understanding, we can't timestamp discontinuities reasonably, since after decoder synchronizes back after trace loss, it needs to wait for another timestamp packet.
from __future__ import print_function import os import sys sys.path.append(os.environ['PERF_EXEC_PATH'] + \ '/scripts/python/Perf-Trace-Util/lib/Perf/Trace') from perf_trace_context import * prev_sample = 0 cnt = 0; def process_event(s): global prev_sample global cnt sample = s['sample'] if not prev_sample: prev_sample = sample; cnt = cnt+1 return if sample['time'] < prev_sample['time']: print('Sample %d has higher timestamp than sample %d' % (cnt-1, cnt)) cnt = cnt+1
By subtracting timestamps of two samples, we can approximately tell, how long did it take (in timestamp units) to execute instructions between these two points. Now, (basing on Al's response in [1]) knowing the timestamp generator clock frequency, we can tell how long did it take in proper time units. The frequency is known (we have it in DT) and can be saved to and retrieved from TSGEN register. This is not done at this moment (the register is empty), but the idea is, if it was, it might then be saved in perf.data headers (similarly as TRCIDR are saved now), and later programmatically retrieved to calculate timestamp differences into actual time. I didn't experiment with it yet - I'd like to learn your opinion on it.
I've also tried to access the time field with 'perf script -F time' and while it's possible for traces recorded in CPU-wide mode, it's not possible in per-thread mode due to PERF_SAMPLE_TIME not being set in that mode (cs_etm__synth_events()). I tried to analyze that code, but I can't understand relation between this flag and ETM's timeless_decoding property. It looks to me that timeless decoding was equal to per-thread mode, where timestamps were not really utilized, while in CPU-wide mode they were used to inter-queue correlation. Would it be a good direction to have a separate property of TSGEN being on/off and to set PERF_SAMPLE_TIME based on this, rather than on etm->timeless_decoding? I see an obstacle in the cs_etm__is_timeless_decoding() function, that sets etm->timeless_decoding basing on events having this SAMPLE_TIME flag or not - could the criterion here be the tracing mode (CPU-wide/per-thread) instead?
I have looked at your work but need to think more about the best way to proceed. I will get back to you on Monday or Tuesday.
Mathieu
Thank you, Wojciech
[1] https://lists.linaro.org/pipermail/coresight/2019-May/002577.html
Wojciech Zmuda (1): perf cs-etm: Set time value for samples
tools/perf/util/cs-etm.c | 36 ++++++++++++++++++++++++++++++++++++ tools/perf/util/cs-etm.h | 1 + 2 files changed, 37 insertions(+)
-- 2.11.0