That may be the missing piece. Currently my understanding is that the timestamp generator is clocked with F [Hz], which means value in its output register increments F times per second. This is effectively limited by how fast we sample this register and how many of these values actually end in the sink and later in perf.data. Apart from that, I don't see other requirements w.r.t the clock. Could you please confirm my understanding?
The timestamp generator is global (one per chip) and runs it a fixed frequency. It might be possible to change this but it would be more of a BIOS thing. Definitely not something you'd do in response to a perf option.
What you can do (and what is reasonable to control through perf) is change how often it's sampled into the trace stream.
Thank you for this reference. Actually I'm working on v1 of this patch set on my devboard. I took a look at v3 and I don't see major differences around the commit you mentioned. Anyway, I examined it closely and noticed that my previous solution does not make much sense, since the decoder updates the timestamp value cached in the packet queue much quicker than samples are produced, which resulted in samples having the same timestamp value over and over again (from the last decoding pass), until the queue is drained.
I came up with another approach and it seems to work fine. I extended packet with the timestamp field, so each branch-related packet is timestamped with the last observed timestamp value. Then, each sample may have this timestamp as well.
It should be done the other way round. When you see a timestamp value you apply it to the last observed: - atom i.e. Y/N branch - event packet (perf doesn't use these) - exception
So when you see a branch, you don't know what its timestamp is, until you see the next timestamp packet.
Al
I'm still not quite sure if timestamps don't require some clever adjustments, since branch samples are composed from two consecutive packets, and I assign simply one of them to the sample. As you suggested, I tried to understand your [2] patch, but I don't quite get relation of timestamp, next_timestamp and adjusting them by instr_count.
The change looks like this:
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 33e975c8d11b..1c052ae9aaaa 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -372,6 +372,7 @@ cs_etm_decoder__buffer_packet(struct cs_etm_packet_queue *packet_queue, packet_queue->packet_buffer[et].flags = 0; packet_queue->packet_buffer[et].exception_number = UINT32_MAX; packet_queue->packet_buffer[et].trace_chan_id = trace_chan_id;
packet_queue->packet_buffer[et].timestamp =
packet_queue->next_timestamp;
if (packet_queue->packet_count == CS_ETM_PACKET_MAX_BUFFER - 1) return OCSD_RESP_WAIT;
@@ -424,6 +425,7 @@ cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq, case OCSD_INSTR_BR: case OCSD_INSTR_BR_INDIRECT: packet->last_instr_taken_branch = elem->last_instr_exec;
packet->timestamp = packet_queue->next_timestamp; break; case OCSD_INSTR_ISB: case OCSD_INSTR_DSB_DMB:
@@ -483,6 +485,7 @@ cs_etm_decoder__buffer_exception(struct cs_etm_packet_queue *queue,
packet = &queue->packet_buffer[queue->tail]; packet->exception_number = elem->exception_number;
packet->timestamp = queue->next_timestamp; return ret;
} diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8496190ad553..aaffb9026af6 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1157,6 +1157,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, sample.cpu = tidq->packet->cpu; sample.flags = tidq->prev_packet->flags; sample.cpumode = event->sample.header.misc;
sample.time = tidq->packet->timestamp; /* * perf report cannot handle events without a branch stack diff --git
a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h index 1f6ec7babe70..a14b0220eaa7 100644 --- a/tools/perf/util/cs-etm.h +++ b/tools/perf/util/cs-etm.h @@ -122,6 +122,7 @@ struct cs_etm_packet { enum cs_etm_isa isa; u64 start_addr; u64 end_addr;
u64 timestamp; u32 instr_count; u32 last_instr_type; u32 last_instr_subtype;
Thanks, Wojciech
With the following change, the time field is populated in the param_dict['sample']['time'] field (sample = param_dict['sample']):
diff --git a/arm-cs-trace-disasm.py b/arm-cs-trace-disasm.py index 1239ab4b6a58..da5cb4d53b4a 100644 --- a/arm-cs-trace-disasm.py +++ b/arm-cs-trace-disasm.py @@ -232,4 +232,5 @@ def process_event(param_dict): print "Address range [ %s .. %s ]: isn't in same dso" %
(start_addr, stop_addr)
return
print('Timestamp 0x%lx' % sample['time']) dump_disam(prev_dso, start_addr, stop_addr)
Then, applying the script to perf-script prints timestamps along with
disassembly:
root@zynq:~# perf script -s arm-cs-trace-disasm.py -- -d objdump | less ARM CoreSight Trace Data Assembler Dump kallsyms cannot be used to dump assembler Timestamp 0x53e56a31002 0000000000400a80 <__libc_csu_init>: 400a80: a9bc7bfd stp x29, x30, [sp, #-64]! 400a84: 910003fd mov x29, sp 400a88: a901d7f4 stp x20, x21, [sp, #24] 400a8c: 90000094 adrp x20, 410000
<__FRAME_END__+0xf3f0>
400a90: 90000095 adrp x21, 410000
<__FRAME_END__+0xf3f0>
400a94: 9137c294 add x20, x20, #0xdf0 400a98: 9137a2b5 add x21, x21, #0xde8 400a9c: a902dff6 stp x22, x23, [sp, #40] 400aa0: cb150294 sub x20, x20, x21 400aa4: f9001ff8 str x24, [sp, #56] 400aa8: 2a0003f6 mov w22, w0 400aac: aa0103f7 mov x23, x1 400ab0: 9343fe94 asr x20, x20, #3 400ab4: aa0203f8 mov x24, x2 400ab8: 97fffece bl 4005f0 <_init>
CPU3: CS_ETM_TRACE_ON packet is inserted Timestamp 0x53e56a3100e 00000000004008dc <main>: 4008dc: a9b37bfd stp x29, x30, [sp, #-208]! 4008e0: 910003fd mov x29, sp 4008e4: b9001fa0 str w0, [x29, #28] 4008e8: f9000ba1 str x1, [x29, #16] 4008ec: f9400ba0 ldr x0, [x29, #16] 4008f0: 91002000 add x0, x0, #0x8 4008f4: f9400001 ldr x1, [x0] 4008f8: f9400ba0 ldr x0, [x29, #16] 4008fc: 91004000 add x0, x0, #0x10 400900: f9400000 ldr x0, [x0] 400904: aa0003e2 mov x2, x0 400908: b9401fa0 ldr w0, [x29, #28] 40090c: 97ffffde bl 400884 <opt_validate> CPU3: CS_ETM_TRACE_ON packet is inserted Timestamp 0x53e56a31924 0000000000400910 <main+0x34>: 400910: 7100001f cmp w0, #0x0 400914: 540000c0 b.eq 40092c <main+0x50> //
b.none
Timestamp 0x53e56a31924 000000000040092c <main+0x50>: 40092c: f9400ba0 ldr x0, [x29, #16] 400930: 91004000 add x0, x0, #0x10 400934: f9400000 ldr x0, [x0] 400938: 39400000 ldrb w0, [x0] 40093c: 7101941f cmp w0, #0x65 400940: 1a9f17e0 cset w0, eq // eq = none 400944: 12001c00 and w0, w0, #0xff 400948: b9002fa0 str w0, [x29, #44] 40094c: f9400ba0 ldr x0, [x29, #16] 400950: 91002000 add x0, x0, #0x8
I need to investigate it further to make sure getting timestamps from this source is a good idea - I'm not convinced if timestamp as a packet queue parameter is refreshed frequently enough to keep up with timestamp packets actually emitted.
Look at [2] above, it should give you what you need.
Then, I think, it should be possible to generate samples (not sure what type though, perhaps not 'branch' this time) for timestamp/cycacc packets, analogically to what has been done for TRACE_ON https://lists.linaro.org/pipermail/coresight/2018-May/001327.htm l and then expose it in the python interface.
I'd be grateful for any opinion about this idea, especially about usefulness of such feature for the general audience, as well as any possible compatibility issues. If you are aware of another approach to achieve timestamp correlation with branch samples, it would also be very
welcome.
Could you review Mathieu's patches 'perf tools: Configure timestsamp generation in CPU-wide mode' [1] and 'coresight: etm4x: Configure tracers to emit timestamps' [2]?
Thanks for pointing it. I did reviewed them, however they didn't help
much.
They are pretty much just configuration of the timestamping module, which works well for me.
You are correct, they just deal with the mechanic of enabling the generation of timestamp packets. How I use them is in [2] above.
I hope the idea is not completely pointless. I'm still making my way through the perf subsystem, so I might have missed some crucial
details.
I don't think the idea is pointless. You simply have a new use case and from my point of view it is enough to receive consideration.
The idea looks good to me. Actually I am very curious with below question:
What's the brief benefit we can get from enabling timestamp for CoreSight branch events, and this cannot be fulfilled by Perf's cpu-clock/task-clock events and PMU cpu cycle event?
I tried to research cpu-clock and task-clock and it looks like they are based on wall clock, while CS timestamping is CPU-independent. Measurement with CS may help to narrow down instruction stalls, which, I
believe, would be hidden otherwise.
CPU cycle seems like a good measurement in this case, but I'm not sure if correlating PMU events with specific instruction range wouldn't be harder than extracting timestamps we already have in the
stream.
Best regards, Wojciech
Thanks, Leo Yan
[1] https://lists.linaro.org/pipermail/coresight/2019-March/002259.htm l [2] https://lists.linaro.org/pipermail/coresight/2019-March/002232.htm l
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.