On Wed, 1 May 2019 at 17:52, Wojciech Żmuda wzmuda@n7space.com wrote:
Hello,
On 30/04/2019, 18:48, "Al Grant" Al.Grant@arm.com wrote:
That may be the missing piece. Currently my understanding is that the timestamp generator is clocked with F [Hz], which means the value in its output register increments F times per second. This is effectively limited by how fast we sample this register and by how many of these values actually end up in the sink and, later, in perf.data. Apart from that, I don't see any other requirements w.r.t. the clock. Could you please confirm my understanding?
The timestamp generator is global (one per chip) and runs at a fixed frequency. It might be possible to change this but it would be more of a BIOS thing. Definitely not something you'd do in response to a perf option.
Thanks Al. Sorry, I was not quite clear here. What I wanted to understand was how exactly the timestamp generator increments its counter. I guess that it is F times per second, where F is the frequency of TSGEN's clock, but I may be wrong.
On 01/05/2019, 00:43, "Mathieu Poirier" mathieu.poirier@linaro.org wrote:
I came up with another approach and it seems to work fine. I extended the packet structure with a timestamp field, so each branch-related packet is timestamped with the last observed timestamp value. Then, each sample may carry this timestamp as well.
I'm still not quite sure whether the timestamps require some clever adjustments, since branch samples are composed from two consecutive packets and I simply assign one of the two to the sample. As you suggested, I tried to understand your [2] patch, but I don't quite get the relation between timestamp, next_timestamp and adjusting them by instr_count.
A typical trace will look as follows:
Sync_packet
Range_packet_A (3 instructions)
Range_packet_B (2 instructions)
Timestamp(x)
Range_packet_C (5 instructions)
Range_packet_D (4 instructions)
Range_packet_E (2 instructions)
Timestamp(y)
...
As I mentioned above, timestamps are generated for instructions that have *already* executed. So timestamp X was generated after the instructions in range packets A and B have executed. The best we can do is estimate the execution time at one clock cycle per instruction. Given the above, we can estimate that range packet A started at time X - 5 and range packet B at X - 2.
Once we have received a timestamp we can anchor the next range packets on it. So range packet C started at time X, range packet D started at X + 5 and range packet E at X + 9. As soon as a new timestamp is received, i.e. Y, we can readjust the hard timestamp to avoid incurring too much deviation.
Thanks for this great write-up. So, concatenating this with Al's suggestions, I think what needs to be done is the following:
1. When a timestamp packet is encountered, get the packet queue and iterate backwards until the previous timestamp packet or a discontinuity occurs.
2. For each encountered range/exception/exception_ret packet:
   - if this packet happened immediately before the timestamp packet, mark it with the value of this timestamp, decremented by the number of instructions executed in this range;
   - if this is a subsequent non-TS packet, mark it with the timestamp value of the previous non-TS packet, decremented further in the same fashion.
This seems very similar to what is currently done to correlate traces from different processors in the CPU-wide implementation. Without a patch to look at I can't really say much more.
BTW, is the timestamp unit a clock cycle? It seems so, since we decrement it by the instruction count under the assumption that one instruction lasts one clock cycle. If so, what would be the difference between cycacc packets and timestamp packets?
Thank you and best regards, Wojciech