Thanks Al. Sorry, I was not quite clear here. What I wanted to understand is how exactly the timestamp generator increments its counter. I guess it is F times per second, where F is the frequency of TSGEN's clock, but I may be wrong.
Generally yes, though this will be a fixed-frequency system clock which may be independent of the CPU clock. Register 0x020 in TSGEN holds the frequency, but I believe this is simply a place for software to make a note of the value in case other software wants to know. I.e. it's zero unless something writes it, and writing it has no effect on the actual frequency.
Whoever first enables the timestamp, if it finds register 0x020 is zero, ought to measure the frequency and make a note of it.
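That "measure and make a note of it" step could look something like the sketch below. This is only an illustration with hypothetical helper names; on real hardware `read_counter` would be a memory-mapped read of the TSGEN counter, which is platform-specific.

```python
# Hypothetical sketch: estimate the timestamp generator's frequency by
# sampling its counter over a known wall-clock interval, for the case
# where register 0x020 reads as zero. read_counter stands in for a real
# memory-mapped read of the TSGEN counter value.
import time

def measure_tsgen_frequency(read_counter, interval_s: float = 0.1) -> int:
    """Return an estimate of the counter frequency in Hz."""
    t0 = time.monotonic()
    start = read_counter()
    time.sleep(interval_s)
    end = read_counter()
    elapsed = time.monotonic() - t0  # use the actual elapsed time, not the requested one
    return round((end - start) / elapsed)
```

Software that first enables the timestamp could then write the result into register 0x020 for later consumers, per the convention described above.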
Thanks for this great write-up. So, combining this with Al's suggestions, I think what needs to be done is the following:
1. When a timestamp packet is encountered, get the packet queue and iterate backwards until the previous timestamp packet or a discontinuity occurs.
2. For each encountered range/exception/exception_ret packet:
   - if this packet happened immediately before the timestamp packet, mark it with the value of this timestamp, decremented by the number of instructions executed in this range
I don't think you need to decrement. The TS packet applies to the most recent branch or exception. Also, decrementing by number of instructions is definitely wrong as there is little relation between instruction counts and timing of any sort.
   - if this is a subsequent non-TS packet, mark it with the timestamp value of the previous non-TS packet, decremented further in the same fashion.
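Putting Al's correction together with the plan above, the backward walk might look like the sketch below. The packet kinds and classes are hypothetical stand-ins for the real decoder's types; following the correction, there is no per-instruction decrement: the timestamp applies to the most recent branch or exception, and the earlier packets in the window simply inherit the same value.

```python
# Hypothetical sketch: when a TS packet arrives, walk the queued packets
# backwards, marking range/exception/exception_ret packets with the TS
# value, and stop at the previous TS packet or a discontinuity.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    kind: str                        # e.g. "range", "exception", "exception_ret", "ts", "discont"
    timestamp: Optional[int] = None  # filled in when a TS packet is seen

def apply_timestamp(queue: list, ts_value: int) -> None:
    for pkt in reversed(queue):
        if pkt.kind in ("ts", "discont"):
            break  # window ends at the previous TS packet or a discontinuity
        if pkt.kind in ("range", "exception", "exception_ret"):
            pkt.timestamp = ts_value  # no decrementing, per the correction above
```

For example, with a queue of [range, ts, range, exception], a call to apply_timestamp(queue, 500) marks only the last two packets; the first range sits before the earlier TS packet and is left alone.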
BTW, is a timestamp unit a clock cycle? It seems so, since we decrement it by instruction count on the assumption that one instruction lasts one clock cycle. If so, what would be the difference between cycacc packets and timestamp packets?
A timestamp unit is a system clock cycle of some kind.
Any system bigger than a small microcontroller would typically have several clocks. There will be the system interconnect clock, generally fixed frequency. CPUs may run at a variable ratio to the interconnect clock, and it may be possible to vary them independently of each other. In addition to this, there are the system generic timer/counter (which Linux uses for timing) and the CoreSight timestamp generator, each of which runs at yet another fixed frequency. So you might have
- system interconnect at 1GHz
- one CPU varying between 1.5GHz and 3GHz
- another CPU varying between 500MHz and 2GHz
- system generic timer/counter at 40MHz
- CoreSight timestamp generator at 100MHz
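To make the unit conversion concrete, here is a tiny worked example using the 100MHz timestamp generator frequency assumed in the list above; real code should take the frequency from the 0x020 register mentioned earlier, or measure it.

```python
# Worked example: converting a timestamp-counter delta into wall-clock time,
# assuming the 100MHz TSGEN frequency from the example above.
TSGEN_HZ = 100_000_000  # assumed, not universal

def ticks_to_ns(ticks: int, freq_hz: int = TSGEN_HZ) -> float:
    """At 100MHz, one timestamp tick is 10ns."""
    return ticks * 1e9 / freq_hz

# e.g. a delta of 250 ticks at 100MHz corresponds to 2500ns (2.5us)
```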
You are likely to see some convergence between the system generic timer, and the CoreSight timestamp generator. It's always been a bit of a bugbear that across our various logs and traces, we have these two different timebases. On the other hand, there's something to be said for having a debug-related timestamp that's protected against the sort of adjustment Linux might make to the system timer.
On top of all this lot, there is very little relation between instruction count and cycle count. At best the core will execute 3 or 4 instructions per cycle, at worst it will take tens or even hundreds of cycles to do one instruction. The actual time at which an instruction does its work may be quite loosely related to where you see it in the trace - you can see this if you use the ETM Event packets to instrument performance events, as you may get an event relating to an instruction when you haven't even had the branch atom that leads up to the instruction. Cycle counts in traces can help us understand how long instructions are taking.
Al
Thank you and best regards, Wojciech