On Wed, 1 May 2019 at 17:52, Wojciech Żmuda wzmuda@n7space.com wrote:
Hello,
On 30/04/2019, 18:48, "Al Grant" Al.Grant@arm.com wrote:
That may be the missing piece. Currently my understanding is that the timestamp generator is clocked with F [Hz], which means the value in its output register increments F times per second. This is effectively limited by how fast we sample this register and by how many of these values actually end up in the sink and, later, in perf.data. Apart from that, I don't see any other requirements w.r.t. the clock. Could you please confirm my understanding?
The timestamp generator is global (one per chip) and runs at a fixed frequency. It might be possible to change this but it would be more of a BIOS thing. Definitely not something you'd do in response to a perf option.
Thanks Al. Sorry, I was not quite clear here. What I wanted to understand was how exactly the timestamp generator increments its counter. I guess that it is F times per second, where F is the frequency of TSGEN's clock, but I may be wrong.
On 01/05/2019, 00:43, "Mathieu Poirier" mathieu.poirier@linaro.org wrote:
I came up with another approach and it seems to work fine. I extended the packet structure with a timestamp field, so each branch-related packet is timestamped with the last observed timestamp value. Then, each sample may carry this timestamp as well.
I'm still not quite sure whether the timestamps require some clever adjustments, since branch samples are composed from two consecutive packets and I simply assign one of the two to the sample. As you suggested, I tried to understand your [2] patch, but I don't quite get the relation between timestamp, next_timestamp and adjusting them by instr_count.
A typical trace will look as follows:
Sync_packet
Range_packet_A (3 instructions)
Range_packet_B (2 instructions)
Timestamp(x)
Range_packet_C (5 instructions)
Range_packet_D (4 instructions)
Range_packet_E (2 instructions)
Timestamp(y)
...
As I mentioned above, timestamps are generated for instructions that have *already* executed. So timestamp X was generated after the instructions in range packets A and B have executed. The best we can do is estimate the execution time at one clock cycle per instruction. Given the above, we can estimate that range packet A started at time X - 5 and range packet B at X - 2.
Once we have received a timestamp we can anchor the next range packets on it. So range packet C started at time X, range packet D started at X + 5 and range packet E at X + 9. As soon as a new timestamp is received, i.e. Y, we can readjust the hard timestamp to avoid incurring too much deviation.
Thanks for this great write-up. So, concatenating this with Al's suggestions, I think what needs to be done is the following:
1. When a timestamp packet is encountered, get the packet queue and iterate backwards until the previous timestamp packet or a discontinuity occurs.
2. For each encountered range/exception/exception_ret packet:
   - if this packet happened immediately before the timestamp packet, mark it with the value of this timestamp, decremented by the number of instructions executed in this range;
   - if this is a subsequent non-TS packet, mark it with the timestamp value of the previous non-TS packet, decremented further in the same fashion.
This seems very similar to what is currently done to correlate traces from different processors in the CPU-wide implementation. Without a patch to look at I can't really say much more.
BTW, is the timestamp unit a clock cycle? It seems so, since we decrement it by the instruction count under the assumption that one instruction lasts one clock cycle. If so, what would be the difference between cycacc packets and timestamp packets?
Thank you and best regards, Wojciech