From: Wojciech Zmuda <wzmuda@n7space.com>
Hello,
After our discussion in [1], I came up with this proposed solution for timestamping perf samples, so that in trace post-processing we can tell how long it takes to execute traced blocks of code. I'll be grateful if you find some time to take a look at it. The patch itself is 37 lines of additions, so it shouldn't consume a lot of your time. I also have some ideas on how to further improve this patch that I'm not quite sure about, so I'd like to ask for your input.
The simplest verification is visibility of the time field in 'perf script' output:

root@zynq:~# perf script --ns -F comm,cpu,event,time | head
Frame deformatter: Found 4 FSYNCS
singlethread-at [001]  4572.697372400:  branches:u:
singlethread-at [001]  4572.697372404:  branches:u:
singlethread-at [001]  4572.697372408:  branches:u:
singlethread-at [001]  4572.697372412:  branches:u:
singlethread-at [001]  4572.697372416:  branches:u:
The step above works only if the trace has been collected in CPU-wide mode. I have an idea on how to improve this, but I'd like to get some more information first; I elaborate on it in the last paragraph.
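For reference, by CPU-wide mode I mean recording with -a; an example invocation (the @tmc_etf0 sink name is specific to my board) looks like:

root@zynq:~# perf record -e cs_etm/@tmc_etf0/u -a -- sleep 2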
Another way to access these values is through the script engine, which I used to sanity-check the timestamp values. Using a Python script, I verified that timestamps in subsequent samples are monotonically increasing, and it seems they are. The only exceptions are discontinuities in the trace. From my understanding, we can't timestamp discontinuities reasonably, since after the decoder synchronizes back following a trace loss, it needs to wait for another timestamp packet.
from __future__ import print_function
import os
import sys

sys.path.append(os.environ['PERF_EXEC_PATH'] + \
    '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')

from perf_trace_context import *

prev_sample = 0
cnt = 0

def process_event(s):
    global prev_sample
    global cnt

    sample = s['sample']

    # Remember the first sample; there is nothing to compare it against yet.
    if not prev_sample:
        prev_sample = sample
        cnt = cnt + 1
        return

    # Timestamps should never decrease between consecutive samples.
    if sample['time'] < prev_sample['time']:
        print('Sample %d has higher timestamp than sample %d' % (cnt - 1, cnt))

    prev_sample = sample
    cnt = cnt + 1
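For completeness, assuming the script above is saved as check_time.py, it can be run over the recorded data with:

root@zynq:~# perf script -s check_time.py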
By subtracting the timestamps of two samples, we can tell approximately how long it took (in timestamp units) to execute the instructions between those two points. Then, based on Al's response in [1], knowing the timestamp generator clock frequency, we can convert that into proper time units. The frequency is known (we have it in the DT) and can be saved to and retrieved from a TSGEN register. This is not done at the moment (the register is empty), but the idea is that, if it were, the frequency might then be saved in the perf.data headers (similarly to how the TRCIDR registers are saved now) and later retrieved programmatically to convert timestamp differences into actual time. I haven't experimented with this yet - I'd like to hear your opinion on it first.
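To illustrate the arithmetic, here is a minimal sketch; the 100 MHz value is made up and stands in for whatever frequency would eventually be retrieved from the perf.data headers:

#include <stdint.h>

/*
 * Hypothetical timestamp generator frequency; the real value would come
 * from the DT / TSGEN register via the perf.data headers.
 */
#define TSGEN_FREQ_HZ 100000000ULL

static uint64_t ts_delta_to_ns(uint64_t ts_start, uint64_t ts_end)
{
	/* One timestamp unit corresponds to 1 / TSGEN_FREQ_HZ seconds. */
	return (ts_end - ts_start) * 1000000000ULL / TSGEN_FREQ_HZ;
}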
I've also tried to access the time field with 'perf script -F time', and while that's possible for traces recorded in CPU-wide mode, it's not possible in per-thread mode, because PERF_SAMPLE_TIME is not set in that mode (see cs_etm__synth_events()). I tried to analyze that code, but I can't understand the relation between this flag and the ETM's timeless_decoding property. It looks to me like timeless decoding is equivalent to per-thread mode, where timestamps were not really utilized, while in CPU-wide mode they were used for inter-queue correlation. Would it be a good direction to have a separate property for TSGEN being on/off and to set PERF_SAMPLE_TIME based on that, rather than on etm->timeless_decoding? I see an obstacle in the cs_etm__is_timeless_decoding() function, which sets etm->timeless_decoding based on whether the events have the SAMPLE_TIME flag - could the criterion here be the tracing mode (CPU-wide/per-thread) instead?
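To make the question more concrete, this is roughly the direction I have in mind; etm->per_thread_mode is a made-up field standing for whatever would record the tracing mode:

static bool cs_etm__is_timeless_decoding(struct cs_etm_auxtrace *etm)
{
	/*
	 * Hypothetical: derive timeless decoding from the tracing mode
	 * (per-thread vs. CPU-wide) instead of from whether the recorded
	 * events have PERF_SAMPLE_TIME set.
	 */
	return etm->per_thread_mode;
}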
Thank you, Wojciech
[1] https://lists.linaro.org/pipermail/coresight/2019-May/002577.html
Wojciech Zmuda (1): perf cs-etm: Set time value for samples
 tools/perf/util/cs-etm.c | 36 ++++++++++++++++++++++++++++++++++++
 tools/perf/util/cs-etm.h |  1 +
 2 files changed, 37 insertions(+)