Re: [RFC PATCH v3 0/1] Timestamp perf samples

10 Jun 2019

      Hi Wojciech,
On Fri, Jun 07, 2019 at 05:06:40PM +0000, Wojciech Żmuda wrote:
...
From: Wojciech Zmuda wzmuda@n7space.com
This patchset adds time notion to perf instruction and branch samples to allow
coarse time measurement of code blocks execution.
The simplest verification is visibility of the time field in 'perf script' output:
root@zynq:~# perf record -e cs_etm/timestamp,@fe970000.etr/u -a sleep 1
   Couldn't synthesize bpf events.
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.262 MB perf.data ]
root@zynq:~# perf script --ns -F cpu,comm,time
               perf [002]  9546.053455325:
               perf [002]  9546.053455340:
               perf [002]  9546.053455344:
               (...)
              sleep [003]  9546.060163742:
              sleep [003]  9546.060163754:
              sleep [003]  9546.060163766:
               (...)
               ntpd [001]  9546.389083194:
               ntpd [001]  9546.389083400:
               ntpd [001]  9546.389086319:
               (...)
The step above works only if trace has been collected in CPU-wide mode because of some
perf event flags mismatch I'm working on fixing.
Even if I had no comment on this set (see other patch) I couldn't apply it
before the problems you report above are fixed.
...
In general, timestamps in subsequent samples are monotonically increasing. Exceptions are:

discontinuities in trace. From my understanding, we can't timestamp discontinuities
reasonably, since after decoder synchronizes back after trace loss, it needs to wait for
another timestamp packet. Thus, time value of such samples stays at 0x0.
cases when a lot of non-branching instructions is executed between two subsequent timestamps.
Since we use approximation of timestamps for subsequent samples by increasing the timestamp
value by instruction count, it is possible that we go past the value of the next hardware
timestamp.

Another way to access these values is to use the perf script engine, which I used for validation
of the feature. The script below calculates timestamp differences of two consecutive branches
sharing the same branch address. This is a simple example of execution time fluctuation detector.
from __future__ import print_function
import os
   import sys
sys.path.append(os.environ['PERF_EXEC_PATH'] + \
   	'/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
from perf_trace_context import *
target_start_addr = int('4005e4', 16)  # 0x4005e4 is func() from listing below
branch = dict()
   branch['from'] = 0
   branches = []
def process_event(s):
       global branch
       global branches
   sample = s['sample']

   branch['cpu'] = sample['cpu']

   if not branch['from']:
       branch['from'] = sample['addr']
       branch['ts'] = sample['time']
       return

   branch['to'] = sample['ip']
   if not branch['to']:
       branch['from'] = 0
       branch['ts'] = 0
       return

   if branch['from'] and branch['to']:
       branches.append(branch.copy())
       branch['from'] = 0
       return

def trace_end():
       global branches
   count = 0
   timestamp_start = 0

   print("Got {0} samples:".format(len(branches)))

   for b in branches:
       if b['from'] == target_start_addr:
           if not timestamp_start:
               timestamp_start = b['ts']
               continue

           print("[{0}]: ts diff = 0x{1:x} - 0x{2:x} = {3:d}".format(count,
               b['ts'], timestamp_start, b['ts'] - timestamp_start))
           count = count + 1
           timestamp_start = b['ts']

The following function was traced:
static int func(int cnt)
   {
     volatile int x = 0;
     static int i;
 x += cnt + 0xdeadbeefcafeb00b;
 (...) /* repeats ~100 times */

 if (i++ % 3 == 0) // Every third execution is longer
   usleep(1000);

 return x;

}
root@zynq:~# perf record -m,16K -e cs_etm/timestamp,@fe970000.etr/u \
                  --filter 'filter func @./program \
                  --per-thread ./program
   Couldn't synthesize bpf events.
   CTRL+C me when you find appropriate.
   ^C[ perf record: Woken up 12 times to write data ]
   [ perf record: Captured and wrote 0.135 MB perf.data ]
root@zynq:~# perf script -s exectime.py 
   Got 2469 samples:
   [0]: ts diff = 0x92f2752e512 - 0x92f274a7ae9 = 551465
   [1]: ts diff = 0x92f2752e694 - 0x92f2752e512 = 386
   [2]: ts diff = 0x92f2752e817 - 0x92f2752e694 = 387
   [3]: ts diff = 0x92f275bef12 - 0x92f2752e817 = 591611
   [4]: ts diff = 0x92f275bf093 - 0x92f275bef12 = 385
   [5]: ts diff = 0x92f275bf211 - 0x92f275bf093 = 382
   [6]: ts diff = 0x92f276451d7 - 0x92f275bf211 = 548806
   [7]: ts diff = 0x92f2764535a - 0x92f276451d7 = 387
   [8]: ts diff = 0x92f276454d7 - 0x92f2764535a = 381
   [9]: ts diff = 0x92f276cb256 - 0x92f276454d7 = 548223
   [10]: ts diff = 0x92f276cb3d9 - 0x92f276cb256 = 387
   [11]: ts diff = 0x92f276cb556 - 0x92f276cb3d9 = 381
   (...)
At the listing above it is visible that every third execution of the function lasted longer
than the other two. It is a naive example and could be enhanced to point to the area that
caused the disruption by examining events 'in the middle' of the traced code range.
Applies cleanly on Mathieu's 5.1-rc3-cpu-wide-v3 branch.
Changes for V3:

fix checkpatch warnings

I really appreciate that.
Thanks,
Mathieu
...

don't timestamp discontinuity packets with hardware timestamp, as it is
for range packets only
fix timestamping packets on full queue in per-thread mode
simplify code in timestamping last branch samples before discontinuity
and in timestamping pending packets

Changes for V2:

move packet timestamping logic to decoder. Front end only uses this information
to timestamp samples (as suggested by Mathieu).
leave original behaviour of CPU-wide mode, where decoder is stopped
and front end is triggered about pending queue with timestamp packet.
At the same time, always adjust next and current timestamp in both CPU-wide
and per-thread modes (as suggested by Mathieu).
when timestamp packet is encountered, timestamp range and discontinuity packets
waiting in the queue, that are not yet consumed by the front end (as suggested by Mathieu).
don't timestamp exceptions, since they are not turned into branch nor instruction
samples.
fix timestamping of the last branch sample before discontinuity appears (as suggested by Leo).

Wojciech Zmuda (1):
  perf cs-etm: Set time value for samples
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 70 ++++++++++++++++++-------
 tools/perf/util/cs-etm.c                        |  3 ++
 tools/perf/util/cs-etm.h                        |  1 +
 3 files changed, 55 insertions(+), 19 deletions(-)
-- 
2.11.0

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

Re: [RFC PATCH v3 0/1] Timestamp perf samples