Hi Wojciech,
On Fri, Jun 07, 2019 at 05:06:40PM +0000, Wojciech Żmuda wrote:
From: Wojciech Zmuda wzmuda@n7space.com
This patchset adds time notion to perf instruction and branch samples to allow coarse time measurement of code blocks execution.
The simplest verification is visibility of the time field in 'perf script' output:
root@zynq:~# perf record -e cs_etm/timestamp,@fe970000.etr/u -a sleep 1 Couldn't synthesize bpf events. [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.262 MB perf.data ]
root@zynq:~# perf script --ns -F cpu,comm,time perf [002] 9546.053455325: perf [002] 9546.053455340: perf [002] 9546.053455344: (...) sleep [003] 9546.060163742: sleep [003] 9546.060163754: sleep [003] 9546.060163766: (...) ntpd [001] 9546.389083194: ntpd [001] 9546.389083400: ntpd [001] 9546.389086319: (...)
The step above works only if trace has been collected in CPU-wide mode because of some perf event flags mismatch I'm working on fixing.
Even if I had no comment on this set (see other patch) I couldn't apply it before the problems you report above are fixed.
In general, timestamps in subsequent samples are monotonically increasing. Exceptions are:
- discontinuities in trace. From my understanding, we can't timestamp discontinuities reasonably, since after decoder synchronizes back after trace loss, it needs to wait for another timestamp packet. Thus, time value of such samples stays at 0x0.
- cases when a lot of non-branching instructions is executed between two subsequent timestamps. Since we use approximation of timestamps for subsequent samples by increasing the timestamp value by instruction count, it is possible that we go past the value of the next hardware timestamp.
Another way to access these values is to use the perf script engine, which I used for validation of the feature. The script below calculates timestamp differences of two consecutive branches sharing the same branch address. This is a simple example of execution time fluctuation detector.
from __future__ import print_function
import os import sys
sys.path.append(os.environ['PERF_EXEC_PATH'] + \ '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
from perf_trace_context import *
target_start_addr = int('4005e4', 16) # 0x4005e4 is func() from listing below
branch = dict() branch['from'] = 0 branches = []
def process_event(s): global branch global branches
sample = s['sample'] branch['cpu'] = sample['cpu'] if not branch['from']: branch['from'] = sample['addr'] branch['ts'] = sample['time'] return branch['to'] = sample['ip'] if not branch['to']: branch['from'] = 0 branch['ts'] = 0 return if branch['from'] and branch['to']: branches.append(branch.copy()) branch['from'] = 0 return
def trace_end(): global branches
count = 0 timestamp_start = 0 print("Got {0} samples:".format(len(branches))) for b in branches: if b['from'] == target_start_addr: if not timestamp_start: timestamp_start = b['ts'] continue print("[{0}]: ts diff = 0x{1:x} - 0x{2:x} = {3:d}".format(count, b['ts'], timestamp_start, b['ts'] - timestamp_start)) count = count + 1 timestamp_start = b['ts']
The following function was traced:
static int func(int cnt) { volatile int x = 0; static int i;
x += cnt + 0xdeadbeefcafeb00b; (...) /* repeats ~100 times */ if (i++ % 3 == 0) // Every third execution is longer usleep(1000); return x;
}
root@zynq:~# perf record -m,16K -e cs_etm/timestamp,@fe970000.etr/u \ --filter 'filter func @./program \ --per-thread ./program Couldn't synthesize bpf events. CTRL+C me when you find appropriate. ^C[ perf record: Woken up 12 times to write data ] [ perf record: Captured and wrote 0.135 MB perf.data ]
root@zynq:~# perf script -s exectime.py Got 2469 samples: [0]: ts diff = 0x92f2752e512 - 0x92f274a7ae9 = 551465 [1]: ts diff = 0x92f2752e694 - 0x92f2752e512 = 386 [2]: ts diff = 0x92f2752e817 - 0x92f2752e694 = 387 [3]: ts diff = 0x92f275bef12 - 0x92f2752e817 = 591611 [4]: ts diff = 0x92f275bf093 - 0x92f275bef12 = 385 [5]: ts diff = 0x92f275bf211 - 0x92f275bf093 = 382 [6]: ts diff = 0x92f276451d7 - 0x92f275bf211 = 548806 [7]: ts diff = 0x92f2764535a - 0x92f276451d7 = 387 [8]: ts diff = 0x92f276454d7 - 0x92f2764535a = 381 [9]: ts diff = 0x92f276cb256 - 0x92f276454d7 = 548223 [10]: ts diff = 0x92f276cb3d9 - 0x92f276cb256 = 387 [11]: ts diff = 0x92f276cb556 - 0x92f276cb3d9 = 381 (...)
At the listing above it is visible that every third execution of the function lasted longer than the other two. It is a naive example and could be enhanced to point to the area that caused the disruption by examining events 'in the middle' of the traced code range.
Applies cleanly on Mathieu's 5.1-rc3-cpu-wide-v3 branch.
Changes for V3:
- fix checkpatch warnings
I really appreciate that.
Thanks, Mathieu
- don't timestamp discontinuity packets with hardware timestamp, as it is for range packets only
- fix timestamping packets on full queue in per-thread mode
- simplify code in timestamping last branch samples before discontinuity and in timestamping pending packets
Changes for V2:
- move packet timestamping logic to decoder. Front end only uses this information to timestamp samples (as suggested by Mathieu).
- leave original behaviour of CPU-wide mode, where decoder is stopped and front end is triggered about pending queue with timestamp packet. At the same time, always adjust next and current timestamp in both CPU-wide and per-thread modes (as suggested by Mathieu).
- when timestamp packet is encountered, timestamp range and discontinuity packets waiting in the queue, that are not yet consumed by the front end (as suggested by Mathieu).
- don't timestamp exceptions, since they are not turned into branch nor instruction samples.
- fix timestamping of the last branch sample before discontinuity appears (as suggested by Leo).
Wojciech Zmuda (1): perf cs-etm: Set time value for samples
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 70 ++++++++++++++++++------- tools/perf/util/cs-etm.c | 3 ++ tools/perf/util/cs-etm.h | 1 + 3 files changed, 55 insertions(+), 19 deletions(-)
-- 2.11.0