On Fri, 28 Apr 2017 15:51:22 +0100 Kim Phillips kim.phillips@arm.com wrote:
On Thu, 23 Mar 2017 19:31:30 -0500 Kim Phillips kim.phillips@arm.com wrote:
OK, I've trimmed out all the nitty gritty details where we were trying to compare setups, mainly because I've reached a point where I can finally run an ETM trace without locking up the machine (without having to buy an SSD for an older operating system's rootfs :).
Recipe for success - at least for this juno R2 user - is:
- Use upstream Linus' ToT as source for kernel (not OpenCSD's
tree/branches). This commit (093b995) works for me, built with ubuntu's standard gcc 6.2 cross compiler:
For the record, I've since found out that this recipe uses the ETB trace method and actually produces zero valid trace data in the resulting perf.data file. It turns out that pointing create_gcov at a perf.data file that is valid but contains no samples produces an empty-but-valid .gcov file, and recompiling sort.c with -fauto-profile against that file magically produces a better-performing executable.
So I still can't get any sort (pun intended) of trace data out of my debian-based Juno r2.
I've been able to get past the hard-lockup situation when capturing ETM data by upgrading my Juno R2's firmware to the latest (Jan 2017).
However, sadly, I am still unable to replicate the results demonstrated in the AutoFDO instructions. When using host (cross) built tools, I see no improvement in performance:
$ gcc -O3 -fauto-profile=sort-3000-O3.gcov sort.c -o sort-autofdo
$ ./sort-O3
Bubble sorting array of 30000 elements
6939 ms
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
5375 ms
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
5362 ms
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
5369 ms
$ taskset -c 2 ./sort-autofdo
Bubble sorting array of 30000 elements
5361 ms
$ taskset -c 2 ./sort-autofdo
Bubble sorting array of 30000 elements
5359 ms
$ taskset -c 2 ./sort-autofdo
Bubble sorting array of 30000 elements
5362 ms
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
5362 ms
$ taskset -c 2 ./sort-autofdo
Bubble sorting array of 30000 elements
5365 ms
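To put a number on "no improvement", here is a minimal sketch that averages the pinned (taskset -c 2) timings from the runs above. mean_ms is a helper name made up for this illustration, and the unpinned first run (6939 ms) is deliberately left out.

```shell
#!/bin/sh
# mean_ms: hypothetical helper, prints the mean of its arguments
# (millisecond timings) with two decimals.
mean_ms() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.2f\n", s / NR }'
}

# Pinned timings copied from the transcript above.
o3=$(mean_ms 5375 5362 5369 5362)
autofdo=$(mean_ms 5361 5359 5362 5365)

echo "sort-O3 mean:      $o3 ms"
echo "sort-autofdo mean: $autofdo ms"
```

That works out to 5367.00 ms versus 5361.75 ms, a difference of about 0.1%, which is smaller than the run-to-run spread of sort-O3 on its own.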
In trying to eliminate any arch-dependent code in the tools, I discovered perf inject fails when built and run on the Juno target itself.
See the 'failed to process type' message when running inject with a natively built perf from the OpenCSD 4.11 branch, using libs from the OpenCSD master branch:
---
[ 8038.213380] coresight-replicator-qcom 20120000.replicator: REPLICATOR enabled
[ 8038.220440] coresight-funnel 20150000.funnel: FUNNEL inport 0 enabled
[ 8038.226812] coresight-tmc 20010000.etf: TMC-ETF enabled
[ 8038.231978] coresight-funnel 20040000.funnel: FUNNEL inport 0 enabled
[ 8038.238350] coresight-funnel 220c0000.funnel: FUNNEL inport 1 enabled
18567 ms
[ 8038.284876] coresight-funnel 220c0000.funnel: FUNNEL inport 1 disabled
[ 8038.291335] coresight-funnel 20040000.funnel: FUNNEL inport 0 disabled
[ 8038.297793] coresight-tmc 20010000.etf: TMC-ETF disabled
[ 8038.303047] coresight-funnel 20150000.funnel: FUNNEL inport 0 disabled
[ 8038.309504] coresight-replicator-qcom 20120000.replicator: REPLICATOR disabled
[ 8038.316651] coresight-tmc 20070000.etr: TMC-ETR disabled
[ perf record: Woken up 84 times to write data ]
Warning: AUX data lost 82 times out of 106!
[ perf record: Captured and wrote 86.034 MB perf.data ]
+ sudo perf inject -i perf.data -o inj.data --itrace=il64 --strip
0x350 [0x50]: failed to process type: 1 <<<<<<<<<<<<< here.
+ sudo perf report -i inj.data --stdio
WARNING: The inj.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?
Warning: Kernel address maps (/proc/{kallsyms,modules}) were restricted.
Check /proc/sys/kernel/kptr_restrict before running 'perf record'.
As no suitable kallsyms nor vmlinux was found, kernel samples can't be resolved.
Samples in kernel modules can't be resolved as well.
Error: The inj.data file has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#
+ /home/kim/git/autofdo/create_gcov --binary=./sort-O3 --profile=inj.data --gcov=./sort-O3.gcov -gcov_version=1
E0509 13:53:22.904515 11360 utils.cc:231] Failed to open file inj.data
E0509 13:53:22.905418 11360 profile_creator.cc:79] Error reading profile.
+ gcc -O3 -fauto-profile=./sort-O3.gcov ./sort.c -o ./sort-O3-autofdo
./sort.c:1:0: error: Cannot open profile file ./sort-O3.gcov.
#include <stdio.h>
./sort.c:1: confused by earlier errors, bailing out
---
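In the log above every later step still runs after perf inject has already failed, which buries the first error under follow-on ones. A rough sketch of the same commands chained so that the sequence stops at the first failure might look like the following. require_nonempty and autofdo_pipeline are names invented for this illustration, and I am assuming (not verified) that perf inject returns a non-zero exit status in this failure case. Note that the size check only catches a missing or zero-length file, not an "empty-but-valid" one that carries a header but no samples.

```shell
#!/bin/sh
# require_nonempty: hypothetical helper, fails unless the named file
# exists and has a size greater than zero.
require_nonempty() {
    if [ ! -s "$1" ]; then
        echo "error: $1 is missing or empty, stopping" >&2
        return 1
    fi
}

# autofdo_pipeline: hypothetical wrapper around the commands from the
# log above. Each step runs only if the previous one succeeded.
autofdo_pipeline() {
    sudo perf inject -i perf.data -o inj.data --itrace=il64 --strip &&
    require_nonempty inj.data &&
    /home/kim/git/autofdo/create_gcov --binary=./sort-O3 --profile=inj.data --gcov=./sort-O3.gcov -gcov_version=1 &&
    require_nonempty ./sort-O3.gcov &&
    gcc -O3 -fauto-profile=./sort-O3.gcov ./sort.c -o ./sort-O3-autofdo
}
```

Calling autofdo_pipeline after perf record would then end at the inject step, leaving just the 'failed to process type' message to look at.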
Has anyone else been able to confirm successful AutoFDO results, specifically ones where the improvement comes from the profile data itself and not merely from the different compiler flags that gcc's -fauto-profile implies when building the autofdo binary?
Thanks,
Kim