A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

A: The lost context.
Q: What makes top-posted replies harder to read than bottom-posted?
On Fri, 12 May 2017 07:37:27 -0500 Sebastian Pop s.pop@samsung.com wrote:
> Oh, I see this error now. Maybe you can launch gdb on perf-inject and see where this assert is triggered, and add code to handle/discard that opcode.
I haven't done anything to warrant that: I was just trying to follow the AutoFDO instructions from Documentation/trace/coresight.txt:
---
$ gcc-5 -O3 sort.c -o sort
$ taskset -c 2 ./sort
Bubble sorting array of 30000 elements
5910 ms

$ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
Bubble sorting array of 30000 elements
12543 ms
[ perf record: Woken up 35 times to write data ]
[ perf record: Captured and wrote 69.640 MB perf.data ]

$ perf inject -i perf.data -o inj.data --itrace=il64 --strip
$ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
$ taskset -c 2 ./sort_autofdo
Bubble sorting array of 30000 elements
5806 ms
---
The following session is using a native-built perf based on the OpenCSD tree's perf-opencsd-4.11 commit e0cf71ef149f:
$ gcc -O3 sort.c -o sort
$ taskset -c 2 ./sort
Bubble sorting array of 30000 elements
5311 ms
$ perf --version
perf version 4.10.rc4.ge0cf71
$ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
Error: failed to set config "20070000.etr" on event cs_etm/@20070000.etr/u with 13 (Permission denied)

$ sudo perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
Bubble sorting array of 30000 elements
14048 ms
[ perf record: Woken up 63 times to write data ]
Warning: AUX data lost 62 times out of 65!

[ perf record: Captured and wrote 62.663 MB perf.data ]
$ sudo perf inject -i perf.data -o inj.data --itrace=il64 --strip
File perf.data not owned by current user or root (use -f to override)
$ sudo perf inject -f -i perf.data -o inj.data --itrace=il64 --strip
0x350 [0x50]: failed to process type: 1
Obviously there are missing keywords like 'sudo' (or '#' prompts instead of '$' prompts) in the instructions.
Even with the inject command provided in the last reply, I still get the error:
$ sudo perf inject -f -i perf.data -o inj.data --itrace=i100usl --strip
0x350 [0x50]: failed to process type: 1
I even went back to try the 4.11-rc1 branch, and get the same error.
That's all with a native-built perf, which is what the instructions seem to imply: e.g., there are no instructions saying when to switch between the host and the target machines.
When doing the perf inject step with a cross-built perf that has the OpenCSD libraries (also not mentioned in the instructions, nor does inject complain if built without them), things start to work a bit better, in that the "failed to process type: 1" error does not occur.
So more exact instructions on how to run the AutoFDO example would have helped, but now I'm seeing a different problem:
Given this excerpt of my version of sort.c (it has been modified):
void bubble_sort (int *a, int n) {
    int i, temp, swap_flag = 1;
    while (swap_flag) {
        swap_flag = 0;
        for (i = 1; i < n; i++) {
            if (a[i] < a[i - 1]) {
                temp = a[i];
                a[i] = a[i - 1];
                a[i - 1] = temp;
                swap_flag = 1;
            }
        }
    }
}
I see this gcov generated with the intel-pt trace data:
$ dump_gcov ./sort-O3.gcov -gcov_version=1
<snip>
11: bubble_sort total:101485
  2: 10
  4: 28064
  5: 27985
  7: 8681
  8: 8681
  9: 28064
<snip>
which makes an improvement:
+ taskset -c 2 ./sort-O3 30000
Bubble sorting array of 30000 elements
1452 ms
+ taskset -c 2 ./sort-autofdo 30000
Bubble sorting array of 30000 elements
1356 ms
but the aarch64 version does not - in fact it makes things worse:
$ taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
5302 ms
$ taskset -c 2 ./sort-O3-autofdo
Bubble sorting array of 30000 elements
6484 ms
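To put numbers on "improvement" and "worse", the relative change in run time can be worked out from the timings above. This is only a small Python sketch; the helper name percent_change is made up for illustration and is not part of perf or AutoFDO:

```python
def percent_change(before_ms, after_ms):
    """Relative change in run time; negative means the new binary is faster."""
    return (after_ms - before_ms) / before_ms * 100.0

# Timings copied from the runs above (single runs, so treat as rough figures).
x86 = percent_change(1452, 1356)       # sort-O3 vs sort-autofdo
aarch64 = percent_change(5302, 6484)   # sort-O3 vs sort-O3-autofdo

print(f"x86 (intel-pt):      {x86:+.1f}%")
print(f"aarch64 (CoreSight): {aarch64:+.1f}%")
```

With these timings that comes to roughly 6.6% faster on x86 and roughly 22.3% slower on aarch64.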
Here are the instructions I used to make the aarch64 version:
x86host$ perf inject -i perf.data -o inj.data --itrace=i100usl --strip
x86host$ create_gcov --binary=sort-O3 --profile=inj.data --gcov=sort-O3.gcov -gcov_version=1
x86host$ aarch64-linux-gnu-gcc -O3 -fauto-profile=./sort-O3.gcov ./sort.c -o ./sort-O3-autofdo
and here is the equivalent snippet of the gcov dump:
x86host$ dump_gcov sort-O3.gcov -gcov_version=1
<snip>
11: bubble_sort total:11570765
  2: 2090
  4: 2089
  5: 0
  7: 5436878
  9: 6129708
<snip>
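Since the two dumps have very different totals, it may be easier to compare them as each offset's share of the bubble_sort total. A rough Python sketch, with the counts copied by hand from the two dump_gcov outputs above; the helper name shares is hypothetical, and an offset missing from a dump is printed as 0 only for display:

```python
def shares(counts):
    """Map each source-line offset to its fraction of the summed counts."""
    total = sum(counts.values())
    return {offset: n / total for offset, n in counts.items()}

# Per-offset counts for bubble_sort, copied from the two dumps above.
x86 = {2: 10, 4: 28064, 5: 27985, 7: 8681, 8: 8681, 9: 28064}
aarch64 = {2: 2090, 4: 2089, 5: 0, 7: 5436878, 9: 6129708}

x86_shares = shares(x86)
aarch64_shares = shares(aarch64)

# Offset 8 is absent from the aarch64 dump; .get() shows it as 0.0%.
for offset in sorted(set(x86) | set(aarch64)):
    print(f"{offset:>2}: x86 {x86_shares.get(offset, 0.0):6.1%}   "
          f"aarch64 {aarch64_shares.get(offset, 0.0):6.1%}")
```

In both dumps the per-offset counts add up to the reported total. The x86 profile puts roughly 28% each on offsets 4, 5 and 9, while the aarch64 profile puts nothing on offset 5 and almost everything on offsets 7 and 9.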
Any ideas?
Thanks,
Kim