-----Original Message-----
From: Mathieu Poirier [mailto:mathieu.poirier@linaro.org]
Sent: 14 November 2017 17:56
To: Robert Walker <Robert.Walker@arm.com>
Cc: CoreSight@lists.linaro.org
Subject: Re: [PATCH 0/2] perf inject branch stack fixes
Hi Robert and thanks for the code.
On 13 November 2017 at 08:11, Robert Walker <robert.walker@arm.com> wrote:
These patches fix some issues with the branch stacks generated from CoreSight ETM trace.
The main issues addressed are:
- The branch stack should only contain taken branches.
- The instruction samples are generated using the period specified by the --itrace option to perf inject. Currently, the period can only be specified as an instruction count - further work is required to specify the period as a cycle count or time interval.
- The ordering of the branch stack should have newest branch first.
- Some minor fixes to the address calculations.
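To make the first three points concrete, here is a simplified Python sketch. It is not the cs-etm.c code in these patches, and it rests on some assumptions. The decoder is taken to return inclusive ranges of fixed 4-byte instructions, each ending in a branch, which are modelled by a hypothetical TraceRange type. A sample is emitted every `period` instructions, in the spirit of the instruction-count period given with --itrace. Each sample carries a bounded stack of the most recent taken branches, newest at index 0. max_branches defaults to 64 here only to match the nr:64 in the sample dump in this thread.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, List, Tuple

INSN_SIZE = 4  # assumption: fixed-size 4-byte A64 instructions


@dataclass
class TraceRange:
    """Hypothetical decoded range covering instructions start..end (inclusive).

    The instruction at `end` is a branch; `taken` records whether it was
    taken and `target` is where execution continued if it was.
    """
    start: int
    end: int
    taken: bool
    target: int = 0

    def num_insns(self) -> int:
        return (self.end - self.start) // INSN_SIZE + 1


@dataclass
class Sample:
    ip: int                          # address of the sampled instruction
    branches: List[Tuple[int, int]]  # (from, to) pairs, newest first


def synthesize(ranges: Iterable[TraceRange], period: int,
               max_branches: int = 64) -> List[Sample]:
    # Bounded buffer of recent taken branches; once full, appendleft()
    # discards the oldest entry from the right-hand end.
    last_branches = deque(maxlen=max_branches)
    samples = []
    countdown = period  # instructions left until the next sample
    for r in ranges:
        n = r.num_insns()
        consumed = 0
        # Emit a sample each time another `period` instructions have run.
        while n - consumed >= countdown:
            consumed += countdown
            ip = r.start + (consumed - 1) * INSN_SIZE
            samples.append(Sample(ip, list(last_branches)))
            countdown = period
        countdown -= n - consumed
        # Only taken branches enter the stack, newest at index 0.
        if r.taken:
            last_branches.appendleft((r.end, r.target))
    return samples


# Example: a call, a not-taken branch, a return, then a loop back-branch.
ranges = [
    TraceRange(0x1000, 0x1008, taken=True, target=0x2000),
    TraceRange(0x2000, 0x2004, taken=False),
    TraceRange(0x2008, 0x200c, taken=True, target=0x100c),
    TraceRange(0x100c, 0x1010, taken=True, target=0x1000),
]
for s in synthesize(ranges, period=4):
    print(hex(s.ip), [(hex(f), hex(t)) for f, t in s.branches])
# 0x2000 [('0x1008', '0x2000')]
# 0x100c [('0x200c', '0x100c'), ('0x1008', '0x2000')]
```

Inserting with appendleft() keeps the stack newest-first at all times, so no reversal step is needed when a sample is emitted. Appending in chronological order and reversing on output would be an equivalent alternative. The not-taken branch at 0x2004 never appears in either sample.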
With these fixes, the branch stacks are more similar to the last branch records produced by 'perf record -b' and Intel-PT on x86. There are similar improvements in the AutoFDO profiles generated from these traces.
I'm a little confused. Here you mention that reverting d3fa0f70b7e8 makes the records look more similar to Intel-PT, but the changelog of d3fa0f70b7e8 claims the same thing. We obviously have two diverging points of view and I'd like to get a better understanding of the situation. Is there any way I can test this on my side?
Thanks, Mathieu
I used the attached test program to test this: main() calls f1(), which calls f2(), which in turn calls f3().
On an x86 PC, I recorded last branch records with:
  perf record -b ./call_chain
And Intel-PT with:
  perf record -e intel_pt//u ./call_chain
  perf inject -i perf.data -o inj.data --itrace=i10000il --strip
In each case, perf report -D is used to view the instruction samples with branch stacks.
The attached script, addr_mapper.py makes it easier to see what's going on in the branch stack by annotating addresses with their offsets from symbols.
  objdump -d ./call_chain > call_chain.dump
  perf report -D -i perf.data | ./addr_mapper.py call_chain.dump | less
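The addr_mapper.py attachment isn't reproduced here, so the following is only a hypothetical Python sketch of what such a script might do. The function names and the file name addr_mapper_sketch.py are illustrative, not taken from the attachment. The sketch reads symbol start addresses from the `objdump -d` headers, then appends [symbol+offset] after each address read from stdin, using the nearest symbol at or below it. It matches only 0x-prefixed values and 16-digit zero-padded addresses. One plausible reading of the [main+1287753159624fd3e] annotation on the first line of the dump is that the attached script treated the decimal timestamp 12877531596650324 as a hex address, and the narrower match avoids that.

```python
#!/usr/bin/env python3
"""Hypothetical address annotator in the spirit of addr_mapper.py.

Assumed usage: perf report -D | ./addr_mapper_sketch.py call_chain.dump
"""
import bisect
import re
import sys
from typing import Iterable, List, Tuple

# objdump -d prints a header for each symbol, e.g. "000000000040059d <f3>:"
SYMBOL_RE = re.compile(r"^([0-9a-f]+) <([^>]+)>:$")
# Annotate 0x-prefixed values and 16-digit zero-padded addresses only.
ADDR_RE = re.compile(r"\b(?:0x([0-9a-fA-F]+)|([0-9a-f]{16}))\b")


def parse_symbols(lines: Iterable[str]) -> Tuple[List[int], List[str]]:
    """Return symbol start addresses (sorted) and the matching names."""
    syms = []
    for line in lines:
        m = SYMBOL_RE.match(line.strip())
        if m:
            syms.append((int(m.group(1), 16), m.group(2)))
    syms.sort()
    return [a for a, _ in syms], [n for _, n in syms]


def annotate(line: str, addrs: List[int], names: List[str]) -> str:
    def repl(m) -> str:
        addr = int(m.group(1) or m.group(2), 16)
        # Index of the nearest symbol starting at or below addr.
        i = bisect.bisect_right(addrs, addr) - 1
        if i < 0:
            return m.group(0)  # below every symbol: leave as-is
        offset = addr - addrs[i]
        label = names[i] if offset == 0 else f"{names[i]}+{offset:x}"
        return f"{m.group(0)} [{label}]"

    return ADDR_RE.sub(repl, line)


def main(dump_path: str) -> None:
    with open(dump_path) as f:
        addrs, names = parse_symbols(f)
    for line in sys.stdin:
        sys.stdout.write(annotate(line, addrs, names))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print(f"usage: {sys.argv[0]} OBJDUMP_FILE < perf-report-output",
              file=sys.stderr)
```

It would slot into the same pipeline in place of ./addr_mapper.py. Because it uses only the symbol start addresses from objdump, an address past the end of the last function is still attributed to that function with a large offset.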
This results in a branch stack like this:
13 12877531596650324 [main+1287753159624fd3e] 0x4528 [0x640]: PERF_RECORD_SAMPLE(IP, 0x2): 17548/17548: 0x400608 [main+22] period: 10000 addr: 0
... branch stack: nr:64
.....  0: 000000000040061d [main+37] -> 0000000000400605 [main+1f] 0 cycles 0
.....  1: 00000000004005e5 [f1+1c] -> 000000000040060f [main+29] 0 cycles 0
.....  2: 00000000004005c8 [f2+1c] -> 00000000004005de [f1+15] 0 cycles 0
.....  3: 00000000004005ab [f3+e] -> 00000000004005c1 [f2+15] 0 cycles 0
.....  4: 00000000004005bc [f2+10] -> 000000000040059d [f3] 0 cycles 0
.....  5: 00000000004005d9 [f1+10] -> 00000000004005ac [f2] 0 cycles 0
.....  6: 000000000040060a [main+24] -> 00000000004005c9 [f1] 0 cycles 0
.....  7: 000000000040061d [main+37] -> 0000000000400605 [main+1f] 0 cycles 0
.....  8: 00000000004005e5 [f1+1c] -> 000000000040060f [main+29] 0 cycles 0
.....  9: 00000000004005c8 [f2+1c] -> 00000000004005de [f1+15] 0 cycles 0
..... 10: 00000000004005ab [f3+e] -> 00000000004005c1 [f2+15] 0 cycles 0
..... 11: 00000000004005bc [f2+10] -> 000000000040059d [f3] 0 cycles 0
..... 12: 00000000004005d9 [f1+10] -> 00000000004005ac [f2] 0 cycles 0
..... 13: 000000000040060a [main+24] -> 00000000004005c9 [f1] 0 cycles 0
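Entry 13 is the call from main() to f1(), entry 12 is the call from f1() to f2(), and entry 11 is the call from f2() to f3(). Entries 10, 9 and 8 are then the returns from f3() to f2(), f2() to f1() and f1() to main(). Entry 0 is the most recent branch, and entries 6 to 0 repeat the same sequence as entries 13 to 7.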
Without the revert of d3fa0f7, the Arm trace produced the stack in the reverse order: the call from main() to f1() appeared at the top, f1() to f2() as the 2nd entry, f2() to f3() as the 3rd, and so on. With d3fa0f7 reverted, the Arm stacks match the order of the Intel stacks.
Hope this helps.
Regards
Rob
The patches apply to the autoFDO branch of https://github.com/Linaro/perf-opencsd.git (d3fa0f7)
Regards
Robert Walker
Robert Walker (2):
  Revert "perf inject: record branches in chronological order"
  perf: Fix branch stack records from CoreSight ETM decode

 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |   4 +-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |   2 +-
 tools/perf/util/cs-etm.c                        | 134 +++++++++++++-----------
 3 files changed, 73 insertions(+), 67 deletions(-)
-- 1.9.1
CoreSight mailing list
CoreSight@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/coresight