This patch series adds support for thread stack and callchain; this patch set depends on the instruction sample fix patch set [1].
This patch set get more complex, so before divide into small groups, I'd like to use this patch set version to include all relevant patches, hope this can give whole context for related code change.
Briefly, this patch can be divided into three parts, which also can be reviewed separately for every part:
Patches 01, 02 are used to fix samples for one corner case is for accessing the branch's target address and trigger an exception. Essentially, an extra branch sample is added to reflect this mediate branch between the previous branch and exception entry.
Patches 03, 04, 05, 06 are coming from patch v4, which are used to support thread stack and callchain.
Patches 07, 08, 09 are used to fixup for exception entry and exit. This is mainly used to fix two cases, one part is to fixup the thread stack and callchain for the case when access branch target address and trigger exception; another part is to fixup the thread stack for instruction emulation (and other single step cases).
This patch set has been tested on Juno-r2 after applied on perf/core branch with latest commit 85fc95d75970 ("perf maps: Add missing unlock to maps__insert() error case"), and this patch set is also applied on top of instruction sample fix patch set [1].
Test for option '-F,+callindent':
# perf script -F,+callindent main 3258 1 branches: main ffffad684d20 __libc_start_main+0xe0 (/usr/lib/aarch64-linux-gnu/libc-2.28.so) main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main) main 3258 1 branches: _dl_fixup ffffad811b4c _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) main 3258 1 branches: _dl_lookup_symbol_x ffffad80c078 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) main 3258 1 branches: do_lookup_x ffffad80849c _dl_lookup_symbol_x+0x104 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) main 3258 1 branches: check_match ffffad807bf0 do_lookup_x+0x238 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) main 3258 1 branches: strcmp ffffad807888 check_match+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main) main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main) main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main) main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
[...]
Test for option '--itrace=g':
# perf script --itrace=g16l64i100
main 3258 100 instructions: ffffad816a80 memcpy+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad809468 _dl_new_object+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad801840 dl_main+0x778 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 100 instructions: ffffad80952c _dl_new_object+0x16c (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad801840 dl_main+0x778 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 100 instructions: ffffad8018dc dl_main+0x814 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 100 instructions: ffff8000100878d0 el0_sync_handler+0x168 ([kernel.kallsyms]) ffff800010082d00 el0_sync+0x140 ([kernel.kallsyms]) ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
[...]
Changes from v4: * Addressed Mike's suggestion for performance improvement for function cs_etm__instr_addr() for quick calculation for non T32; * Removed the patch 'perf cs-etm: Synchronize instruction sample with the thread stack' (Mike); * Fixed the issue for exception is taken for branch target address accessing, for the branch sample and stack thread handling, the related patches are 01, 02, 07; * Fixed the stack thread handling for instruction emulation and single step with patches 08, 09.
Changes from v3: * Split out separate patch set for instruction samples fixing. * Rebased on latest perf/core branch.
Changes from v2: * Added patch 01 to fix the unsigned variable comparison to zero (Suzuki). * Refined commit logs.
Changes from v1: * Added comments for task thread handling (Mathieu). * Split patch 02 into two patches, one is for support thread stack and another is for callchain support (Mathieu). * Added a new patch to support branch filter.
[1] https://lkml.org/lkml/2020/2/18/1406
Leo Yan (9): perf cs-etm: Defer to assign exception sample flag perf cs-etm: Reflect branch prior to exception perf cs-etm: Refactor instruction size handling perf cs-etm: Support thread stack perf cs-etm: Support branch filter perf cs-etm: Support callchain for instruction sample perf cs-etm: Fixup exception entry for thread stack perf thread: Add helper to get top return address perf cs-etm: Fixup exception exit for thread stack
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 1 + tools/perf/util/cs-etm.c | 290 ++++++++++++++++-- tools/perf/util/thread-stack.c | 10 + tools/perf/util/thread-stack.h | 1 + 4 files changed, 268 insertions(+), 34 deletions(-)