This patch series adds support for thread stack and callchain. It follows on from v3 [1] with the patches reorganized: some changes have been moved into the instruction sample fix patch set [2], so this patch set focuses only on thread stack and callchain support.
Patch 01 refactors the instruction size calculation; it is used by patch 02.
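As background for patch 01: A64 and A32 instructions are always 4 bytes, while a T32 (Thumb) instruction is 2 or 4 bytes depending on the top five bits of its first halfword, so the address N instructions into a T32 range has to be found by walking forward one instruction at a time. The following C sketch illustrates that calculation. The `sketch_*` names and the halfword buffer standing in for traced memory are hypothetical; this is not the code in cs-etm.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ISA tags; cs-etm.c has its own representation of the trace ISA. */
enum sketch_isa { SKETCH_ISA_A64, SKETCH_ISA_A32, SKETCH_ISA_T32 };

/*
 * Size in bytes of the instruction whose first 16-bit halfword is 'hw'.
 * A64 and A32 instructions are always 4 bytes. A T32 instruction is 32 bits
 * wide when bits[15:11] of its first halfword are 0b11101, 0b11110 or
 * 0b11111, which is the same as (hw & 0xF800) >= 0xE800.
 */
static int sketch_instr_size(enum sketch_isa isa, uint16_t hw)
{
	if (isa != SKETCH_ISA_T32)
		return 4;
	return (hw & 0xF800) >= 0xE800 ? 4 : 2;
}

/*
 * Address of the instruction 'n' instructions after 'start'. For T32 the
 * sizes must be decoded one by one, so 'code' holds the (already decoded)
 * halfwords of traced memory beginning at 'start'. 'code' is ignored for
 * fixed-size ISAs and may be NULL there.
 */
static uint64_t sketch_instr_addr(enum sketch_isa isa, const uint16_t *code,
				  uint64_t start, unsigned int n)
{
	uint64_t addr = start;

	if (isa != SKETCH_ISA_T32)
		return start + (uint64_t)n * 4;

	assert(code != NULL);
	while (n--)
		addr += (uint64_t)sketch_instr_size(isa, code[(addr - start) / 2]);
	return addr;
}
```

For example, with the T32 halfwords { 0x4770, 0xF000, 0xF800, 0xBF00 } at 0x1000, stepping two instructions crosses one 16-bit and one 32-bit instruction and lands at 0x1006, whereas an A64 range would simply advance by 8 bytes.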
Patch 02 adds thread stack support; after applying it, the option '-F,+callindent' can be used with the perf script tool. Patch 03 adds a branch filter so that, when call indentation or callchain related options are enabled, the perf tool displays only the branch samples for function calls and returns.
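The branch filter described for patch 03 can be pictured as a bitmask test on each branch sample's flags. A minimal sketch, assuming hypothetical `SKETCH_FLAG_*` bits rather than perf's real flag definitions: when call indentation or callchain output is requested, only samples flagged as calls or returns are kept. This is consistent with plain jumps such as main+0xc and main+0x44 no longer appearing in the '-F,+callindent' output after the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical branch flag bits for this sketch (not perf's PERF_IP_FLAG_* values). */
enum {
	SKETCH_FLAG_BRANCH = 1u << 0,
	SKETCH_FLAG_CALL   = 1u << 1,
	SKETCH_FLAG_RETURN = 1u << 2,
};

/* Choose the filter: 0 keeps every branch, otherwise only calls and returns. */
static uint32_t sketch_branch_filter(bool callindent_or_callchain)
{
	return callindent_or_callchain ? (SKETCH_FLAG_CALL | SKETCH_FLAG_RETURN) : 0;
}

/* A branch sample is emitted only if it carries at least one filtered flag. */
static bool sketch_keep_branch(uint32_t flags, uint32_t filter)
{
	return filter == 0 || (flags & filter) != 0;
}
```

Using 0 to mean "no filter" keeps the default behaviour unchanged: without call indentation or callchain options every branch sample is still emitted.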
Patch 04 synthesizes callchains for instruction samples.
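To picture what the thread stack and callchain patches rely on: a per-thread stack is pushed on every call and popped on every return, its depth gives the indentation level used by '-F,+callindent', and a callchain for an instruction sample is the sample address followed by the saved return addresses, innermost first. The C sketch below assumes a fixed-capacity array and hypothetical `sketch_*` names; perf's own thread stack code is more elaborate. A return whose address is not on the stack (for example, one belonging to a call made before tracing started) leaves the stack unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed capacity; a real implementation would grow the array. */
#define SKETCH_MAX_DEPTH 64

/* Hypothetical per-thread call stack of return addresses. */
struct sketch_thread_stack {
	uint64_t ret_addr[SKETCH_MAX_DEPTH];
	size_t depth;		/* also the call indentation level */
};

/* Call branch at 'from_ip': execution resumes just after the call instruction. */
static void sketch_on_call(struct sketch_thread_stack *ts, uint64_t from_ip,
			   int insn_size)
{
	assert(ts != NULL);
	if (ts->depth < SKETCH_MAX_DEPTH)	/* frames beyond capacity are dropped */
		ts->ret_addr[ts->depth++] = from_ip + (uint64_t)insn_size;
}

/*
 * Return branch landing at 'to_ip': unwind to the matching frame. If no
 * frame matches, leave the stack untouched.
 */
static void sketch_on_return(struct sketch_thread_stack *ts, uint64_t to_ip)
{
	for (size_t i = ts->depth; i > 0; i--) {
		if (ts->ret_addr[i - 1] == to_ip) {
			ts->depth = i - 1;
			return;
		}
	}
}

/*
 * Callchain for an instruction sample at 'ip': the sample address first,
 * then saved return addresses from innermost to outermost. Returns the
 * number of entries written to 'chain' (at most 'max').
 */
static size_t sketch_callchain(const struct sketch_thread_stack *ts, uint64_t ip,
			       uint64_t *chain, size_t max)
{
	size_t n = 0;

	if (max == 0)
		return 0;
	chain[n++] = ip;
	for (size_t i = ts->depth; i > 0 && n < max; i--)
		chain[n++] = ts->ret_addr[i - 1];
	return n;
}
```

For example, after 4-byte calls at 0x1000 and 0x2010, a sample at 0x3008 yields the chain 0x3008, 0x2014, 0x1004, which has the same shape as the multi-frame samples in the '--itrace=g' output after the series.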
Patch 05 allows instruction samples to be handled synchronously with the thread stack, which fixes an error in callchain generation.
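The principle behind keeping instruction samples in step with the thread stack can be illustrated with a single ordered pass over decoded events: if calls and returns update the stack in the same order that instruction samples are synthesized, each sample sees exactly the frames that were live when it was taken. This is only an illustration of that principle using hypothetical event types and names; it does not reproduce how patch 05 changes the packet handling in cs-etm.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded trace events, in the order the decoder produced them. */
enum sketch_kind { SKETCH_CALL, SKETCH_RETURN, SKETCH_INSN_SAMPLE };

struct sketch_event {
	enum sketch_kind kind;
	uint64_t addr;	/* return address for CALL/RETURN, sample IP otherwise */
};

#define SKETCH_MAX_DEPTH 64	/* hypothetical fixed capacity */

/*
 * Walk the events once, in order. Calls and returns update the stack
 * immediately, and each instruction sample records the innermost return
 * address (0 if the stack is empty) at that exact point in the stream.
 * 'sample_caller' must have room for one entry per INSN_SAMPLE event.
 */
static void sketch_process(const struct sketch_event *ev, size_t n,
			   uint64_t *sample_caller, size_t *n_samples)
{
	uint64_t stack[SKETCH_MAX_DEPTH];
	size_t depth = 0;

	assert(sample_caller != NULL && n_samples != NULL);
	*n_samples = 0;
	for (size_t i = 0; i < n; i++) {
		switch (ev[i].kind) {
		case SKETCH_CALL:
			if (depth < SKETCH_MAX_DEPTH)
				stack[depth++] = ev[i].addr;
			break;
		case SKETCH_RETURN:
			if (depth > 0)
				depth--;
			break;
		case SKETCH_INSN_SAMPLE:
			sample_caller[(*n_samples)++] = depth ? stack[depth - 1] : 0;
			break;
		}
	}
}
```

If the samples were instead generated one step ahead of (or behind) the stack updates, a sample taken inside a callee could be attributed to its caller's frames, which is the kind of callchain error the patch description refers to.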
This patch set has been tested on Juno-r2, applied on the perf/core branch at commit 85fc95d75970 ("perf maps: Add missing unlock to maps__insert() error case"). It depends on the instruction sample fix patch set [2].
Test for option '-F,+callindent':
Before:
# perf script -F,+callindent
main 840 1 branches: main ffffa2319d20 __libc_start_main+0xe0 (/usr/lib/aarch64-linux-gnu/libc-2.28.so)
main 840 1 branches: aaaab94cb7d0 main+0xc (/root/coresight_test/main)
main 840 1 branches: aaaab94cb808 main+0x44 (/root/coresight_test/main)
main 840 1 branches: lib_loop_test@plt aaaab94cb7dc main+0x18 (/root/coresight_test/main)
main 840 1 branches: lib_loop_test@plt aaaab94cb67c lib_loop_test@plt+0xc (/root/coresight_test/main)
main 840 1 branches: _init aaaab94cb650 _init+0x30 (/root/coresight_test/main)
main 840 1 branches: _dl_fixup ffffa24a5b44 _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: _dl_lookup_symbol_x ffffa24a0070 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
[...]
After:
# perf script -F,+callindent
main 840 1 branches: main ffffa2319d20 __libc_start_main+0xe0 (/usr/lib/aarch64-linux-gnu/libc-2.28.so)
main 840 1 branches: lib_loop_test@plt aaaab94cb7dc main+0x18 (/root/coresight_test/main)
main 840 1 branches: _dl_fixup ffffa24a5b44 _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: _dl_lookup_symbol_x ffffa24a0070 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: do_lookup_x ffffa249c4a4 _dl_lookup_symbol_x+0x104 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: check_match ffffa249bbf8 do_lookup_x+0x238 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: strcmp ffffa249b890 check_match+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: printf@plt aaaab94cb7f0 main+0x2c (/root/coresight_test/main)
main 840 1 branches: _dl_fixup ffffa24a5b44 _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: _dl_lookup_symbol_x ffffa24a0070 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: do_lookup_x ffffa249c4a4 _dl_lookup_symbol_x+0x104 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: _dl_name_match_p ffffa249baf8 do_lookup_x+0x138 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: strcmp ffffa24a17e8 _dl_name_match_p+0x18 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: strcmp ffffa24a180c _dl_name_match_p+0x3c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: _dl_name_match_p ffffa249baf8 do_lookup_x+0x138 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: strcmp ffffa24a17e8 _dl_name_match_p+0x18 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: strcmp ffffa24a180c _dl_name_match_p+0x3c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: check_match ffffa249bbf8 do_lookup_x+0x238 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: strcmp ffffa249b890 check_match+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 840 1 branches: strcmp ffffa249b968 check_match+0x148 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
[...]
Test for option '--itrace=g':
Before:
# perf script --itrace=g16l64i100
main 840 100 instructions: ffff8000102642c0 event_sched_in.isra.119+0x140 ([kernel.kallsyms])
main 840 100 instructions: ffff800010264794 flexible_sched_in+0xe4 ([kernel.kallsyms])
main 840 100 instructions: ffff800010263024 perf_pmu_disable+0x4 ([kernel.kallsyms])
main 840 100 instructions: ffff80001026b0e0 perf_swevent_add+0xb8 ([kernel.kallsyms])
main 840 100 instructions: ffff80001025b504 calc_timer_values+0x34 ([kernel.kallsyms])
main 840 100 instructions: ffff80001019bd24 clocks_calc_mult_shift+0x3c ([kernel.kallsyms])
main 840 100 instructions: ffff80001026556c perf_event_update_userpage+0xec ([kernel.kallsyms])
main 840 100 instructions: ffff80001025c5e4 visit_groups_merge+0x194 ([kernel.kallsyms])
[...]
After:
# perf script --itrace=g16l64i100
main 840 100 instructions:
	ffff800010264794 flexible_sched_in+0xe4 ([kernel.kallsyms])
	ffff80001025c57c visit_groups_merge+0x12c ([kernel.kallsyms])
main 840 100 instructions:
	ffff800010263024 perf_pmu_disable+0x4 ([kernel.kallsyms])
	ffff8000102641f0 event_sched_in.isra.119+0x70 ([kernel.kallsyms])
	ffff8000102643d8 group_sched_in+0x60 ([kernel.kallsyms])
	ffff8000102647b0 flexible_sched_in+0x100 ([kernel.kallsyms])
	ffff80001025c57c visit_groups_merge+0x12c ([kernel.kallsyms])
main 840 100 instructions:
	ffff80001026b0e0 perf_swevent_add+0xb8 ([kernel.kallsyms])
	ffff80001026423c event_sched_in.isra.119+0xbc ([kernel.kallsyms])
	ffff8000102643d8 group_sched_in+0x60 ([kernel.kallsyms])
	ffff8000102647b0 flexible_sched_in+0x100 ([kernel.kallsyms])
	ffff80001025c57c visit_groups_merge+0x12c ([kernel.kallsyms])
[...]
Changes from v3:
* Split out a separate patch set for the instruction sample fixes.
* Rebased on the latest perf/core branch.
Changes from v2:
* Added patch 01 to fix the comparison of an unsigned variable against zero (Suzuki).
* Refined commit logs.
Changes from v1:
* Added comments for task thread handling (Mathieu).
* Split patch 02 into two patches: one for thread stack support and another for callchain support (Mathieu).
* Added a new patch to support the branch filter.
[1] https://lkml.org/lkml/2019/10/5/51
[2] https://lkml.org/lkml/2020/2/2/228
Leo Yan (5):
  perf cs-etm: Refactor instruction size handling
  perf cs-etm: Support thread stack
  perf cs-etm: Support branch filter
  perf cs-etm: Support callchain for instruction sample
  perf cs-etm: Synchronize instruction sample with the thread stack
 tools/perf/util/cs-etm.c | 145 ++++++++++++++++++++++++++++++++-------
 1 file changed, 121 insertions(+), 24 deletions(-)