New subject: [PATCH v4 1/5] perf cs-etm: Refactor instruction size handling

3 Feb 2020


      This patch series adds support for thread stack and callchain; this is a
sequential version from v3 [1] but reorgnized the patches, some changes
have been refactored into the instruction sample fix patch set [2], and
this patch set is only to focus on thread stack and callchain support.
Patch 01 is to refactor the instruction size calculation; this patch is
used by patch 02.
Patch 02 is to add thread stack support, after applying this patch the
option '-F,+callindent' can be used by perf script tool; patch 03 is to
add branch filter thus the Perf tool can display branch samples only
for function calls and returns after enable the call indentation or call
chain related options.
Patch 04 is to synthesize call chain for the instruction samples.
Patch 05 allows the instruction sample can be handled synchronously with
the thread stack, thus it fixes an error for the callchain generation.
This patch set has been tested on Juno-r2 after applied on perf/core
branch with latest commit 85fc95d75970 ("perf maps: Add missing unlock
to maps__insert() error case"), and this patch set is dependent on the
instruction sample fix patch set [2].
Test for option '-F,+callindent':
Before:
# perf script -F,+callindent
            main   840          1          branches: main                                 ffffa2319d20 __libc_start_main+0xe0 (/usr/lib/aarch64-linux-gnu/libc-2.28.so)
            main   840          1          branches:                                      aaaab94cb7d0 main+0xc (/root/coresight_test/main)
            main   840          1          branches:                                      aaaab94cb808 main+0x44 (/root/coresight_test/main)
            main   840          1          branches: lib_loop_test@plt                    aaaab94cb7dc main+0x18 (/root/coresight_test/main)
            main   840          1          branches: lib_loop_test@plt                    aaaab94cb67c lib_loop_test@plt+0xc (/root/coresight_test/main)
            main   840          1          branches: _init                                aaaab94cb650 _init+0x30 (/root/coresight_test/main)
            main   840          1          branches: _dl_fixup                            ffffa24a5b44 _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches: _dl_lookup_symbol_x                  ffffa24a0070 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
[...]
After:
# perf script -F,+callindent main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840 main   840

1          branches:             main                                                     ffffa2319d20 __libc_start_main+0xe0 (/usr/lib/aarch64-linux-gnu/libc-2.28.so) 1          branches:                 lib_loop_test@plt                                    aaaab94cb7dc main+0x18 (/root/coresight_test/main) 1          branches:                     _dl_fixup                                        ffffa24a5b44 _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                         _dl_lookup_symbol_x                          ffffa24a0070 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                             do_lookup_x                              ffffa249c4a4 _dl_lookup_symbol_x+0x104 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                                 check_match                          ffffa249bbf8 do_lookup_x+0x238 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                                     strcmp                           ffffa249b890 check_match+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                 printf@plt                                           aaaab94cb7f0 main+0x2c (/root/coresight_test/main) 1          branches:                     _dl_fixup                                        ffffa24a5b44 _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                         _dl_lookup_symbol_x                          ffffa24a0070 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                             do_lookup_x                              ffffa249c4a4 _dl_lookup_symbol_x+0x104 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                                 _dl_name_match_p                     ffffa249baf8 do_lookup_x+0x138 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                                     strcmp                           ffffa24a17e8 _dl_name_match_p+0x18 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                                     strcmp                           ffffa24a180c _dl_name_match_p+0x3c (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                                 _dl_name_match_p                     ffffa249baf8 do_lookup_x+0x138 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                                     strcmp                           ffffa24a17e8 _dl_name_match_p+0x18 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                                     strcmp                           ffffa24a180c _dl_name_match_p+0x3c (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                                 check_match                          ffffa249bbf8 do_lookup_x+0x238 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                                     strcmp                           ffffa249b890 check_match+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) 1          branches:                                     strcmp                           ffffa249b968 check_match+0x148 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
[...]
Test for option '--itrace=g':
Before:
# perf script --itrace=g16l64i100
            main   840        100      instructions:  ffff8000102642c0 event_sched_in.isra.119+0x140 ([kernel.kallsyms])
            main   840        100      instructions:  ffff800010264794 flexible_sched_in+0xe4 ([kernel.kallsyms])
            main   840        100      instructions:  ffff800010263024 perf_pmu_disable+0x4 ([kernel.kallsyms])
            main   840        100      instructions:  ffff80001026b0e0 perf_swevent_add+0xb8 ([kernel.kallsyms])
            main   840        100      instructions:  ffff80001025b504 calc_timer_values+0x34 ([kernel.kallsyms])
            main   840        100      instructions:  ffff80001019bd24 clocks_calc_mult_shift+0x3c ([kernel.kallsyms])
            main   840        100      instructions:  ffff80001026556c perf_event_update_userpage+0xec ([kernel.kallsyms])
            main   840        100      instructions:  ffff80001025c5e4 visit_groups_merge+0x194 ([kernel.kallsyms])
[...]
After:
# perf script --itrace=g16l64i100
main   840        100      instructions: 
    ffff800010264794 flexible_sched_in+0xe4 ([kernel.kallsyms])
    ffff80001025c57c visit_groups_merge+0x12c ([kernel.kallsyms])
main   840        100      instructions: 
    ffff800010263024 perf_pmu_disable+0x4 ([kernel.kallsyms])
    ffff8000102641f0 event_sched_in.isra.119+0x70 ([kernel.kallsyms])
    ffff8000102643d8 group_sched_in+0x60 ([kernel.kallsyms])
    ffff8000102647b0 flexible_sched_in+0x100 ([kernel.kallsyms])
    ffff80001025c57c visit_groups_merge+0x12c ([kernel.kallsyms])
main   840        100      instructions: 
    ffff80001026b0e0 perf_swevent_add+0xb8 ([kernel.kallsyms])
    ffff80001026423c event_sched_in.isra.119+0xbc ([kernel.kallsyms])
    ffff8000102643d8 group_sched_in+0x60 ([kernel.kallsyms])
    ffff8000102647b0 flexible_sched_in+0x100 ([kernel.kallsyms])
    ffff80001025c57c visit_groups_merge+0x12c ([kernel.kallsyms])
[...]
Changes from v3:
* Splitted out separate patch set for instruction samples fixing.
* Rebased on latest perf/core branch.
Changes from v2:
* Added patch 01 to fix the unsigned variable comparison to zero
  (Suzuki).
* Refined commit logs.
Changes from v1:
* Added comments for task thread handling (Mathieu).
* Split patch 02 into two patches, one is for support thread stack and
  another is for callchain support (Mathieu).
* Added a new patch to support branch filter.
[1] https://lkml.org/lkml/2019/10/5/51
[2] https://lkml.org/lkml/2020/2/2/228
Leo Yan (5):
  perf cs-etm: Refactor instruction size handling
  perf cs-etm: Support thread stack
  perf cs-etm: Support branch filter
  perf cs-etm: Support callchain for instruction sample
  perf cs-etm: Synchronize instruction sample with the thread stack
tools/perf/util/cs-etm.c | 145 ++++++++++++++++++++++++++++++++-------
 1 file changed, 121 insertions(+), 24 deletions(-)
-- 
2.17.1

[PATCH v4 0/5] perf cs-etm: Support thread stack and callchain

Signed-off-by: Leo Yan leo.yan@linaro.org

Signed-off-by: Leo Yan leo.yan@linaro.org

Signed-off-by: Leo Yan leo.yan@linaro.org

Signed-off-by: Leo Yan leo.yan@linaro.org