This patch series adds support for thread stack and callchain.
Patch 01 is to refactor the instruction size calculation and it is a preparation for patch 02.
Patch 02 is to add thread stack support, after applying this patch then the option '-F,+callindent' can be used by perf script tool; patch 03 is to add branch filter thus the perf tool can only display function calls and returns after enable the call indentation or call chain related options. Patch 04 is the patch to synthesize call chain for the instruction samples.
Patch 05 allows the instruction sample can be handled synchronously with the thread stack, thus it fixes an error for the callchain generation.
This patch set has been tested on 96boards Hikey620.
Test for option '-F,+callindent':
Before:
# perf script -F,+callindent main 2808 1 branches: coresight_test1 ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so) main 2808 1 branches: printf@plt aaaaba8d37ec main+0x28 (/root/coresight_test/main) main 2808 1 branches: printf@plt aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main) main 2808 1 branches: _init aaaaba8d3650 _init+0x30 (/root/coresight_test/main) main 2808 1 branches: _dl_fixup ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: _dl_lookup_symbol_x ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so) [...]
After:
# perf script -F,+callindent main 2808 1 branches: coresight_test1@plt aaaaba8d37d8 main+0x14 (/root/coresight_test/main) main 2808 1 branches: _dl_fixup ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s main 2808 1 branches: _dl_lookup_symbol_x ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: do_lookup_x ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28. main 2808 1 branches: check_match ffff86369bf0 do_lookup_x+0x238 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: strcmp ffff86369888 check_match+0x70 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: printf@plt aaaaba8d37ec main+0x28 (/root/coresight_test/main) main 2808 1 branches: _dl_fixup ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s main 2808 1 branches: _dl_lookup_symbol_x ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: do_lookup_x ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28. main 2808 1 branches: _dl_name_match_p ffff86369af0 do_lookup_x+0x138 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: strcmp ffff8636f7f0 _dl_name_match_p+0x18 (/lib/aarch64-linux-gnu/ld-2.28.so) [...]
Test for option '--itrace=g':
Before:
# perf script --itrace=g16l64i100 main 1579 100 instructions: ffff0000102137f0 group_sched_in+0xb0 ([kernel.kallsyms]) main 1579 100 instructions: ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms]) main 1579 100 instructions: ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms]) main 1579 100 instructions: ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms]) main 1579 100 instructions: ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms]) [...]
After:
# perf script --itrace=g16l64i100
main 1579 100 instructions: ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms]) ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
main 1579 100 instructions: ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms]) ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms]) ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms]) ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
main 1579 100 instructions: ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms]) ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms]) ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms]) ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms]) ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms]) [...]
Changes from v1: * Added comments for task thread handling (Mathieu). * Split patch 02 into two patches, one is for support thread stack and another is for callchain support (Mathieu). * Added a new patch to support branch filter.
Leo Yan (5): perf cs-etm: Refactor instruction size handling perf cs-etm: Support thread stack perf cs-etm: Support branch filter perf cs-etm: Support callchain for instruction sample perf cs-etm: Correct callchain for instruction sample
tools/perf/util/cs-etm.c | 141 ++++++++++++++++++++++++++++++++------- 1 file changed, 118 insertions(+), 23 deletions(-)
In cs-etm.c there have several functions need to know instruction size based on address, e.g. cs_etm__instr_addr() and cs_etm__copy_insn() these two functions both calculate the instruction size separately. Furthermore, if we consider to add new features later which also might require to calculate instruction size.
For this reason, this patch refactors the code to introduce a new function cs_etm__instr_size(), it will be a central place to calculate the instruction size based on ISA type and instruction address.
For a neat implementation, cs_etm__instr_addr() will always execute the loop without checking ISA type, this allows cs_etm__instr_size() and cs_etm__instr_addr() have no any duplicate code with each other and both functions can be changed independently later without breaking anything. As a side effect, cs_etm__instr_addr() will do a few more iterations for A32/A64 instructions, this would be fine if consider perf tool runs in the user space.
Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/perf/util/cs-etm.c | 48 +++++++++++++++++++++++----------------- 1 file changed, 28 insertions(+), 20 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index f87b9c1c9f9a..1de3f9361193 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -917,6 +917,26 @@ static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq, return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2; }
+static inline int cs_etm__instr_size(struct cs_etm_queue *etmq, + u8 trace_chan_id, + enum cs_etm_isa isa, + u64 addr) +{ + int insn_len; + + /* + * T32 instruction size might be 32-bit or 16-bit, decide by calling + * cs_etm__t32_instr_size(). + */ + if (isa == CS_ETM_ISA_T32) + insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id, addr); + /* Otherwise, A64 and A32 instruction size are always 32-bit. */ + else + insn_len = 4; + + return insn_len; +} + static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet) { /* Returns 0 for the CS_ETM_DISCONTINUITY packet */ @@ -941,19 +961,15 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq, const struct cs_etm_packet *packet, u64 offset) { - if (packet->isa == CS_ETM_ISA_T32) { - u64 addr = packet->start_addr; + u64 addr = packet->start_addr;
- while (offset > 0) { - addr += cs_etm__t32_instr_size(etmq, - trace_chan_id, addr); - offset--; - } - return addr; + while (offset > 0) { + addr += cs_etm__instr_size(etmq, trace_chan_id, + packet->isa, addr); + offset--; }
- /* Assume a 4 byte instruction size (A32/A64) */ - return packet->start_addr + offset * 4; + return addr; }
static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq, @@ -1093,16 +1109,8 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq, return; }
- /* - * T32 instruction size might be 32-bit or 16-bit, decide by calling - * cs_etm__t32_instr_size(). - */ - if (packet->isa == CS_ETM_ISA_T32) - sample->insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id, - sample->ip); - /* Otherwise, A64 and A32 instruction size are always 32-bit. */ - else - sample->insn_len = 4; + sample->insn_len = cs_etm__instr_size(etmq, trace_chan_id, + packet->isa, sample->ip);
cs_etm__mem_access(etmq, trace_chan_id, sample->ip, sample->insn_len, (void *)sample->insn);
Hi Leo,
On 23/09/2019 17:07, Leo Yan wrote:
In cs-etm.c there have several functions need to know instruction size based on address, e.g. cs_etm__instr_addr() and cs_etm__copy_insn() these two functions both calculate the instruction size separately. Furthermore, if we consider to add new features later which also might require to calculate instruction size.
For this reason, this patch refactors the code to introduce a new function cs_etm__instr_size(), it will be a central place to calculate the instruction size based on ISA type and instruction address.
For a neat implementation, cs_etm__instr_addr() will always execute the loop without checking ISA type, this allows cs_etm__instr_size() and cs_etm__instr_addr() have no any duplicate code with each other and both functions can be changed independently later without breaking anything. As a side effect, cs_etm__instr_addr() will do a few more iterations for A32/A64 instructions, this would be fine if consider perf tool runs in the user space.
Signed-off-by: Leo Yan leo.yan@linaro.org
Your changes look fine to me. However, please see my comment below.
tools/perf/util/cs-etm.c | 48 +++++++++++++++++++++++----------------- 1 file changed, 28 insertions(+), 20 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index f87b9c1c9f9a..1de3f9361193 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -917,6 +917,26 @@ static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq, return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2; } +static inline int cs_etm__instr_size(struct cs_etm_queue *etmq,
u8 trace_chan_id,
enum cs_etm_isa isa,
u64 addr)
+{
- int insn_len;
- /*
* T32 instruction size might be 32-bit or 16-bit, decide by calling
* cs_etm__t32_instr_size().
*/
- if (isa == CS_ETM_ISA_T32)
insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id, addr);
- /* Otherwise, A64 and A32 instruction size are always 32-bit. */
- else
insn_len = 4;
- return insn_len;
+}
- static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet) { /* Returns 0 for the CS_ETM_DISCONTINUITY packet */
@@ -941,19 +961,15 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq, const struct cs_etm_packet *packet, u64 offset) {
- if (packet->isa == CS_ETM_ISA_T32) {
u64 addr = packet->start_addr;
- u64 addr = packet->start_addr;
while (offset > 0) {
addr += cs_etm__t32_instr_size(etmq,
trace_chan_id, addr);
offset--;
}
return addr;
- while (offset > 0) {
Given that offset is u64, the check above is not appropriate. You could either change it to : while (offset) // if you are sure (s64)offset always is a postive integer and we always reduce it by 1.
Otherwise you may switch the offset to a signed type. I understand that this is not introduced by your changes. But you may fix that up in a separate patch.
Kind regards Suzuki
Hi Suzuki,
On Mon, Sep 23, 2019 at 05:51:04PM +0100, Suzuki K Poulose wrote:
Hi Leo,
On 23/09/2019 17:07, Leo Yan wrote:
In cs-etm.c there have several functions need to know instruction size based on address, e.g. cs_etm__instr_addr() and cs_etm__copy_insn() these two functions both calculate the instruction size separately. Furthermore, if we consider to add new features later which also might require to calculate instruction size.
For this reason, this patch refactors the code to introduce a new function cs_etm__instr_size(), it will be a central place to calculate the instruction size based on ISA type and instruction address.
For a neat implementation, cs_etm__instr_addr() will always execute the loop without checking ISA type, this allows cs_etm__instr_size() and cs_etm__instr_addr() have no any duplicate code with each other and both functions can be changed independently later without breaking anything. As a side effect, cs_etm__instr_addr() will do a few more iterations for A32/A64 instructions, this would be fine if consider perf tool runs in the user space.
Signed-off-by: Leo Yan leo.yan@linaro.org
Your changes look fine to me. However, please see my comment below.
tools/perf/util/cs-etm.c | 48 +++++++++++++++++++++++----------------- 1 file changed, 28 insertions(+), 20 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index f87b9c1c9f9a..1de3f9361193 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -917,6 +917,26 @@ static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq, return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2; } +static inline int cs_etm__instr_size(struct cs_etm_queue *etmq,
u8 trace_chan_id,
enum cs_etm_isa isa,
u64 addr)
+{
- int insn_len;
- /*
* T32 instruction size might be 32-bit or 16-bit, decide by calling
* cs_etm__t32_instr_size().
*/
- if (isa == CS_ETM_ISA_T32)
insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id, addr);
- /* Otherwise, A64 and A32 instruction size are always 32-bit. */
- else
insn_len = 4;
- return insn_len;
+}
- static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet) { /* Returns 0 for the CS_ETM_DISCONTINUITY packet */
@@ -941,19 +961,15 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq, const struct cs_etm_packet *packet, u64 offset) {
- if (packet->isa == CS_ETM_ISA_T32) {
u64 addr = packet->start_addr;
- u64 addr = packet->start_addr;
while (offset > 0) {
addr += cs_etm__t32_instr_size(etmq,
trace_chan_id, addr);
offset--;
}
return addr;
- while (offset > 0) {
Given that offset is u64, the check above is not appropriate. You could either change it to : while (offset) // if you are sure (s64)offset always is a postive integer and we always reduce it by 1.
Otherwise you may switch the offset to a signed type. I understand that this is not introduced by your changes. But you may fix that up in a separate patch.
Thanks a lot for the review. Seems to me the reliable fix is to change to a signed type. Will add this fix in next spin.
Thanks, Leo Yan
Arm CoreSight doesn't support thread stack, thus the decoding cannot display the symbol with indented spaces to reflect the stack depth.
This patch adds support thread stack, this allows 'perf script' to support option '-F,+callindent'.
Before:
# perf script -F,+callindent main 2808 1 branches: coresight_test1 ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so) main 2808 1 branches: printf@plt aaaaba8d37ec main+0x28 (/root/coresight_test/main) main 2808 1 branches: printf@plt aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main) main 2808 1 branches: _init aaaaba8d3650 _init+0x30 (/root/coresight_test/main) main 2808 1 branches: _dl_fixup ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: _dl_lookup_symbol_x ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so) [...]
After:
# perf script -F,+callindent main 2808 1 branches: coresight_test1 ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so) main 2808 1 branches: printf@plt aaaaba8d37ec main+0x28 (/root/coresight_test/main) main 2808 1 branches: printf@plt aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main) main 2808 1 branches: _init aaaaba8d3650 _init+0x30 (/root/coresight_test/main) main 2808 1 branches: _dl_fixup ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s main 2808 1 branches: _dl_lookup_symbol_x ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so) [...]
Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/perf/util/cs-etm.c | 44 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 1de3f9361193..6bdc9cd8293c 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1116,6 +1116,45 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq, sample->insn_len, (void *)sample->insn); }
+static void cs_etm__add_stack_event(struct cs_etm_queue *etmq, + struct cs_etm_traceid_queue *tidq) +{ + struct cs_etm_auxtrace *etm = etmq->etm; + u8 trace_chan_id = tidq->trace_chan_id; + int insn_len; + u64 from_ip, to_ip; + + if (etm->synth_opts.thread_stack) { + from_ip = cs_etm__last_executed_instr(tidq->prev_packet); + to_ip = cs_etm__first_executed_instr(tidq->packet); + + insn_len = cs_etm__instr_size(etmq, trace_chan_id, + tidq->prev_packet->isa, from_ip); + + /* + * Create thread stacks by keeping track of calls and returns; + * any call pushes thread stack, return pops the stack, and + * flush stack when the trace is discontinuous. + */ + thread_stack__event(tidq->thread, tidq->prev_packet->cpu, + tidq->prev_packet->flags, + from_ip, to_ip, insn_len, + etmq->buffer->buffer_nr); + } else { + /* + * The thread stack can be output via thread_stack__process(); + * thus the detailed information about paired calls and returns + * will be facilitated by Python script for the db-export. + * + * Need to set trace buffer number and flush thread stack if the + * trace buffer number has been alternate. + */ + thread_stack__set_trace_nr(tidq->thread, + tidq->prev_packet->cpu, + etmq->buffer->buffer_nr); + } +} + static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq, struct cs_etm_traceid_queue *tidq, u64 addr, u64 period) @@ -1392,6 +1431,9 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, tidq->period_instructions = instrs_over; }
+ if (tidq->prev_packet->last_instr_taken_branch) + cs_etm__add_stack_event(etmq, tidq); + if (etm->sample_branches) { bool generate_sample = false;
@@ -2592,6 +2634,8 @@ int cs_etm__process_auxtrace_info(union perf_event *event, itrace_synth_opts__set_default(&etm->synth_opts, session->itrace_synth_opts->default_no_sample); etm->synth_opts.callchain = false; + etm->synth_opts.thread_stack = + session->itrace_synth_opts->thread_stack; }
err = cs_etm__synth_events(etm, session);
If user specifies options -F,+callindent or call chain related options, it means the user only cares about functions calls and returns; thus in this case it's pointless to generate samples for other types of branches.
To output only pairs of calls and returns, this patch introduces branch filter and the filter is set according to synthetic options. Finally, perf can output only for calls and returns and without redundant branches.
Before:
# perf script -F,+callindent main 2808 1 branches: coresight_test1@plt aaaaba8d37d8 main+0x14 (/root/coresight_test/main) main 2808 1 branches: coresight_test1@plt aaaaba8d367c coresight_test1@plt+0xc (/root/coresight_test/main) main 2808 1 branches: _init aaaaba8d3650 _init+0x30 (/root/coresight_test/main) main 2808 1 branches: _dl_fixup ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s main 2808 1 branches: _dl_lookup_symbol_x ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s main 2808 1 branches: ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s main 2808 1 branches: ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s main 2808 1 branches: ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s main 2808 1 branches: ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s [...]
After:
# perf script -F,+callindent main 2808 1 branches: coresight_test1@plt aaaaba8d37d8 main+0x14 (/root/coresight_test/main) main 2808 1 branches: _dl_fixup ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s main 2808 1 branches: _dl_lookup_symbol_x ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: do_lookup_x ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28. main 2808 1 branches: check_match ffff86369bf0 do_lookup_x+0x238 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: strcmp ffff86369888 check_match+0x70 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: printf@plt aaaaba8d37ec main+0x28 (/root/coresight_test/main) main 2808 1 branches: _dl_fixup ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s main 2808 1 branches: _dl_lookup_symbol_x ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: do_lookup_x ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28. main 2808 1 branches: _dl_name_match_p ffff86369af0 do_lookup_x+0x138 (/lib/aarch64-linux-gnu/ld-2.28.so) main 2808 1 branches: strcmp ffff8636f7f0 _dl_name_match_p+0x18 (/lib/aarch64-linux-gnu/ld-2.28.so) [...]
Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/perf/util/cs-etm.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 6bdc9cd8293c..018c7e682ded 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -55,6 +55,7 @@ struct cs_etm_auxtrace {
int num_cpu; u32 auxtrace_type; + u32 branches_filter; u64 branches_sample_type; u64 branches_id; u64 instructions_sample_type; @@ -1222,6 +1223,10 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, } dummy_bs; u64 ip;
+ if (etm->branches_filter && + !(etm->branches_filter & tidq->prev_packet->flags)) + return 0; + ip = cs_etm__last_executed_instr(tidq->prev_packet);
event->sample.header.type = PERF_RECORD_SAMPLE; @@ -2638,6 +2643,13 @@ int cs_etm__process_auxtrace_info(union perf_event *event, session->itrace_synth_opts->thread_stack; }
+ if (etm->synth_opts.calls) + etm->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC | + PERF_IP_FLAG_TRACE_END; + if (etm->synth_opts.returns) + etm->branches_filter |= PERF_IP_FLAG_RETURN | + PERF_IP_FLAG_TRACE_BEGIN; + err = cs_etm__synth_events(etm, session); if (err) goto err_delete_thread;
CoreSight has supported the thread stack; so based on the thread stack we can synthesize call chain for the instruction sample; the call chain can be used by itrace option '--itrace=g'.
Before:
# perf script --itrace=g16l64i100 main 1579 100 instructions: ffff0000102137f0 group_sched_in+0xb0 ([kernel.kallsyms]) main 1579 100 instructions: ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms]) main 1579 100 instructions: ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms]) main 1579 100 instructions: ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms]) main 1579 100 instructions: ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms]) [...]
After:
# perf script --itrace=g16l64i100
main 1579 100 instructions: ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms]) ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
main 1579 100 instructions: ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms]) ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms]) ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms]) ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
main 1579 100 instructions: ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms]) ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms]) ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms]) ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms]) ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms]) [...]
Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/perf/util/cs-etm.c | 35 +++++++++++++++++++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 018c7e682ded..bd09254a7208 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -17,6 +17,7 @@ #include <stdlib.h>
#include "auxtrace.h" +#include "callchain.h" #include "color.h" #include "cs-etm.h" #include "cs-etm-decoder/cs-etm-decoder.h" @@ -73,6 +74,7 @@ struct cs_etm_traceid_queue { size_t last_branch_pos; union perf_event *event_buf; struct thread *thread; + struct ip_callchain *chain; struct branch_stack *last_branch; struct branch_stack *last_branch_rb; struct cs_etm_packet *prev_packet; @@ -250,6 +252,16 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq, if (!tidq->prev_packet) goto out_free;
+ if (etm->synth_opts.callchain) { + size_t sz = sizeof(struct ip_callchain); + + /* Add 1 to callchain_sz for callchain context */ + sz += (etm->synth_opts.callchain_sz + 1) * sizeof(u64); + tidq->chain = zalloc(sz); + if (!tidq->chain) + goto out_free; + } + if (etm->synth_opts.last_branch) { size_t sz = sizeof(struct branch_stack);
@@ -274,6 +286,7 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq, zfree(&tidq->last_branch); zfree(&tidq->prev_packet); zfree(&tidq->packet); + zfree(&tidq->chain); out: return rc; } @@ -545,6 +558,7 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq) zfree(&tidq->last_branch_rb); zfree(&tidq->prev_packet); zfree(&tidq->packet); + zfree(&tidq->chain); zfree(&tidq);
/* @@ -1125,7 +1139,7 @@ static void cs_etm__add_stack_event(struct cs_etm_queue *etmq, int insn_len; u64 from_ip, to_ip;
- if (etm->synth_opts.thread_stack) { + if (etm->synth_opts.callchain || etm->synth_opts.thread_stack) { from_ip = cs_etm__last_executed_instr(tidq->prev_packet); to_ip = cs_etm__first_executed_instr(tidq->packet);
@@ -1181,6 +1195,14 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
+ if (etm->synth_opts.callchain) { + thread_stack__sample(tidq->thread, tidq->packet->cpu, + tidq->chain, + etm->synth_opts.callchain_sz + 1, + sample.ip, etm->kernel_start); + sample.callchain = tidq->chain; + } + if (etm->synth_opts.last_branch) { cs_etm__copy_last_branch_rb(etmq, tidq); sample.branch_stack = tidq->last_branch; @@ -1368,6 +1390,8 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm, attr.sample_type &= ~(u64)PERF_SAMPLE_ADDR; }
+ if (etm->synth_opts.callchain) + attr.sample_type |= PERF_SAMPLE_CALLCHAIN; if (etm->synth_opts.last_branch) attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
@@ -2638,7 +2662,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event, } else { itrace_synth_opts__set_default(&etm->synth_opts, session->itrace_synth_opts->default_no_sample); - etm->synth_opts.callchain = false; etm->synth_opts.thread_stack = session->itrace_synth_opts->thread_stack; } @@ -2650,6 +2673,14 @@ int cs_etm__process_auxtrace_info(union perf_event *event, etm->branches_filter |= PERF_IP_FLAG_RETURN | PERF_IP_FLAG_TRACE_BEGIN;
+ if (etm->synth_opts.callchain && !symbol_conf.use_callchain) { + symbol_conf.use_callchain = true; + if (callchain_register_param(&callchain_param) < 0) { + symbol_conf.use_callchain = false; + etm->synth_opts.callchain = false; + } + } + err = cs_etm__synth_events(etm, session); if (err) goto err_delete_thread;
The synthesized flow use 'tidq->packet' for instruction samples, comparing against the thread stack and the branch samples which are uses the 'tidp->prev_packet', thus the instruction samples result in using an packet ahead than thread stack and branch samples.
This leads to an instruction's callchain error as shows in below example:
main 1579 100 instructions: ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms]) ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms]) ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms]) ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms]) ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms]) ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms]) ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
In the callchain log, for the two continuous lines the up line contains one child function info and the followed line contains the caller function info, and so forth. But the first two lines:
perf_event_update_userpage+0x4c => the sampled instruction perf_event_update_userpage+0x48 => the parent function's calling
The child function and parent function both are the same function perf_event_update_userpage(), but this isn't a recursive function, thus the sequence for perf_event_update_userpage() calling itself shouldn't never happen. This callchain error is caused by the instruction sample using an ahead packet than the thread stack, the thread stack is deferred to process this packet and missed to pop stack if this is a return packet.
To fix this issue, we can simply change to use 'tidq->prev_packet' to generate the instruction samples, this allows the thread stack to push and pop synchronously with instruction sample. Finally, the callchain is displayed as below:
main 1579 100 instructions: ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms]) ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms]) ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms]) ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms]) ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms]) ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/perf/util/cs-etm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index bd09254a7208..3f7edfd15124 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1418,7 +1418,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, struct cs_etm_packet *tmp; int ret; u8 trace_chan_id = tidq->trace_chan_id; - u64 instrs_executed = tidq->packet->instr_count; + u64 instrs_executed = tidq->prev_packet->instr_count;
tidq->period_instructions += instrs_executed;
@@ -1449,7 +1449,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, */ u64 offset = (instrs_executed - instrs_over - 1); u64 addr = cs_etm__instr_addr(etmq, trace_chan_id, - tidq->packet, offset); + tidq->prev_packet, offset);
ret = cs_etm__synth_instruction_sample( etmq, tidq, addr, etm->instructions_sample_period);