When a branch instruction is executed and its target address is not mapped into the virtual address space, the branch triggers a data abort exception. In this case, the CoreSight decoding flow cannot reflect the complete branch flow prior to the exception, which leads to inconsistent user space addresses before and after the exception is handled.
The following example explains the issue in detail:
Packet 0: range packet
  start_addr=0xffffad8018a4 end_addr=0xffffad8018ec
Packet 1: exception packet
  start_addr=0xffffad8018a4 end_addr=0xffffad801910
Packet 2: range packet
  start_addr=0xffff800010081c00 end_addr=0xffff800010081c18
Three packets arrive. Across packet 0 and packet 1, the CPU tries to branch from 0xffffad8018ec-4 (the last instruction of packet 0) to 0xffffad801910. Accessing the address 0xffffad801910 causes a data abort, so the branch is not taken; instead an exception is triggered and execution jumps to 0xffff800010081c00 in packet 2.
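The "-4" comes from the range packet's end_addr pointing just past the last executed instruction. A minimal sketch of that arithmetic, assuming fixed-width 4-byte A64 instructions; `struct sk_range_pkt` and `sk_last_instr_addr()` are hypothetical simplifications, not perf's `struct cs_etm_packet` or its helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a decoded range packet. */
struct sk_range_pkt {
	uint64_t start_addr;
	uint64_t end_addr;	/* address just past the last instruction */
};

/*
 * Address of the last executed instruction in a range packet, i.e.
 * end_addr - 4. Assumes A64, where every instruction is 4 bytes wide.
 */
static uint64_t sk_last_instr_addr(const struct sk_range_pkt *p)
{
	/* A range packet covers at least one instruction. */
	assert(p->end_addr >= p->start_addr + 4);
	return p->end_addr - 4;
}
```

For packet 0 (end_addr=0xffffad8018ec) this yields 0xffffad8018e8, which matches the `dl_main+0x820` source address in the perf script output.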
When handling this sequence, the decoder misses a range packet for the branch between 0xffffad8018ec-4 and 0xffffad801910, so the perf tool cannot generate a branch sample for it. This can cause confusion about the addresses before and after exception handling: the exception return address is 0xffffad801910, which does not follow sequentially from the address 0xffffad8018ec-4 where the exception was taken.
  0xffffad8018ec-4 -> 0xffff800010081c00: exception is taken
  ... ...
  exception return back -> 0xffffad801910
To fix this issue, we first need to decide which conditions distinguish an exception that was triggered by a branch. The following conditions are used:
- The exception is a trap, which is checked by comparing the exception packet's sample flags against the specific flags for a trap;
- The exception packet's end address differs from the previous range packet's end address, which implies that a branch triggered the exception and that the branch target address is carried in the exception packet's end address.
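The two conditions, together with the existing requirement that the previous packet is a range packet, can be sketched as a single predicate. This is an illustrative model only: the `SK_*` flag values are placeholders standing in for `PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_INTERRUPT` used by the patch, and `struct sk_packet` is a hypothetical simplification of perf's packet structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits; the real code uses perf's PERF_IP_FLAG_* values. */
#define SK_FLAG_BRANCH    (1u << 0)
#define SK_FLAG_CALL      (1u << 1)
#define SK_FLAG_INTERRUPT (1u << 6)

enum sk_sample_type { SK_RANGE, SK_EXCEPTION };

/* Hypothetical, simplified packet. */
struct sk_packet {
	enum sk_sample_type sample_type;
	uint32_t flags;
	uint64_t start_addr;
	uint64_t end_addr;
};

/*
 * True when 'pkt' looks like a trap raised by the branch ending 'prev':
 * prev is a range packet, pkt is an exception carrying exactly the trap
 * flags, and the two end addresses differ.
 */
static bool sk_branch_triggered_trap(const struct sk_packet *prev,
				     const struct sk_packet *pkt)
{
	const uint32_t trap = SK_FLAG_BRANCH | SK_FLAG_CALL | SK_FLAG_INTERRUPT;

	return prev->sample_type == SK_RANGE &&
	       pkt->sample_type == SK_EXCEPTION &&
	       pkt->flags == trap &&
	       pkt->end_addr != prev->end_addr;
}
```

Comparing the flags with `==` rather than testing individual bits mirrors the patch, which requires the exception's flags to match the trap combination exactly.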
This patch changes the exception packet into a 'fake' range packet, which allows an extra branch sample to be generated for the branch instruction prior to the exception (between 0xffffad8018ec-4 and 0xffffad801910). The resulting samples are:
  0xffffad8018ec-4 -> 0xffffad801910: branch
  0xffffad801910 -> 0xffff800010081c00: exception is taken
  ... ...
  exception return back -> 0xffffad801910
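The conversion can be modelled as rewriting four fields of the exception packet and then pairing it with the previous range packet. A minimal sketch under stated assumptions: 4-byte A64 instructions, and hypothetical types and helpers (`struct sk_pkt`, `sk_make_fake_range()`, `sk_branch_between()`). It only models the first "branch" sample in the list above and ignores the extra conditions that perf's real sampling path applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sk_type { SK_RANGE, SK_EXCEPTION };

/* Hypothetical, simplified packet; not perf's struct cs_etm_packet. */
struct sk_pkt {
	enum sk_type sample_type;
	uint64_t start_addr;
	uint64_t end_addr;	/* address just past the last instruction */
	uint32_t instr_count;
	bool last_instr_taken_branch;
};

/* Hypothetical branch sample: a single "from -> to" pair. */
struct sk_branch {
	uint64_t from;
	uint64_t to;
};

/*
 * Rewrite a branch-triggered exception packet as a 'fake' range packet
 * starting at the branch target, using the same field updates that the
 * patch applies in cs_etm__exception().
 */
static void sk_make_fake_range(struct sk_pkt *pkt)
{
	assert(pkt->sample_type == SK_EXCEPTION);
	pkt->sample_type = SK_RANGE;
	pkt->start_addr = pkt->end_addr;	/* branch target */
	pkt->instr_count = 0;			/* nothing executed yet */
	pkt->last_instr_taken_branch = true;
}

/*
 * Pair two consecutive range packets into a branch sample, assuming
 * 4-byte A64 instructions: from prev's last instruction to pkt's first.
 */
static struct sk_branch sk_branch_between(const struct sk_pkt *prev,
					  const struct sk_pkt *pkt)
{
	struct sk_branch b;

	assert(prev->sample_type == SK_RANGE && pkt->sample_type == SK_RANGE);
	b.from = prev->end_addr - 4;
	b.to = pkt->start_addr;
	return b;
}
```

Applied to packets 0 and 1 of the example, this yields the pair 0xffffad8018e8 -> 0xffffad801910, matching the new first `jmp` line in the "After" output.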
Note that this 'fake' range packet adds an extra entry to the last branch array and changes the thread stack pushing and popping (if that is supported later). However, since the 'fake' range packet's instruction count is set to zero, it does not change the instruction samples.
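Why a zero instruction count leaves instruction samples untouched can be illustrated with a simplified model in which instruction sampling only consumes the sum of `instr_count` over the packet stream. This is an assumption for illustration (perf's real instruction sampling is period based), and `struct sk_range`, `sk_a64_instr_count()` and `sk_total_instrs()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified range packet. */
struct sk_range {
	uint64_t start_addr;
	uint64_t end_addr;
	uint32_t instr_count;
};

/* Instructions in [start_addr, end_addr), assuming 4-byte A64 encoding. */
static uint32_t sk_a64_instr_count(uint64_t start_addr, uint64_t end_addr)
{
	assert(end_addr >= start_addr);
	return (uint32_t)((end_addr - start_addr) / 4);
}

/* Total instructions seen across a packet stream in this model. */
static uint64_t sk_total_instrs(const struct sk_range *pkts, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += pkts[i].instr_count;
	return total;
}
```

Inserting a zero-count packet between packet 0 (18 instructions) and packet 2 (6 instructions) leaves the total at 24 in this model.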
Before:
# perf script -F,+flags
main 3258 1 branches: int ffffad8018e8 dl_main+0x820 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) => ffff800010081c00 vectors+0x400 ([kernel.kallsyms])
main 3258 1 branches: jmp ffff800010081c20 vectors+0x420 ([kernel.kallsyms]) => ffff800010082bc0 el0_sync+0x0 ([kernel.kallsyms])
main 3258 1 branches: jcc ffff800010082c8c el0_sync+0xcc ([kernel.kallsyms]) => ffff800010082ca0 el0_sync+0xe0 ([kernel.kallsyms])
main 3258 1 branches: jmp ffff800010082ca0 el0_sync+0xe0 ([kernel.kallsyms]) => ffff800010082ccc el0_sync+0x10c ([kernel.kallsyms])
[...]
main 3258 1 branches: jcc ffff800010083574 finish_ret_to_user+0x34 ([kernel.kallsyms]) => ffff800010083580 finish_ret_to_user+0x40 ([kernel.kallsyms])
main 3258 1 branches: jmp ffff800010083580 finish_ret_to_user+0x40 ([kernel.kallsyms]) => ffff800010083598 finish_ret_to_user+0x58 ([kernel.kallsyms])
main 3258 1 branches: jmp ffff800010083598 finish_ret_to_user+0x58 ([kernel.kallsyms]) => ffff8000100835c4 finish_ret_to_user+0x84 ([kernel.kallsyms])
main 3258 1 branches: iret ffff800010083610 finish_ret_to_user+0xd0 ([kernel.kallsyms]) => ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
After:
# perf script -F,+flags
main 3258 1 branches: jmp ffffad8018e8 dl_main+0x820 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) => ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: int ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) => ffff800010081c00 vectors+0x400 ([kernel.kallsyms])
main 3258 1 branches: jmp ffff800010081c20 vectors+0x420 ([kernel.kallsyms]) => ffff800010082bc0 el0_sync+0x0 ([kernel.kallsyms])
main 3258 1 branches: jcc ffff800010082c8c el0_sync+0xcc ([kernel.kallsyms]) => ffff800010082ca0 el0_sync+0xe0 ([kernel.kallsyms])
main 3258 1 branches: jmp ffff800010082ca0 el0_sync+0xe0 ([kernel.kallsyms]) => ffff800010082ccc el0_sync+0x10c ([kernel.kallsyms])
[...]
main 3258 1 branches: jcc ffff800010083574 finish_ret_to_user+0x34 ([kernel.kallsyms]) => ffff800010083580 finish_ret_to_user+0x40 ([kernel.kallsyms])
main 3258 1 branches: jmp ffff800010083580 finish_ret_to_user+0x40 ([kernel.kallsyms]) => ffff800010083598 finish_ret_to_user+0x58 ([kernel.kallsyms])
main 3258 1 branches: jmp ffff800010083598 finish_ret_to_user+0x58 ([kernel.kallsyms]) => ffff8000100835c4 finish_ret_to_user+0x84 ([kernel.kallsyms])
main 3258 1 branches: iret ffff800010083610 finish_ret_to_user+0xd0 ([kernel.kallsyms]) => ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
Suggested-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  1 +
 tools/perf/util/cs-etm.c                      | 66 ++++++++++++++++++-
 2 files changed, 65 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index cd92a99eb89d..f1f66d883391 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -482,6 +482,7 @@ cs_etm_decoder__buffer_exception(struct cs_etm_packet_queue *queue,
 
 	packet = &queue->packet_buffer[queue->tail];
 	packet->exception_number = elem->exception_number;
+	packet->end_addr = elem->en_addr;
 
 	return ret;
 }
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 48932a7a933f..7cf30b5e0e20 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1477,8 +1477,11 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 	return 0;
 }
 
-static int cs_etm__exception(struct cs_etm_traceid_queue *tidq)
+static int cs_etm__exception(struct cs_etm_queue *etmq,
+			     struct cs_etm_traceid_queue *tidq)
 {
+	u32 flags;
+
 	/*
 	 * Usually the exception packet follows a range packet, if it's not the
 	 * case, directly bail out.
@@ -1486,6 +1489,65 @@ static int cs_etm__exception(struct cs_etm_traceid_queue *tidq)
 	if (tidq->prev_packet->sample_type != CS_ETM_RANGE)
 		return 0;
 
+	/*
+	 * If the exception is a trap and its end_addr is not same with its
+	 * previous range packet's end_addr, this implies the exception is
+	 * triggered by a branch and the exception packet's end_addr is the
+	 * branch target address from the previous range packet.
+	 *
+	 * Below is an example with three packets:
+	 *   Packet 0: range packet
+	 *     start_addr=0xffffad8018a4 end_addr=0xffffad8018ec
+	 *   Packet 1: exception packet
+	 *     start_addr=0xffffad8018a4 end_addr=0xffffad801910
+	 *   Packet 2: range packet
+	 *     start_addr=0xffff800010081c00 end_addr=0xffff800010081c18
+	 *
+	 * CPU tries to branch from 0xffffad8018ec-4 (packet 0) to
+	 * 0xffffad801910 (packet 1), accessing the address 0xffffad801910
+	 * causes data abort, so the branch is not taken and an exception is
+	 * triggered and jump to 0xffff800010081c00 (packet 2).
+	 *
+	 * For this case, it misses a range packet for the branch between
+	 * 0xffffad8018ec-4 and 0xffffad801910, so perf tool cannot generate
+	 * branch sample and introduces confusion for exception return parsing:
+	 *
+	 *   0xffffad8018ec-4 -> 0xffff800010081c00: exception is taken
+	 *   ... exception return back ... -> 0xffffad801910
+	 *
+	 * To fix this issue, the exception packet is changed to a 'fake'
+	 * range packet. This can allow to generate a branch sample between
+	 * 0xffffad8018ec-4 and 0xffffad801910. Finally get below samples:
+	 *
+	 *   0xffffad8018ec-4 -> 0xffffad801910: branch
+	 *   0xffffad801910 -> 0xffff800010081c00: exception is taken
+	 *   ... exception return back ... -> 0xffffad801910
+	 */
+
+	/* Use flags to check if the exception is trap */
+	flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		PERF_IP_FLAG_INTERRUPT;
+
+	if (tidq->packet->sample_type == CS_ETM_EXCEPTION &&
+	    tidq->packet->flags == flags &&
+	    tidq->packet->end_addr != tidq->prev_packet->end_addr) {
+		/*
+		 * Change the exception packet to a range packet, so can reflect
+		 * branch from prev_packet::end_addr-4 to packet::start_addr;
+		 *
+		 * This branch is not taken yet, so set its instruction count
+		 * to zero. Set 'last_instr_taken_branch' to true, so allow
+		 * it to generate samples with its sequential range packet.
+		 */
+		tidq->packet->sample_type = CS_ETM_RANGE;
+		tidq->packet->start_addr = tidq->packet->end_addr;
+		tidq->packet->instr_count = 0;
+		tidq->packet->last_instr_taken_branch = true;
+
+		/* Generate sample with the previous range packet */
+		return cs_etm__sample(etmq, tidq);
+	}
+
 	/*
 	 * When the exception packet is inserted, whether the last instruction
 	 * in previous range packet is taken branch or not, we need to force
@@ -2045,7 +2107,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
 			 * make sure the previous instruction
 			 * range packet to be handled properly.
 			 */
-			cs_etm__exception(tidq);
+			cs_etm__exception(etmq, tidq);
 			break;
 		case CS_ETM_DISCONTINUITY:
 			/*