On 13 November 2017 at 08:11, Robert Walker <robert.walker@arm.com> wrote:
This reverts commit d3fa0f70b7e80dbd9317d25b2dc028a093e0f5b7.
A previous commit changed the order in which branches are recorded, so that they are stored in chronological order. As a result, the branch stack generated by perf inject contains the oldest branch first and the newest branch last.
However, this does not match the behaviour of LBR (perf record -b) or the results of perf inject with Intel-PT, which show the newest branch at the top of the stack and the oldest at the bottom.
For example, with the following source:
#include <stdio.h>

int f3(int a)
{
	return a ^ 0x55;
}

int f2(int a)
{
	int b;
	b = f3(a);
	return b;
}

int f1(int a)
{
	int b;
	b = f2(a);
	return b;
}

int main(int argc, char *argv[])
{
	int a = 0xDEADBEEF;
	int i;

	for (i = 0; i < 1000000; ++i)
		a = f1(a);
	printf("%x\n", a);
	return 0;
}
LBR produces the following branch stacks (annotated with symbol addresses):
12714186730559108 [main+12714186730158b22] 0x10a0 [0x1b0]: PERF_RECORD_SAMPLE(IP, 0x4002): 25772/25772: 0x4005c4 [f2+18] period: 63741 addr: 0
... branch stack: nr:16
.....  0: 00000000004005ab [f3+e]    -> 00000000004005c1 [f2+15]   0 cycles P 0
.....  1: 00000000004005bc [f2+10]   -> 000000000040059d [f3]      0 cycles P 0
.....  2: 00000000004005d9 [f1+10]   -> 00000000004005ac [f2]      0 cycles P 0
.....  3: 000000000040060a [main+24] -> 00000000004005c9 [f1]      0 cycles P 0
.....  4: 000000000040061d [main+37] -> 0000000000400605 [main+1f] 0 cycles P 0
.....  5: 00000000004005e5 [f1+1c]   -> 000000000040060f [main+29] 0 cycles P 0
.....  6: 00000000004005c8 [f2+1c]   -> 00000000004005de [f1+15]   0 cycles P 0
.....  7: 00000000004005ab [f3+e]    -> 00000000004005c1 [f2+15]   0 cycles P 0
.....  8: 00000000004005bc [f2+10]   -> 000000000040059d [f3]      0 cycles P 0
.....  9: 00000000004005d9 [f1+10]   -> 00000000004005ac [f2]      0 cycles P 0
..... 10: 000000000040060a [main+24] -> 00000000004005c9 [f1]      0 cycles P 0
..... 11: 000000000040061d [main+37] -> 0000000000400605 [main+1f] 0 cycles P 0
..... 12: 00000000004005e5 [f1+1c]   -> 000000000040060f [main+29] 0 cycles P 0
..... 13: 00000000004005c8 [f2+1c]   -> 00000000004005de [f1+15]   0 cycles P 0
..... 14: 00000000004005ab [f3+e]    -> 00000000004005c1 [f2+15]   0 cycles P 0
..... 15: 00000000004005bc [f2+10]   -> 000000000040059d [f3]      0 cycles P 0
Reading upwards from entry 3 to entry 0 (oldest to newest), the stack shows the calls main() -> f1() -> f2() -> f3(), followed by the return to f2(); the newest branch is at the top of the stack.
Restoring the previous behaviour makes the branch stacks generated from ETM trace match this ordering.
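[To make the restored ordering concrete, here is a minimal, self-contained userspace sketch of the scheme - simplified stand-ins for the perf structures, not the actual tools/perf code: branches are written into the ring buffer backwards, so that walking forward from the insert position always yields newest to oldest, matching the LBR output above.

/* Standalone sketch, simplified stand-in for the perf code, not the
 * actual tools/perf implementation. */
#include <stdio.h>

#define BRANCH_SZ 4			/* stands in for last_branch_sz */

struct branch { unsigned long from, to; };

static struct branch entries[BRANCH_SZ];	/* circular buffer   */
static unsigned int pos;			/* insert position   */
static unsigned int nr;				/* branches recorded */

/* Record a branch in reverse chronological order: the insert position
 * walks backwards from the end of the buffer, so entries[pos] is always
 * the newest branch, mirroring cs_etm__update_last_branch_rb() after
 * the revert. */
static void update(unsigned long from, unsigned long to)
{
	if (!pos)
		pos = BRANCH_SZ;
	pos -= 1;

	entries[pos].from = from;
	entries[pos].to = to;
	if (nr < BRANCH_SZ)
		nr++;
}

int main(void)
{
	unsigned long i;

	/* Record 6 branches into a 4-entry buffer to force a wrap. */
	for (i = 1; i <= 6; i++)
		update(i, i + 0x100);

	/* Walking forward from pos gives newest -> oldest, like LBR. */
	for (i = 0; i < nr; i++) {
		unsigned int idx = (pos + i) % BRANCH_SZ;
		printf("%lu: %#lx -> %#lx\n", i,
		       entries[idx].from, entries[idx].to);
	}
	return 0;
}

With a 4-entry buffer and 6 recorded branches, the walk prints branches 6, 5, 4, 3 - newest first, as LBR does.]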
I'm in agreement with Robert's assessment. The first pastebin [1] is a snapshot from x86, the second [2] is a snapshot from ARM64 _without_ Robert's revert, and the third [3] is another ARM64 snapshot _with_ the revert. In this scenario it is clear that the results with the revert match what we get on Intel. Sebastian, can you share details of the scenario you were working with when implementing your patch?
Thanks,
Mathieu
[1]. https://pastebin.com/kkPhg2sx
[2]. https://pastebin.com/EZrYpFjJ
[3]. https://pastebin.com/gea5JXBS
Signed-off-by: Robert Walker <robert.walker@arm.com>
 tools/perf/util/cs-etm.c | 47 ++++++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 21 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index b1095a5..24783b7 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -580,28 +580,32 @@ static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq)
 		return;
 
 	/*
-	 * If we wrapped around at least once, the branches from last_branch_pos
-	 * element to last_branch_sz are older valid branches: copy them over.
+	 * As bs_src->entries is a circular buffer, we need to copy from it in
+	 * two steps. First, copy the branches from the most recently inserted
+	 * branch ->last_branch_pos until the end of bs_src->entries buffer.
 	 */
-	if (bs_src->nr >= etmq->etm->synth_opts.last_branch_sz) {
-		nr = etmq->etm->synth_opts.last_branch_sz
-		     - etmq->last_branch_pos - 1;
-		memcpy(&bs_dst->entries[0],
-		       &bs_src->entries[etmq->last_branch_pos + 1],
-		       sizeof(struct branch_entry) * nr);
-	}
+	nr = etmq->etm->synth_opts.last_branch_sz - etmq->last_branch_pos;
+	memcpy(&bs_dst->entries[0],
+	       &bs_src->entries[etmq->last_branch_pos],
+	       sizeof(struct branch_entry) * nr);
 
 	/*
-	 * Copy the branches from the most recently inserted branches from 0
-	 * to last_branch_pos included.
+	 * If we wrapped around at least once, the branches from the beginning
+	 * of the bs_src->entries buffer and until the ->last_branch_pos element
+	 * are older valid branches: copy them over. The total number of
+	 * branches copied over will be equal to the number of branches asked by
+	 * the user in last_branch_sz.
 	 */
-	memcpy(&bs_dst->entries[nr], &bs_src->entries[0],
-	       sizeof(struct branch_entry) * (etmq->last_branch_pos + 1));
+	if (bs_src->nr >= etmq->etm->synth_opts.last_branch_sz) {
+		memcpy(&bs_dst->entries[nr],
+		       &bs_src->entries[0],
+		       sizeof(struct branch_entry) * etmq->last_branch_pos);
+	}
 }
 
 static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
 {
-	etmq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz - 1;
+	etmq->last_branch_pos = 0;
 	etmq->last_branch_rb->nr = 0;
 }
@@ -611,14 +615,15 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
 	struct branch_entry *be;
 
 	/*
-	 * Record branches in a circular buffer in chronological order. After
-	 * writing the last element of the stack, move the insert position back
-	 * to the beginning of the buffer.
+	 * The branches are recorded in a circular buffer in reverse
+	 * chronological order: we start recording from the last element of the
+	 * buffer down. After writing the first element of the stack, move the
+	 * insert position back to the end of the buffer.
 	 */
-	if (etmq->last_branch_pos == etmq->etm->synth_opts.last_branch_sz - 1)
-		etmq->last_branch_pos = 0;
-	else
-		etmq->last_branch_pos += 1;
+	if (!etmq->last_branch_pos)
+		etmq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
+
+	etmq->last_branch_pos -= 1;
 
 	be = &bs->entries[etmq->last_branch_pos];
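[For reference, the net effect of the restored cs_etm__copy_last_branch_rb() can be reduced to the following self-contained sketch - same simplified types as the sketch above, not the actual perf code: a two-step memcpy() that flattens the ring buffer into a newest-first stack, where the second step only runs once the buffer has wrapped.

/* Standalone sketch of the two-step copy in the restored
 * cs_etm__copy_last_branch_rb(); simplified types, not the actual
 * perf implementation. */
#include <stdio.h>
#include <string.h>

#define BRANCH_SZ 4

struct branch { unsigned long from, to; };

/* Flatten the circular buffer src (insert position pos, nr branches
 * recorded so far) into dst, newest branch first. */
static void copy_stack(struct branch *dst, const struct branch *src,
		       unsigned int pos, unsigned int nr)
{
	unsigned int n = BRANCH_SZ - pos;

	/* Step 1: the newest entries, from pos to the end of the buffer. */
	memcpy(&dst[0], &src[pos], sizeof(*src) * n);

	/* Step 2: only if we wrapped at least once, the entries from the
	 * start of the buffer up to pos are older valid branches. */
	if (nr >= BRANCH_SZ)
		memcpy(&dst[n], &src[0], sizeof(*src) * pos);
}

int main(void)
{
	/* Buffer state after recording branches 3, 4, 5, 6 with pos = 2,
	 * matching the insertion sketch earlier in this mail. */
	struct branch src[BRANCH_SZ] = {
		{ 4, 0x104 }, { 3, 0x103 }, { 6, 0x106 }, { 5, 0x105 },
	};
	struct branch dst[BRANCH_SZ];
	unsigned int i;

	copy_stack(dst, src, 2, 6);

	/* Prints 6, 5, 4, 3: newest first, as in the LBR output above. */
	for (i = 0; i < BRANCH_SZ; i++)
		printf("%u: %#lx -> %#lx\n", i, dst[i].from, dst[i].to);
	return 0;
}]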
--
1.9.1
CoreSight mailing list
CoreSight@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/coresight