On Tue, Sep 19, 2017 at 3:16 PM, Dehao Chen <dehao@google.com> wrote:


On Tue, Sep 19, 2017 at 1:02 PM, Sebastian Pop <sebpop@gmail.com> wrote:
By popular demand, I started debugging this problem again.

With the two patches that I posted earlier,
the traces seem correct with the exception of a few "holes"
where the trace seems to jump over a few instructions that
are not reported in the trace, creating jumps that do not exist
in the control flow graph of the code.

The nested loop for bubble sort is:

  4008a0:       9100e3a0        add     x0, x29, #0x38
  4008a4:       52800004        mov     w4, #0x0                        // #0
  4008a8:       29400402        ldp     w2, w1, [x0]
  4008ac:       6b02003f        cmp     w1, w2
  4008b0:       5400006a        b.ge    4008bc <sort_array+0x84>
  4008b4:       52800024        mov     w4, #0x1                        // #1
  4008b8:       29000801        stp     w1, w2, [x0]
  4008bc:       91001000        add     x0, x0, #0x4
  4008c0:       eb00007f        cmp     x3, x0
  4008c4:       54ffff21        b.ne    4008a8 <sort_array+0x70>
  4008c8:       35fffec4        cbnz    w4, 4008a0 <sort_array+0x68>


..... 34: 00000000004008b0 -> 00000000004008b4 0 cycles  P   0
..... 35: 00000000004008c4 -> 00000000004008a8 0 cycles  P   0
..... 36: 00000000004008b0 -> 00000000004008a8 0 cycles  P   0

edge #36 does not exist in the code: the trace is not correct here.
4008b0 is "b.ge    4008bc" and should either jump to 4008bc or
fall through to the next instruction 4008b4, and the trace wrongly
jumps to 4008a8.

Several hundred jumps later, we see the following sequence:

..... 40: 00000000004008c4 -> 00000000004008a8 0 cycles  P   0
..... 41: 00000000004008b0 -> 00000000004008b4 0 cycles  P   0
..... 42: 00000000004008c4 -> 00000000004008b4 0 cycles  P   0
..... 43: 00000000004008c4 -> 00000000004008a8 0 cycles  P   0

where edge #42 is not correct either: 4008c4 should either branch to
4008a8 or fall through to 4008c8.
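
As a sketch of the check applied by hand above: each reported edge can be
tested against the two legal successors of its source branch, which are the
taken target from the disassembly and the fall-through at source + 4
(AArch64 instructions are 4 bytes wide). This is not part of perf or
create_gcov. The names BRANCH_TARGETS, parse_edge and is_valid_edge are
made up for illustration, and the parser assumes the line layout of the
trace excerpts quoted in this message.

```python
# Conditional branches of the nested loop, copied from the disassembly:
# branch address -> taken target.
BRANCH_TARGETS = {
    0x4008b0: 0x4008bc,  # b.ge
    0x4008c4: 0x4008a8,  # b.ne
    0x4008c8: 0x4008a0,  # cbnz
}

INSN_SIZE = 4  # every AArch64 instruction is 4 bytes wide


def parse_edge(line):
    """Extract (source, destination) from a 'SRC -> DST' trace line."""
    fields = line.split()
    arrow = fields.index("->")
    return int(fields[arrow - 1], 16), int(fields[arrow + 1], 16)


def is_valid_edge(src, dst):
    """True if dst is the taken target or the fall-through of branch src.

    Sources missing from BRANCH_TARGETS are reported as invalid, because
    nothing is known about them in this sketch.
    """
    taken = BRANCH_TARGETS.get(src)
    if taken is None:
        return False
    return dst in (taken, src + INSN_SIZE)
```

With this table, edges #36 (4008b0 -> 4008a8) and #42 (4008c4 -> 4008b4)
are rejected, and the other edges listed above are accepted.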

Maybe these inconsistencies are due to interruptions in the trace recording?
I think that these interruptions cannot be avoided during trace collection.

Dehao, could these wrong edges be fixed in the compiler when reading
the coverage file?

I cannot see an easy way for the compiler or the create_gcov tool to cover these issues. Why can't the trace collection tool fix them? It looks like a bug to me.


My thinking was that the compiler knows that there are no edges
between the blocks at these addresses and may just ignore the counts
at these addresses.
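
A minimal sketch of that idea, assuming the profile has already been
reduced to per-edge counts and that the CFG is available as a map from a
branch address to the set of its legal successor addresses. Both data
layouts and the function name are hypothetical. This is not how the
compiler or create_gcov represent profiles.

```python
def drop_impossible_edges(edge_counts, successors):
    """Split edge counts into those backed by a CFG edge and the rest.

    edge_counts: dict mapping (src, dst) -> count
    successors:  dict mapping src -> set of legal dst addresses
    Returns (kept, dropped), two dicts with the same layout as edge_counts.
    """
    kept, dropped = {}, {}
    for (src, dst), count in edge_counts.items():
        # An edge survives only if the CFG lists dst as a successor of src.
        bucket = kept if dst in successors.get(src, ()) else dropped
        bucket[(src, dst)] = count
    return kept, dropped
```

Returning the dropped counts as well lets the caller report how much of
the profile was discarded instead of hiding it.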

Maybe we can figure out why this pattern occurs and try to solve
it in either perf inject or in the decoder? The pattern looks very regular.

The first two errors occur 389 branches apart,
and the next error occurs after another 389 branches.
If we call the first incorrect jump "A" and the second incorrect jump "B",
we have this pattern:

A,
389 correct jumps,
B,
389 correct jumps,
A,
389 correct jumps,
B,

There are 343 occurrences of A and 322 of B in a trace of sorting 3000 elements.
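
To check this regularity over the whole trace, something like the
following sketch could be used. It assumes the trace has already been
parsed into a list of (source, destination) pairs. The ANOMALIES table
holds the two wrong edges identified earlier, and the function name is
made up for illustration.

```python
# The two edges that do not exist in the control flow graph.
ANOMALIES = {
    (0x4008b0, 0x4008a8): "A",
    (0x4008c4, 0x4008b4): "B",
}


def anomaly_gaps(edges):
    """List each wrong edge with the number of correct edges before it.

    Returns [(label, gap), ...] in trace order, where gap is the number
    of correct edges seen since the previous wrong edge, or None for
    the first wrong edge in the trace.
    """
    result = []
    since_last = None  # stays None until the first wrong edge is seen
    for edge in edges:
        label = ANOMALIES.get(edge)
        if label is None:
            if since_last is not None:
                since_last += 1
            continue
        result.append((label, since_last))
        since_last = 0
    return result
```

If the pattern holds, the labels alternate between A and B and every gap
after the first is 389. Any deviation would show where the pattern breaks.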