By popular demand, I started debugging this problem again.

With the two patches that I posted earlier, the traces look
correct except for a few "holes": the trace skips over some
instructions without reporting them, which creates edges that
do not exist in the control flow graph of the code.

The nested loop for bubble sort is:

  4008a0:       9100e3a0        add     x0, x29, #0x38
  4008a4:       52800004        mov     w4, #0x0                        // #0
  4008a8:       29400402        ldp     w2, w1, [x0]
  4008ac:       6b02003f        cmp     w1, w2
  4008b0:       5400006a        b.ge    4008bc <sort_array+0x84>
  4008b4:       52800024        mov     w4, #0x1                        // #1
  4008b8:       29000801        stp     w1, w2, [x0]
  4008bc:       91001000        add     x0, x0, #0x4
  4008c0:       eb00007f        cmp     x3, x0
  4008c4:       54ffff21        b.ne    4008a8 <sort_array+0x70>
  4008c8:       35fffec4        cbnz    w4, 4008a0 <sort_array+0x68>


..... 34: 00000000004008b0 -> 00000000004008b4 0 cycles  P   0
..... 35: 00000000004008c4 -> 00000000004008a8 0 cycles  P   0
..... 36: 00000000004008b0 -> 00000000004008a8 0 cycles  P   0

Edge #36 does not exist in the code, so the trace is not correct here:
4008b0 is "b.ge    4008bc" and should either jump to 4008bc or
fall through to the next instruction, 4008b4, but the trace wrongly
reports a jump to 4008a8.

Several hundred jumps later, we see the following sequence:

..... 40: 00000000004008c4 -> 00000000004008a8 0 cycles  P   0
..... 41: 00000000004008b0 -> 00000000004008b4 0 cycles  P   0
..... 42: 00000000004008c4 -> 00000000004008b4 0 cycles  P   0
..... 43: 00000000004008c4 -> 00000000004008a8 0 cycles  P   0

where edge #42 is not correct either: 4008c4 should either branch to
4008a8 or fall through to 4008c8.
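
The check I did by hand on these edges can be sketched as a small
script. This is only an illustration, not part of the patches: the
names BRANCH_TARGETS, successors and bad_edges are made up for this
example, the branch targets are copied from the disassembly above,
and the edge list is transcribed from the two trace excerpts. It
assumes that every instruction is 4 bytes long, which holds for
AArch64.

```python
# Taken targets of the branch instructions in the loop, copied from
# the disassembly (hypothetical table, built by hand for this sketch).
BRANCH_TARGETS = {
    0x4008b0: 0x4008bc,  # b.ge 4008bc
    0x4008c4: 0x4008a8,  # b.ne 4008a8
    0x4008c8: 0x4008a0,  # cbnz w4, 4008a0
}

INSN_SIZE = 4  # AArch64 instructions are fixed-width


def successors(src):
    """Return the valid destinations of the branch at src, or an empty set."""
    if src not in BRANCH_TARGETS:
        return set()
    return {BRANCH_TARGETS[src], src + INSN_SIZE}


def bad_edges(edges):
    """Return the (index, src, dst) entries that are not edges of the CFG."""
    return [(idx, src, dst) for idx, src, dst in edges
            if dst not in successors(src)]


# Edges transcribed from the trace excerpts: (edge number, from, to).
TRACE = [
    (34, 0x4008b0, 0x4008b4),
    (35, 0x4008c4, 0x4008a8),
    (36, 0x4008b0, 0x4008a8),
    (40, 0x4008c4, 0x4008a8),
    (41, 0x4008b0, 0x4008b4),
    (42, 0x4008c4, 0x4008b4),
    (43, 0x4008c4, 0x4008a8),
]

for idx, src, dst in bad_edges(TRACE):
    print(f"edge #{idx}: {src:x} -> {dst:x} is not in the CFG")
```

On the edges listed here, this flags only #36 and #42. A consumer
of the trace could use the same kind of test to drop or count
inconsistent edges, although whether that is the right place to
handle them is exactly the open question.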

Maybe these inconsistencies are due to interruptions in the trace
recording? I think that such interruptions cannot be avoided during
trace collection.

Dehao, could these wrong edges be fixed in the compiler when reading
the coverage file?

Thanks,
Sebastian