Greetings,

I recorded the program "ls", statically linked so that a single executable can serve as the memory-access file for the decoder.

I recorded the program using perf and then extracted the raw trace data from the perf.data file with a small tool I wrote. Using OpenCSD, I can fully decode the trace produced by perf.

I also recorded the "ls" utility from kernel mode using an API I wrote, which I published here as an [RFC]. Basically, I start and stop recording whenever the __process__ of interest is scheduled in.
This post is not really a request for a review of my API, but I do have some issues with the trace it produces, and I'm not quite sure why.

I use OpenCSD directly in my code and register a decoder callback for every generic trace element. When my callback is called, I simply print the element's string representation (e.g. OCSD_GEN_TRC_ELEM_INSTR_RANGE).

Now, the weird thing is that the perf and API traces produce the same generic elements up to a certain element:

OCSD_GEN_TRC_ELEM_TRACE_ON()
...
...
... same elements...
... same elements...
... same elements...
...
...

After that, they diverge from each other. I assume the perf trace follows the correct path, while my trace simply goes haywire. The last __common__ generic element is the following:

OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4148f4:[0x414910] (ISA=A64) E iBR A64:ret )

After this element, the perf trace takes a different route, while the API trace immediately produces a very odd instruction range element:

OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x414910:[0x498a20] (ISA=A64) E --- )

There is no way address 0x498a20 was reached, and I cannot find any evidence of it in the raw trace itself (using ptm2human). It seems that the decoder keeps decoding and disassembling opcodes until it reaches 0x498a20, at which point my memory callback (the callback invoked when the decoder needs memory that isn't present) is called for address 0x498a20. From then on, the trace follows a very strange path, and I can't explain the branch addresses taken from there.
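Finding the exact divergence point can be done mechanically if each run's callback output is saved to a text file with one element string per line. The sketch below assumes exactly that log format (it is my own convention, not something OpenCSD writes by itself), and `first_divergence` is a hypothetical helper: it reports the first line where the two logs differ and copies the last line they share, which should be the `ret` range element quoted above if both logs start at the same TRACE_ON.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ELEM_LINE 4096  /* generous upper bound for one printed element */

/* Compares two element logs (one element string per line) and returns the
 * 1-based number of the first line that differs, or 0 if the logs are identical.
 * A log that ends early counts as differing at its first missing line.
 * The last line both logs share is copied into last_common ("" if none).
 * Lines longer than MAX_ELEM_LINE - 1 are compared in chunks, which would
 * shift the reported line number. */
static long first_divergence(FILE *a, FILE *b, char *last_common, size_t cap)
{
    char la[MAX_ELEM_LINE], lb[MAX_ELEM_LINE];
    long line = 0;

    assert(last_common != NULL && cap > 0);
    last_common[0] = '\0';

    for (;;) {
        char *ra = fgets(la, sizeof la, a);
        char *rb = fgets(lb, sizeof lb, b);
        line++;
        if (ra == NULL && rb == NULL)
            return 0;                          /* both logs ended together */
        if (ra == NULL || rb == NULL || strcmp(la, lb) != 0)
            return line;                       /* first mismatch or early end */
        snprintf(last_common, cap, "%s", la);  /* remember the last shared element */
    }
}
```

Usage, with hypothetical file names: open the perf log and the API log with `fopen`, call `first_divergence(perf, api, last, sizeof last)`, and print the returned line number together with `last`.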


Any ideas on how to approach this? Input from OpenCSD experts would be much appreciated.
I have attached the perf and API traces, as well as the "ls" executable, which is loaded at address 0x400000. I also attached the ETMv4 configuration for each trace (trace ID, etc.). There is no need to create multiple decoders for different trace IDs; there is only a single ID, handled by a single decoder.
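To sanity-check a suspicious range such as 0x414910:[0x498a20], it can help to serve memory strictly from the "ls" image loaded at 0x400000 and log every request that falls outside it. The sketch below is shaped loosely like the memory-access callback OpenCSD invokes (which, as far as I know, returns the number of bytes it could supply), but `struct image`, `image_read` and `a64_insn_count` are hypothetical names, not the library's API. It also assumes that the bracketed end address of a range element is exclusive.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory view of the static "ls" executable.
 * base is the load address (0x400000 in this case); bytes/size are the file contents. */
struct image {
    uint64_t base;          /* address the executable is loaded at */
    const uint8_t *bytes;   /* raw file contents used as the memory image */
    size_t size;            /* number of bytes available */
};

/* Hypothetical memory-read helper, shaped loosely like a decoder memory callback:
 * copies up to req_bytes from address into buf and returns the number of bytes
 * copied. Requests outside the image are logged to stderr and return 0. */
static uint32_t image_read(const struct image *img, uint64_t address,
                           uint32_t req_bytes, uint8_t *buf)
{
    if (address < img->base || address - img->base >= img->size) {
        fprintf(stderr, "memory request outside image: 0x%llx (%u bytes)\n",
                (unsigned long long)address, (unsigned)req_bytes);
        return 0;
    }
    uint64_t offset = address - img->base;
    uint64_t avail = img->size - offset;        /* bytes left until the end of the image */
    uint32_t n = (req_bytes < avail) ? req_bytes : (uint32_t)avail;
    memcpy(buf, img->bytes + offset, n);
    return n;
}

/* Number of A64 instructions (fixed 4 bytes each) covered by a range
 * start:[end], assuming end is exclusive. */
static uint64_t a64_insn_count(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return (end - start) / 4;
}
```

For the last common range, 0x4148f4:[0x414910], `a64_insn_count` gives 7 instructions. For 0x414910:[0x498a20] it gives 135236 instructions (0x84110 = 540944 bytes), all of which the decoder would have had to walk through for that single range element.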

Thanks,
Mike.