Re: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

1 Oct 2018

      On Mon, Oct 01, 2018 at 09:19:33AM +0100, Robert Walker wrote:
[...]
...
...
...
...
+#ifdef HAVE_CSTRACE_INSTR_INFO

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem)
+{

  return elem->num_instr_range;

+}

+static int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem)
+{

  return elem->last_instr_sz;

+}

+#else

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem)
+{

  /* Assume a 4-byte instruction size - will be wrong for T32 */

  WARN_ONCE(elem->isa == ocsd_isa_thumb2,

            "Instruction counts not available for T32. Please upgrade to OpenCSD >= 0.9.0\n");

For this case, the program should report failure and directly exit?
Here why not check for A32 instruction case?  Before version 0.8.x
has supported A32 instructions?  I understand A32 and A64 both have 4
bytes instruction width, but I don't know if the old OpenCSD lib
supports to decode both of them?
The old version of OpenCSD did support A32 and T32 - samples from A32 trace
will work with the current version of perf as the assumption of 4 byte
instruction sizes for A64 also applies to A32.
Thanks for explaination.
...
...
...
...

  return (elem->en_addr - elem->st_addr) / 4;

+}

+static
+int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem __maybe_unused)
+{

  return 4;

+}

+#endif

alternate not needed if you decide not to build against insufficient version.
Just curious, if the API ocsd_get_version() has been provided by all
OpenCSD versions?
Seems to me, this patch is to keep backwards compability with v0.8.x; I
think we can still support perf build with old version OpenCSD, at the
runtime the perf cs_etm decoder can get the OpenCSD version number
by using API ocsd_get_version() and check if the OpenCSD version is
sufficient for decoding current trace data, if not then reports
error and directly exit from program.
ocsd_get_version() is provided by all versions of OpenCSD.  The features
added in 0.9.x to provide the instruction counts add a new member to the
struct that describes each range of executed instructions - so the new code
doesn't compile if the old library is installed.  Calling ocsd_get_version()
could be used to check for the unlikely case that perf has been compiled
against 0.9.x, but is now being run against an older version - is this worth
doing?
Ah, I missed the building failure when use the new kernel with old
library.
...
From the developer view, I bais to Mike's suggestion to directly report
failure when build new kernel with old verison lib; the code can keep
as simple as possible and we can easily build latest version OpenCSD
lib by ourselves.
Be honest, I am not confident for how this works with distros; this
is not for a developer but for end users.  E.g. Debian / Ubuntu have
officially released for the OpenCSD v0.8.x, should the new kernel
support these old version OpenCSD libs?  I am not sure if this is the
case, if OpenCSD libs have not been released in Debian/Ubuntu, then
this is not concern at all.
[...]
...
...
...
...
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,

                                   u64 addr) {

Nitpick: add a new line for '{'.
...
...
...
...

  u8 instrBytes[2];

  /*

   * The packet records the execution range with an exclusive end address

   *

   * A64 instructions are constant size, so the last executed

   * instruction is A64_INSTR_SIZE before the end address

   * Will need to do instruction level decode for T32 instructions as

   * they can be variable size (not yet supported).

  cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);

  /* T32 instruction size is indicated by bits[15:11] of the first

   * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111

   * denote a 32-bit instruction.
    */

  return packet->end_addr - A64_INSTR_SIZE;

  return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;

Could use macro to replace 0xF8/0xE8 for more readable?
I don't think adding a macro will make this any more readable - there's a
detailed comment about the encoding above.
Yeah, after read the comments I understand the logic :)
Another question, look into the flow as showed blow, T32 must dependent
on DSO to calculate instruction address; just curious if we can relay on
decoder to decide the T32 instruction address and remove dependency on
DSO?
cs_etm__instr_addr()
    `-> cs_etm__t32_instr_size()
          `-> cs_etm__mem_access()
                `-> dso__data_read_offset()
[...]
...
...
...
...
-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,

                               const struct cs_etm_packet *packet,
                                u64 offset)

{

  /*

   * Only A64 instructions are currently supported, so can get

   * instruction address by muliplying.

   * Will need to do instruction level decode for T32 instructions as

   * they can be variable size (not yet supported).

   */

  return packet->start_addr + offset * A64_INSTR_SIZE;

  if (packet->isa == CS_ETM_ISA_T32) {

          u64 addr = packet->start_addr;

          while (offset > 0) {

                  addr += cs_etm__t32_instr_size(etmq, addr);

                  offset--;

Should be: offset -= cs_etm__t32_instr_size(etmq, addr)?
No - offset is the number of the instruction we want to find the address of

e.g. the decoder will tell us that the block has 100 instructions and we

want to generate a sample on the 57th, so we step 57 instructions through
the block.
Understand now, sorry for noise.
[...]
Thanks,
Leo Yan

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

Re: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace