[RFC v3 0/2] perf: Support for Arm A32/T32 instruction sets

Robert Walker

27 Sep 2018

10:23 a.m.

Hi,

I'm taking this back to the linaro coresight list so we can get the OpenCSD library versioning sorted out.

The first patch splits the OpenCSD feature check into two parts. The original check is left as is: it only checks for the presence of an OpenCSD library. A new check (libopencsd-numinstr) tests for a newer OpenCSD (>= 0.9.0) that has the num_instr_range member in the ocsd_generic_trace_elem struct. The result of this check sets a flag used in cs-etm-decoder.c to select which versions of two functions are used to get the instruction count and the last instruction size of each instruction block. If the flag is not set, the previous assumption of a 4 byte instruction size is used.

It was suggested that OpenCSD should export a version header. I agree this is a good idea, but it will require a new release of the library, so we would miss support for the instruction sizes when OpenCSD 0.9.{0,1,2} is installed. That is why I've kept the check based on the presence of num_instr_range.
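As a rough illustration of the compile-time selection described above, the following is a minimal sketch rather than the code from the patch. `struct trace_elem` is a hypothetical stand-in for the handful of `ocsd_generic_trace_elem` fields involved, and the helper names are shortened; only the `HAVE_CSTRACE_INSTR_INFO` flag name is taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the few ocsd_generic_trace_elem fields that
 * matter here; the real struct comes from the OpenCSD headers.
 */
struct trace_elem {
	uint64_t st_addr;         /* inclusive start of the executed range */
	uint64_t en_addr;         /* exclusive end of the executed range */
	uint32_t num_instr_range; /* only present in newer OpenCSD */
	uint8_t  last_instr_sz;   /* size of the last instruction in bytes */
};

#ifdef HAVE_CSTRACE_INSTR_INFO

/* Newer library: use the values reported by the decoder. */
static uint32_t instr_count(const struct trace_elem *elem)
{
	return elem->num_instr_range;
}

static uint8_t last_instr_size(const struct trace_elem *elem)
{
	return elem->last_instr_sz;
}

#else

/* Older library: assume 4-byte instructions, which is wrong for T32. */
static uint32_t instr_count(const struct trace_elem *elem)
{
	return (uint32_t)((elem->en_addr - elem->st_addr) / 4);
}

static uint8_t last_instr_size(const struct trace_elem *elem)
{
	(void)elem;
	return 4;
}

#endif
```

Because the choice is made by the preprocessor, callers use the same two function names in both configurations and never touch a struct member that an older header does not declare.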

The second patch adds support for finding the T32 instruction counts when the OpenCSD library doesn't report them. As this involves iterating through the block of instructions and examining each instruction, there is a significant performance hit (about 5x slower than using the OpenCSD library to report the instruction counts), so I'm not sure this patch should go upstream.
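To make the cost of this fallback concrete, here is a small sketch of counting T32 instructions by walking a block halfword by halfword. It assumes a little-endian code buffer already in memory; `t32_instr_size` and `t32_count_instrs` are hypothetical names, and the second patch itself reads target memory through perf rather than a plain buffer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Size in bytes of the T32 instruction whose first halfword is hw1.
 * Bits [15:11] equal to 0b11101, 0b11110 or 0b11111 mark a 32-bit encoding.
 */
static unsigned int t32_instr_size(uint16_t hw1)
{
	return ((hw1 >> 11) >= 0x1D) ? 4 : 2;
}

/*
 * Count the instructions in a little-endian T32 code buffer of len bytes.
 * Every instruction has to be inspected, hence the linear cost.
 */
static unsigned int t32_count_instrs(const uint8_t *code, size_t len)
{
	unsigned int count = 0;
	size_t pos = 0;

	while (pos + 2 <= len) {
		uint16_t hw1 = (uint16_t)(code[pos] | (code[pos + 1] << 8));

		pos += t32_instr_size(hw1);
		count++;
	}
	return count;
}
```

For an 8-byte block holding a 16-bit, a 32-bit and another 16-bit instruction, the walk reports three instructions, whereas dividing the range by 4 would report two.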

Regards

Rob

Robert Walker (2):
  perf: Support for Arm A32/T32 instruction sets in CoreSight trace
  perf: Full support for Arm T32 instructions with older version of OpenCSD

 tools/build/Makefile.feature                    |   3 +-
 tools/build/feature/Makefile                    |   4 +
 tools/build/feature/test-libopencsd-numinstr.c  |  15 ++++
 tools/perf/Makefile.config                      |   3 +
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 106 ++++++++++++++++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  10 +++
 tools/perf/util/cs-etm.c                        |  71 +++++++---------
 7 files changed, 171 insertions(+), 41 deletions(-)
 create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

-- 
2.7.4

Robert Walker

27 Sep

10:23 a.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.0 of OpenCSD. A check for the new version member has been added to the feature check for OpenCSD. Where only the older version of OpenCSD is available, the previous behaviour of assuming 4 byte instruction size is used.

Signed-off-by: Robert Walker <robert.walker@arm.com>
---
 tools/build/Makefile.feature                    |  3 +-
 tools/build/feature/Makefile                    |  4 ++
 tools/build/feature/test-libopencsd-numinstr.c  | 15 ++++++
 tools/perf/Makefile.config                      |  3 ++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 58 ++++++++++++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++
 tools/perf/util/cs-etm.c                        | 71 +++++++++++--------------
 7 files changed, 123 insertions(+), 41 deletions(-)
 create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index f216b2f..5e8d108 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -68,7 +68,8 @@ FEATURE_TESTS_BASIC :=                  \
         sched_getcpu                    \
         sdt                             \
         setns                           \
-        libopencsd
+        libopencsd                      \
+        libopencsd-numinstr
 
 # FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list
 # of all feature tests
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 0516259..2cb7ef6 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -56,6 +56,7 @@ FILES=                                          \
          test-sched_getcpu.bin                  \
          test-setns.bin                         \
          test-libopencsd.bin                    \
+         test-libopencsd-numinstr.bin           \
          test-clang.bin                         \
          test-llvm.bin                          \
          test-llvm-version.bin
@@ -114,6 +115,9 @@ $(OUTPUT)test-libopencsd.bin:
 	$(BUILD) # -lopencsd_c_api -lopencsd provided by
 		 # $(FEATURE_CHECK_LDFLAGS-libopencsd)
 
+$(OUTPUT)test-libopencsd-numinstr.bin:
+	$(BUILD) -lopencsd_c_api -lopencsd
+
 DWARFLIBS := -ldw
 ifeq ($(findstring -static,${LDFLAGS}),-static)
 DWARFLIBS += -lelf -lebl -lz -llzma -lbz2
diff --git a/tools/build/feature/test-libopencsd-numinstr.c b/tools/build/feature/test-libopencsd-numinstr.c
new file mode 100644
index 0000000..606de62
--- /dev/null
+++ b/tools/build/feature/test-libopencsd-numinstr.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <opencsd/c_api/opencsd_c_api.h>
+
+int main(void)
+{
+	/*
+	 * Requires ocsd_generic_trace_elem.num_instr_range introduced in
+	 * OpenCSD 0.9.0
+	 */
+	ocsd_generic_trace_elem elem;
+	(void)elem.num_instr_range;
+
+	(void)ocsd_get_version();
+	return 0;
+}
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index f6d1a03..b0ba3ca 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -383,6 +383,9 @@ ifndef NO_CORESIGHT
         endif
       endif
     endif
+    ifeq ($(feature-libopencsd-numinstr), 1)
+      CFLAGS += -DHAVE_CSTRACE_INSTR_INFO
+    endif
   endif
 
 ifndef NO_LIBELF
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 938def6..260f3b3a 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -17,6 +17,7 @@
 #include "cs-etm-decoder.h"
 #include "intlist.h"
 #include "util.h"
+#include "asm/bug.h"
 
 #define MAX_BUFFER 1024
 
@@ -263,9 +264,12 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
 	decoder->tail = 0;
 	decoder->packet_count = 0;
 	for (i = 0; i < MAX_BUFFER; i++) {
+		decoder->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN;
 		decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
 		decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
+		decoder->packet_buffer[i].instr_count = 0;
 		decoder->packet_buffer[i].last_instr_taken_branch = false;
+		decoder->packet_buffer[i].last_instr_size = 0;
 		decoder->packet_buffer[i].exc = false;
 		decoder->packet_buffer[i].exc_ret = false;
 		decoder->packet_buffer[i].cpu = INT_MIN;
@@ -294,11 +298,13 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 	decoder->packet_count++;
 
 	decoder->packet_buffer[et].sample_type = sample_type;
+	decoder->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN;
 	decoder->packet_buffer[et].exc = false;
 	decoder->packet_buffer[et].exc_ret = false;
 	decoder->packet_buffer[et].cpu = *((int *)inode->priv);
 	decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
 	decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
+	decoder->packet_buffer[et].instr_count = 0;
 
 	if (decoder->packet_count == MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
@@ -306,6 +312,36 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 	return OCSD_RESP_CONT;
 }
 
+#ifdef HAVE_CSTRACE_INSTR_INFO
+
+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem)
+{
+	return elem->num_instr_range;
+}
+
+static int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem)
+{
+	return elem->last_instr_sz;
+}
+
+#else
+
+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem)
+{
+	/* Assume a 4-byte instruction size - will be wrong for T32 */
+	WARN_ONCE(elem->isa == ocsd_isa_thumb2,
+		  "Instruction counts not available for T32. Please upgrade to OpenCSD >= 0.9.0\n");
+	return (elem->en_addr - elem->st_addr) / 4;
+}
+
+static
+int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem __maybe_unused)
+{
+	return 4;
+}
+
+#endif
+
 static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 			     const ocsd_generic_trace_elem *elem,
@@ -321,8 +357,28 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 
 	packet = &decoder->packet_buffer[decoder->tail];
 
+	switch (elem->isa) {
+	case ocsd_isa_aarch64:
+		packet->isa = CS_ETM_ISA_A64;
+		break;
+	case ocsd_isa_arm:
+		packet->isa = CS_ETM_ISA_A32;
+		break;
+	case ocsd_isa_thumb2:
+		packet->isa = CS_ETM_ISA_T32;
+		break;
+	case ocsd_isa_tee:
+	case ocsd_isa_jazelle:
+	case ocsd_isa_custom:
+	case ocsd_isa_unknown:
+	default:
+		packet->isa = CS_ETM_ISA_UNKNOWN;
+	}
+
 	packet->start_addr = elem->st_addr;
 	packet->end_addr = elem->en_addr;
+	packet->instr_count = cs_etm_decoder__instr_count(elem);
+
 	switch (elem->last_i_type) {
 	case OCSD_INSTR_BR:
 	case OCSD_INSTR_BR_INDIRECT:
@@ -336,6 +392,8 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 		break;
 	}
 
+	packet->last_instr_size = cs_etm_decoder__last_instr_size(elem);
+
 	return ret;
 }
 
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 612b575..9351bd1 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -28,11 +28,21 @@ enum cs_etm_sample_type {
 	CS_ETM_TRACE_ON = 1 << 1,
 };
 
+enum cs_etm_isa {
+	CS_ETM_ISA_UNKNOWN,
+	CS_ETM_ISA_A64,
+	CS_ETM_ISA_A32,
+	CS_ETM_ISA_T32,
+};
+
 struct cs_etm_packet {
 	enum cs_etm_sample_type sample_type;
+	enum cs_etm_isa isa;
 	u64 start_addr;
 	u64 end_addr;
+	u32 instr_count;
 	u8 last_instr_taken_branch;
+	u8 last_instr_size;
 	u8 exc;
 	u8 exc_ret;
 	int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 2ae6402..fcaa73f 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -31,14 +31,6 @@
 
 #define MAX_TIMESTAMP (~0ULL)
 
-/*
- * A64 instructions are always 4 bytes
- *
- * Only A64 is supported, so can use this constant for converting between
- * addresses and instruction counts, calculting offsets etc
- */
-#define A64_INSTR_SIZE 4
-
 struct cs_etm_auxtrace {
 	struct auxtrace auxtrace;
 	struct auxtrace_queues queues;
@@ -492,21 +484,16 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
 	etmq->last_branch_rb->nr = 0;
 }
 
-static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
-{
-	/* Returns 0 for the CS_ETM_TRACE_ON packet */
-	if (packet->sample_type == CS_ETM_TRACE_ON)
-		return 0;
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
+					 u64 addr) {
+	u8 instrBytes[2];
 
-	/*
-	 * The packet records the execution range with an exclusive end address
-	 *
-	 * A64 instructions are constant size, so the last executed
-	 * instruction is A64_INSTR_SIZE before the end address
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
+	cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);
+	/* T32 instruction size is indicated by bits[15:11] of the first
+	 * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
+	 * denote a 32-bit instruction.
 	 */
-	return packet->end_addr - A64_INSTR_SIZE;
+	return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
 }
 
 static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
@@ -518,27 +505,32 @@ static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
 	return packet->start_addr;
 }
 
-static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
+static inline
+u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
 {
-	/*
-	 * Only A64 instructions are currently supported, so can get
-	 * instruction count by dividing.
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
-	 */
-	return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
+	/* Returns 0 for the CS_ETM_TRACE_ON packet */
+	if (packet->sample_type == CS_ETM_TRACE_ON)
+		return 0;
+
+	return packet->end_addr - packet->last_instr_size;
 }
 
-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
+				     const struct cs_etm_packet *packet,
 				     u64 offset)
 {
-	/*
-	 * Only A64 instructions are currently supported, so can get
-	 * instruction address by muliplying.
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
-	 */
-	return packet->start_addr + offset * A64_INSTR_SIZE;
+	if (packet->isa == CS_ETM_ISA_T32) {
+		u64 addr = packet->start_addr;
+
+		while (offset > 0) {
+			addr += cs_etm__t32_instr_size(etmq, addr);
+			offset--;
+		}
+		return addr;
+	}
+
+	/* Assume a 4 byte instruction size (A32/A64) */
+	return packet->start_addr + offset * 4;
 }
 
 static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
@@ -867,9 +859,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct cs_etm_packet *tmp;
 	int ret;
-	u64 instrs_executed;
+	u64 instrs_executed = etmq->packet->instr_count;
 
-	instrs_executed = cs_etm__instr_count(etmq->packet);
 	etmq->period_instructions += instrs_executed;
 
 	/*
@@ -899,7 +890,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 		 * executed, but PC has not advanced to next instruction)
 		 */
 		u64 offset = (instrs_executed - instrs_over - 1);
-		u64 addr = cs_etm__instr_addr(etmq->packet, offset);
+		u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset);
 
 		ret = cs_etm__synth_instruction_sample(
 			etmq, addr, etm->instructions_sample_period);
-- 
2.7.4
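The address calculation in cs_etm__instr_addr() can be tried out in isolation. The sketch below is a simplified stand-alone version, not the perf code: `struct mem_image` is a hypothetical replacement for cs_etm__mem_access(), and the enum and function names are shortened for illustration. The size test is the same high-byte comparison the patch uses.

```c
#include <stdint.h>

enum isa { ISA_A64, ISA_A32, ISA_T32 };

/* Hypothetical memory image standing in for cs_etm__mem_access(). */
struct mem_image {
	uint64_t base;        /* address of bytes[0] */
	const uint8_t *bytes; /* little-endian code bytes */
};

/* Size of the T32 instruction at addr, from bits [15:11] of its first halfword. */
static unsigned int t32_size_at(const struct mem_image *m, uint64_t addr)
{
	/* The second byte of a little-endian halfword holds bits [15:8]. */
	uint8_t hi = m->bytes[addr - m->base + 1];

	return ((hi & 0xF8) >= 0xE8) ? 4 : 2;
}

/* Address of the instruction 'offset' instructions after 'start'. */
static uint64_t instr_addr(const struct mem_image *m, enum isa isa,
			   uint64_t start, uint64_t offset)
{
	uint64_t addr = start;

	/* A32 and A64 are fixed width, so a multiplication is enough. */
	if (isa != ISA_T32)
		return start + offset * 4;

	/* T32 is variable width, so step over each instruction in turn. */
	while (offset > 0) {
		addr += t32_size_at(m, addr);
		offset--;
	}
	return addr;
}
```

The T32 branch is linear in the offset, which is acceptable here because it only runs when a sample falls inside a range, not for every instruction traced.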

Mike Leach

2:03 p.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

Hi Rob,

On Thu, 27 Sep 2018 at 11:24, Robert Walker <robert.walker@arm.com> wrote:

...

This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.0 of OpenCSD. A check for the new version member has been added to the feature check for OpenCSD. Where only the older version of OpenCSD is available, the previous behaviour of assuming 4 byte instruction size is used.

Signed-off-by: Robert Walker robert.walker@arm.com

tools/build/Makefile.feature | 3 +- tools/build/feature/Makefile | 4 ++ tools/build/feature/test-libopencsd-numinstr.c | 15 ++++++ tools/perf/Makefile.config | 3 ++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 58 ++++++++++++++++++++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++ tools/perf/util/cs-etm.c | 71 +++++++++++-------------- 7 files changed, 123 insertions(+), 41 deletions(-) create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature index f216b2f..5e8d108 100644 --- a/tools/build/Makefile.feature +++ b/tools/build/Makefile.feature @@ -68,7 +68,8 @@ FEATURE_TESTS_BASIC := \ sched_getcpu \ sdt \ setns \
-  libopencsd
+  libopencsd                     \
+  libopencsd-numinstr
# FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list # of all feature tests diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile index 0516259..2cb7ef6 100644 --- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -56,6 +56,7 @@ FILES= \ test-sched_getcpu.bin \ test-setns.bin \ test-libopencsd.bin \
    test-libopencsd-numinstr.bin          \
    test-clang.bin                                \
    test-llvm.bin                         \
    test-llvm-version.bin
@@ -114,6 +115,9 @@ $(OUTPUT)test-libopencsd.bin: $(BUILD) # -lopencsd_c_api -lopencsd provided by # $(FEATURE_CHECK_LDFLAGS-libopencsd)

+$(OUTPUT)test-libopencsd-numinstr.bin:
  $(BUILD) -lopencsd_c_api -lopencsd
DWARFLIBS := -ldw ifeq ($(findstring -static,${LDFLAGS}),-static) DWARFLIBS += -lelf -lebl -lz -llzma -lbz2

Per my comments previously:
- if no opencsd lib is found, then we don't build the feature in, as before; report lib not found etc.
- if opencsd is found but is the wrong version, don't build the feature in, but output a version info message, e.g. need 0.9.x or later.

...

diff --git a/tools/build/feature/test-libopencsd-numinstr.c b/tools/build/feature/test-libopencsd-numinstr.c new file mode 100644 index 0000000..606de62 --- /dev/null +++ b/tools/build/feature/test-libopencsd-numinstr.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <opencsd/c_api/opencsd_c_api.h>

+int main(void) +{
  /*
   * Requires ocsd_generic_trace_elem.num_instr_range introduced in
   * OpenCSD 0.9.0
   */
  ocsd_generic_trace_elem elem;
  (void)elem.num_instr_range;
  (void)ocsd_get_version();
  return 0;
+} diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index f6d1a03..b0ba3ca 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -383,6 +383,9 @@ ifndef NO_CORESIGHT endif endif endif

+    ifeq ($(feature-libopencsd-numinstr), 1)
+      CFLAGS += -DHAVE_CSTRACE_INSTR_INFO
+    endif
   endif

ifndef NO_LIBELF diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 938def6..260f3b3a 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -17,6 +17,7 @@ #include "cs-etm-decoder.h" #include "intlist.h" #include "util.h" +#include "asm/bug.h"

#define MAX_BUFFER 1024

@@ -263,9 +264,12 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder) decoder->tail = 0; decoder->packet_count = 0; for (i = 0; i < MAX_BUFFER; i++) {
          decoder->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN;
          decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
          decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
          decoder->packet_buffer[i].instr_count = 0;
          decoder->packet_buffer[i].last_instr_taken_branch = false;
          decoder->packet_buffer[i].last_instr_size = 0;
          decoder->packet_buffer[i].exc = false;
          decoder->packet_buffer[i].exc_ret = false;
          decoder->packet_buffer[i].cpu = INT_MIN;
@@ -294,11 +298,13 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, decoder->packet_count++;
    decoder->packet_buffer[et].sample_type = sample_type;
  decoder->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN;
  decoder->packet_buffer[et].exc = false;
  decoder->packet_buffer[et].exc_ret = false;
  decoder->packet_buffer[et].cpu = *((int *)inode->priv);
  decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
  decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
  decoder->packet_buffer[et].instr_count = 0;

  if (decoder->packet_count == MAX_BUFFER - 1)
          return OCSD_RESP_WAIT;
@@ -306,6 +312,36 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, return OCSD_RESP_CONT; }

+#ifdef HAVE_CSTRACE_INSTR_INFO

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  return elem->num_instr_range;
+}

+static int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem) +{
  return elem->last_instr_sz;
+}

+#else

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  /* Assume a 4-byte instruction size - will be wrong for T32 */
  WARN_ONCE(elem->isa == ocsd_isa_thumb2,
            "Instruction counts not available for T32. Please upgrade to OpenCSD >= 0.9.0\n");
  return (elem->en_addr - elem->st_addr) / 4;
+}

+static +int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem __maybe_unused) +{
  return 4;
+}

+#endif

The alternate (#else) versions are not needed if you decide not to build against an insufficient version.

...

static ocsd_datapath_resp_t cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, const ocsd_generic_trace_elem *elem, @@ -321,8 +357,28 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
    packet = &decoder->packet_buffer[decoder->tail];
  switch (elem->isa) {
  case ocsd_isa_aarch64:
          packet->isa = CS_ETM_ISA_A64;
          break;
  case ocsd_isa_arm:
          packet->isa = CS_ETM_ISA_A32;
          break;
  case ocsd_isa_thumb2:
          packet->isa = CS_ETM_ISA_T32;
          break;
  case ocsd_isa_tee:
  case ocsd_isa_jazelle:
  case ocsd_isa_custom:
  case ocsd_isa_unknown:
  default:
          packet->isa = CS_ETM_ISA_UNKNOWN;
  }
  packet->start_addr = elem->st_addr;
  packet->end_addr = elem->en_addr;
  packet->instr_count = cs_etm_decoder__instr_count(elem);
  switch (elem->last_i_type) {
  case OCSD_INSTR_BR:
  case OCSD_INSTR_BR_INDIRECT:
@@ -336,6 +392,8 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, break; }
  packet->last_instr_size = cs_etm_decoder__last_instr_size(elem);
  return ret;
}

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h index 612b575..9351bd1 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -28,11 +28,21 @@ enum cs_etm_sample_type { CS_ETM_TRACE_ON = 1 << 1, };

+enum cs_etm_isa {
  CS_ETM_ISA_UNKNOWN,
  CS_ETM_ISA_A64,
  CS_ETM_ISA_A32,
  CS_ETM_ISA_T32,
+};

struct cs_etm_packet { enum cs_etm_sample_type sample_type;
  enum cs_etm_isa isa;
  u64 start_addr;
  u64 end_addr;
  u32 instr_count;
  u8 last_instr_taken_branch;
  u8 last_instr_size;
  u8 exc;
  u8 exc_ret;
  int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 2ae6402..fcaa73f 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -31,14 +31,6 @@

#define MAX_TIMESTAMP (~0ULL)

-/*
- * A64 instructions are always 4 bytes
- *
- * Only A64 is supported, so can use this constant for converting between
- * addresses and instruction counts, calculting offsets etc
- */
-#define A64_INSTR_SIZE 4

struct cs_etm_auxtrace { struct auxtrace auxtrace; struct auxtrace_queues queues; @@ -492,21 +484,16 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq) etmq->last_branch_rb->nr = 0; }

-static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
-{
-	/* Returns 0 for the CS_ETM_TRACE_ON packet */
-	if (packet->sample_type == CS_ETM_TRACE_ON)
-		return 0;
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
+					 u64 addr) {
+	u8 instrBytes[2];
 
-	/*
-	 * The packet records the execution range with an exclusive end address
-	 *
-	 * A64 instructions are constant size, so the last executed
-	 * instruction is A64_INSTR_SIZE before the end address
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
+	cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);
+	/* T32 instruction size is indicated by bits[15:11] of the first
+	 * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
+	 * denote a 32-bit instruction.
 	 */
-	return packet->end_addr - A64_INSTR_SIZE;
+	return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
 }

static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet) @@ -518,27 +505,32 @@ static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet) return packet->start_addr; }

-static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
+static inline
+u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
 {
-	/*
-	 * Only A64 instructions are currently supported, so can get
-	 * instruction count by dividing.
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
-	 */
-	return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
+	/* Returns 0 for the CS_ETM_TRACE_ON packet */
+	if (packet->sample_type == CS_ETM_TRACE_ON)
+		return 0;
+
+	return packet->end_addr - packet->last_instr_size;
 }

-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
+				     const struct cs_etm_packet *packet,
 				     u64 offset)
 {
-	/*
-	 * Only A64 instructions are currently supported, so can get
-	 * instruction address by muliplying.
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
-	 */
-	return packet->start_addr + offset * A64_INSTR_SIZE;
+	if (packet->isa == CS_ETM_ISA_T32) {
+		u64 addr = packet->start_addr;
+
+		while (offset > 0) {
+			addr += cs_etm__t32_instr_size(etmq, addr);
+			offset--;
+		}
+		return addr;
+	}
+
+	/* Assume a 4 byte instruction size (A32/A64) */
+	return packet->start_addr + offset * 4;
 }

static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq) @@ -867,9 +859,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet *tmp; int ret;
-	u64 instrs_executed;
+	u64 instrs_executed = etmq->packet->instr_count;
 
-	instrs_executed = cs_etm__instr_count(etmq->packet);
 	etmq->period_instructions += instrs_executed;

  /*
@@ -899,7 +890,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) * executed, but PC has not advanced to next instruction) */ u64 offset = (instrs_executed - instrs_over - 1);
-		u64 addr = cs_etm__instr_addr(etmq->packet, offset);
+		u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset);

If you are passing etmq, do you also need to pass etmq->packet?

Mike

...

            ret = cs_etm__synth_instruction_sample(
                    etmq, addr, etm->instructions_sample_period);
-- 2.7.4

CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight

-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK

leo.yan@linaro.org

29 Sep

8:35 a.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

On Thu, Sep 27, 2018 at 03:03:03PM +0100, Mike Leach wrote:

...

Hi Rob,

On Thu, 27 Sep 2018 at 11:24, Robert Walker <robert.walker@arm.com> wrote:

...
This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.0 of OpenCSD. A check for the new version member has been added to the feature check for OpenCSD. Where only the older version of OpenCSD is available, the previous behaviour of assuming 4 byte instruction size is used.

Signed-off-by: Robert Walker robert.walker@arm.com

tools/build/Makefile.feature | 3 +- tools/build/feature/Makefile | 4 ++ tools/build/feature/test-libopencsd-numinstr.c | 15 ++++++ tools/perf/Makefile.config | 3 ++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 58 ++++++++++++++++++++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++ tools/perf/util/cs-etm.c | 71 +++++++++++-------------- 7 files changed, 123 insertions(+), 41 deletions(-) create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature index f216b2f..5e8d108 100644 --- a/tools/build/Makefile.feature +++ b/tools/build/Makefile.feature @@ -68,7 +68,8 @@ FEATURE_TESTS_BASIC := \ sched_getcpu \ sdt \ setns \
-  libopencsd
+  libopencsd                     \
+  libopencsd-numinstr
# FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list # of all feature tests diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile index 0516259..2cb7ef6 100644 --- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -56,6 +56,7 @@ FILES= \ test-sched_getcpu.bin \ test-setns.bin \ test-libopencsd.bin \
    test-libopencsd-numinstr.bin          \
    test-clang.bin                                \
    test-llvm.bin                         \
    test-llvm-version.bin
@@ -114,6 +115,9 @@ $(OUTPUT)test-libopencsd.bin: $(BUILD) # -lopencsd_c_api -lopencsd provided by # $(FEATURE_CHECK_LDFLAGS-libopencsd)

+$(OUTPUT)test-libopencsd-numinstr.bin:
  $(BUILD) -lopencsd_c_api -lopencsd
DWARFLIBS := -ldw ifeq ($(findstring -static,${LDFLAGS}),-static) DWARFLIBS += -lelf -lebl -lz -llzma -lbz2
Per my comments previously:
- if no opencsd lib is found, then we don't build the feature in, as before; report lib not found etc.
- if opencsd is found but is the wrong version, don't build the feature in, but output a version info message, e.g. need 0.9.x or later.

...
diff --git a/tools/build/feature/test-libopencsd-numinstr.c b/tools/build/feature/test-libopencsd-numinstr.c new file mode 100644 index 0000000..606de62 --- /dev/null +++ b/tools/build/feature/test-libopencsd-numinstr.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <opencsd/c_api/opencsd_c_api.h>

+int main(void) +{
  /*
   * Requires ocsd_generic_trace_elem.num_instr_range introduced in
   * OpenCSD 0.9.0
   */
  ocsd_generic_trace_elem elem;
  (void)elem.num_instr_range;
  (void)ocsd_get_version();
  return 0;
+} diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index f6d1a03..b0ba3ca 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -383,6 +383,9 @@ ifndef NO_CORESIGHT endif endif endif

+    ifeq ($(feature-libopencsd-numinstr), 1)
+      CFLAGS += -DHAVE_CSTRACE_INSTR_INFO
+    endif
   endif

ifndef NO_LIBELF diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 938def6..260f3b3a 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -17,6 +17,7 @@ #include "cs-etm-decoder.h" #include "intlist.h" #include "util.h" +#include "asm/bug.h"

#define MAX_BUFFER 1024

@@ -263,9 +264,12 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder) decoder->tail = 0; decoder->packet_count = 0; for (i = 0; i < MAX_BUFFER; i++) {
          decoder->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN;
          decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
          decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
          decoder->packet_buffer[i].instr_count = 0;
          decoder->packet_buffer[i].last_instr_taken_branch = false;
          decoder->packet_buffer[i].last_instr_size = 0;
          decoder->packet_buffer[i].exc = false;
          decoder->packet_buffer[i].exc_ret = false;
          decoder->packet_buffer[i].cpu = INT_MIN;
@@ -294,11 +298,13 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, decoder->packet_count++;
    decoder->packet_buffer[et].sample_type = sample_type;
  decoder->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN;
  decoder->packet_buffer[et].exc = false;
  decoder->packet_buffer[et].exc_ret = false;
  decoder->packet_buffer[et].cpu = *((int *)inode->priv);
  decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
  decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
  decoder->packet_buffer[et].instr_count = 0;

Is the initialization of decoder->packet_buffer[et].last_instr_size to 0 missing here?
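One way to avoid this kind of omission is to reset every per-packet field in a single place. The following is only a sketch of that idea under stated assumptions: `struct packet` is a reduced copy of the fields shown in the patch, `INVAL_ADDR` is a placeholder for perf's CS_ETM_INVAL_ADDR, and `packet_reset()` is a hypothetical helper that does not exist in perf.

```c
#include <limits.h>
#include <stdint.h>

/* Placeholder sentinel; perf defines its own CS_ETM_INVAL_ADDR. */
#define INVAL_ADDR UINT64_MAX

enum isa { ISA_UNKNOWN, ISA_A64, ISA_A32, ISA_T32 };

/* Reduced copy of the per-packet fields touched by the patch. */
struct packet {
	enum isa isa;
	uint64_t start_addr;
	uint64_t end_addr;
	uint32_t instr_count;
	uint8_t last_instr_taken_branch;
	uint8_t last_instr_size;
	uint8_t exc;
	uint8_t exc_ret;
	int cpu;
};

/* Hypothetical helper: the one place that resets every per-packet field. */
static void packet_reset(struct packet *p)
{
	p->isa = ISA_UNKNOWN;
	p->start_addr = INVAL_ADDR;
	p->end_addr = INVAL_ADDR;
	p->instr_count = 0;
	p->last_instr_taken_branch = 0;
	p->last_instr_size = 0;
	p->exc = 0;
	p->exc_ret = 0;
	p->cpu = INT_MIN; /* the caller fills in the real CPU afterwards */
}
```

With such a helper, both the buffer-clearing loop and the per-packet setup would call the same function, so a newly added field only needs to be initialized once.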

...

...
    if (decoder->packet_count == MAX_BUFFER - 1)
            return OCSD_RESP_WAIT;
@@ -306,6 +312,36 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, return OCSD_RESP_CONT; }

+#ifdef HAVE_CSTRACE_INSTR_INFO

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  return elem->num_instr_range;
+}

+static int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem) +{
  return elem->last_instr_sz;
+}

+#else

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  /* Assume a 4-byte instruction size - will be wrong for T32 */
  WARN_ONCE(elem->isa == ocsd_isa_thumb2,
            "Instruction counts not available for T32. Please upgrade to OpenCSD >= 0.9.0\n");

In this case, should the program report a failure and exit immediately?

Why not also check for the A32 case here? Did versions up to 0.8.x already support A32 instructions? I understand that A32 and A64 both have a 4-byte instruction width, but I don't know whether the old OpenCSD library can decode both of them.

...

...
  return (elem->en_addr - elem->st_addr) / 4;
+}

+static +int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem __maybe_unused) +{
  return 4;
+}

+#endif
The alternate (#else) implementation is not needed if you decide not to build against an insufficient version.

Just curious: is the ocsd_get_version() API provided by all OpenCSD versions?

It seems to me that this patch is meant to keep backwards compatibility with v0.8.x. I think we can still support building perf with an old OpenCSD version: at runtime the perf cs_etm decoder can get the OpenCSD version number with ocsd_get_version() and check whether it is sufficient for decoding the current trace data; if not, it reports an error and exits immediately.

...

...
static ocsd_datapath_resp_t cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, const ocsd_generic_trace_elem *elem, @@ -321,8 +357,28 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
    packet = &decoder->packet_buffer[decoder->tail];
  switch (elem->isa) {
  case ocsd_isa_aarch64:
          packet->isa = CS_ETM_ISA_A64;
          break;
  case ocsd_isa_arm:
          packet->isa = CS_ETM_ISA_A32;
          break;
  case ocsd_isa_thumb2:
          packet->isa = CS_ETM_ISA_T32;
          break;
  case ocsd_isa_tee:
  case ocsd_isa_jazelle:
  case ocsd_isa_custom:
  case ocsd_isa_unknown:
  default:
          packet->isa = CS_ETM_ISA_UNKNOWN;
  }
  packet->start_addr = elem->st_addr;
  packet->end_addr = elem->en_addr;
  packet->instr_count = cs_etm_decoder__instr_count(elem);
  switch (elem->last_i_type) {
  case OCSD_INSTR_BR:
  case OCSD_INSTR_BR_INDIRECT:
@@ -336,6 +392,8 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, break; }
  packet->last_instr_size = cs_etm_decoder__last_instr_size(elem);
  return ret;
}

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h index 612b575..9351bd1 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -28,11 +28,21 @@ enum cs_etm_sample_type { CS_ETM_TRACE_ON = 1 << 1, };

+enum cs_etm_isa {
  CS_ETM_ISA_UNKNOWN,
  CS_ETM_ISA_A64,
  CS_ETM_ISA_A32,
  CS_ETM_ISA_T32,
+};

struct cs_etm_packet { enum cs_etm_sample_type sample_type;
  enum cs_etm_isa isa;
  u64 start_addr;
  u64 end_addr;
  u32 instr_count;
  u8 last_instr_taken_branch;
  u8 last_instr_size;
  u8 exc;
  u8 exc_ret;
  int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 2ae6402..fcaa73f 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -31,14 +31,6 @@

#define MAX_TIMESTAMP (~0ULL)

-/*

A64 instructions are always 4 bytes

Only A64 is supported, so can use this constant for converting between

addresses and instruction counts, calculting offsets etc

*/

-#define A64_INSTR_SIZE 4

struct cs_etm_auxtrace { struct auxtrace auxtrace; struct auxtrace_queues queues; @@ -492,21 +484,16 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq) etmq->last_branch_rb->nr = 0; }

-static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet) -{
  /* Returns 0 for the CS_ETM_TRACE_ON packet */
  if (packet->sample_type == CS_ETM_TRACE_ON)
          return 0;
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
                                   u64 addr) {
  u8 instrBytes[2];
  /*
   * The packet records the execution range with an exclusive end address
   *
   * A64 instructions are constant size, so the last executed
   * instruction is A64_INSTR_SIZE before the end address
   * Will need to do instruction level decode for T32 instructions as
   * they can be variable size (not yet supported).
  cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);
  /* T32 instruction size is indicated by bits[15:11] of the first
   * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
   * denote a 32-bit instruction.
   */
  return packet->end_addr - A64_INSTR_SIZE;
  return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;

Could a macro replace 0xF8/0xE8 to make this more readable?

...

}

-static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet) +static inline +u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet) {

  /*
   * Only A64 instructions are currently supported, so can get
   * instruction count by dividing.
   * Will need to do instruction level decode for T32 instructions as
   * they can be variable size (not yet supported).
   */
  return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
  /* Returns 0 for the CS_ETM_TRACE_ON packet */
  if (packet->sample_type == CS_ETM_TRACE_ON)
          return 0;
  return packet->end_addr - packet->last_instr_size;
}

-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet, +static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,

                               const struct cs_etm_packet *packet,
                               u64 offset)
{
  /*
   * Only A64 instructions are currently supported, so can get
   * instruction address by muliplying.
   * Will need to do instruction level decode for T32 instructions as
   * they can be variable size (not yet supported).
   */
  return packet->start_addr + offset * A64_INSTR_SIZE;
  if (packet->isa == CS_ETM_ISA_T32) {
          u64 addr = packet->start_addr;
          while (offset > 0) {
                  addr += cs_etm__t32_instr_size(etmq, addr);
                  offset--;

Should be: offset -= cs_etm__t32_instr_size(etmq, addr)?

...

...
          }
          return addr;
  }
  /* Assume a 4 byte instruction size (A32/A64) */
  return packet->start_addr + offset * 4;
}

static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq) @@ -867,9 +859,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet *tmp; int ret;
  u64 instrs_executed;
  u64 instrs_executed = etmq->packet->instr_count;
  instrs_executed = cs_etm__instr_count(etmq->packet);
  etmq->period_instructions += instrs_executed;

  /*
@@ -899,7 +890,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) * executed, but PC has not advanced to next instruction) */ u64 offset = (instrs_executed - instrs_over - 1);
          u64 addr = cs_etm__instr_addr(etmq->packet, offset);
          u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset);
If you are passing etmq, do you also need to pass etmq->packet?

Mike

...
            ret = cs_etm__synth_instruction_sample(
                    etmq, addr, etm->instructions_sample_period);
-- 2.7.4

CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK

Robert Walker

1 Oct

8:19 a.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

On 29/09/18 09:35, leo.yan@linaro.org wrote:

...

On Thu, Sep 27, 2018 at 03:03:03PM +0100, Mike Leach wrote:

...
Hi Rob, On Thu, 27 Sep 2018 at 11:24, Robert Walker robert.walker@arm.com wrote:

...
This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.0 of OpenCSD. A check for the new version member has been added to the feature check for OpenCSD. Where only the older version of OpenCSD is available, the previous behaviour of assuming 4 byte instruction size is used.

Signed-off-by: Robert Walker robert.walker@arm.com

tools/build/Makefile.feature | 3 +- tools/build/feature/Makefile | 4 ++ tools/build/feature/test-libopencsd-numinstr.c | 15 ++++++ tools/perf/Makefile.config | 3 ++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 58 ++++++++++++++++++++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++ tools/perf/util/cs-etm.c | 71 +++++++++++-------------- 7 files changed, 123 insertions(+), 41 deletions(-) create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature index f216b2f..5e8d108 100644 --- a/tools/build/Makefile.feature +++ b/tools/build/Makefile.feature @@ -68,7 +68,8 @@ FEATURE_TESTS_BASIC := \ sched_getcpu \ sdt \ setns \
   libopencsd
   libopencsd                     \
   libopencsd-numinstr
# FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list # of all feature tests
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile index 0516259..2cb7ef6 100644 --- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -56,6 +56,7 @@ FILES= \ test-sched_getcpu.bin \ test-setns.bin \ test-libopencsd.bin \
    test-libopencsd-numinstr.bin          \
     test-clang.bin                                \
     test-llvm.bin                         \
     test-llvm-version.bin
@@ -114,6 +115,9 @@ $(OUTPUT)test-libopencsd.bin: $(BUILD) # -lopencsd_c_api -lopencsd provided by # $(FEATURE_CHECK_LDFLAGS-libopencsd)

+$(OUTPUT)test-libopencsd-numinstr.bin:
  $(BUILD) -lopencsd_c_api -lopencsd
DWARFLIBS := -ldw ifeq ($(findstring -static,${LDFLAGS}),-static) DWARFLIBS += -lelf -lebl -lz -llzma -lbz2
Per my previous comments:

If no OpenCSD library is found, then we don't build the feature in - as before - and report that the library was not found, etc.

If OpenCSD is found but is the wrong version, don't build the feature in, but output a version info message, e.g. that 0.9.x or later is needed.

...
diff --git a/tools/build/feature/test-libopencsd-numinstr.c b/tools/build/feature/test-libopencsd-numinstr.c new file mode 100644 index 0000000..606de62 --- /dev/null +++ b/tools/build/feature/test-libopencsd-numinstr.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <opencsd/c_api/opencsd_c_api.h>

+int main(void) +{
  /*
   * Requires ocsd_generic_trace_elem.num_instr_range introduced in
   * OpenCSD 0.9.0
   */
  ocsd_generic_trace_elem elem;
  (void)elem.num_instr_range;
  (void)ocsd_get_version();
  return 0;
+} diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index f6d1a03..b0ba3ca 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -383,6 +383,9 @@ ifndef NO_CORESIGHT endif endif endif

ifeq ($(feature-libopencsd-numinstr), 1)

CFLAGS += -DHAVE_CSTRACE_INSTR_INFO

endif endif

ifndef NO_LIBELF

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 938def6..260f3b3a 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -17,6 +17,7 @@ #include "cs-etm-decoder.h" #include "intlist.h" #include "util.h" +#include "asm/bug.h"

#define MAX_BUFFER 1024

@@ -263,9 +264,12 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder) decoder->tail = 0; decoder->packet_count = 0; for (i = 0; i < MAX_BUFFER; i++) {
          decoder->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN;
           decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
           decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
          decoder->packet_buffer[i].instr_count = 0;
           decoder->packet_buffer[i].last_instr_taken_branch = false;
          decoder->packet_buffer[i].last_instr_size = 0;
           decoder->packet_buffer[i].exc = false;
           decoder->packet_buffer[i].exc_ret = false;
           decoder->packet_buffer[i].cpu = INT_MIN;
@@ -294,11 +298,13 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, decoder->packet_count++;
     decoder->packet_buffer[et].sample_type = sample_type;
  decoder->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN;
   decoder->packet_buffer[et].exc = false;
   decoder->packet_buffer[et].exc_ret = false;
   decoder->packet_buffer[et].cpu = *((int *)inode->priv);
   decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
   decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
  decoder->packet_buffer[et].instr_count = 0;
Is the initialization of decoder->packet_buffer[et].last_instr_size to 0 missing here?

And also last_instr_taken_branch - these fields are only used in cs-etm.c if the packet type is CS_ETM_RANGE - so they will be set by cs_etm_decoder__buffer_range(). However, I will add initializers for these.

...

...
...
     if (decoder->packet_count == MAX_BUFFER - 1)
             return OCSD_RESP_WAIT;
@@ -306,6 +312,36 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, return OCSD_RESP_CONT; }

+#ifdef HAVE_CSTRACE_INSTR_INFO

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  return elem->num_instr_range;
+}

+static int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem) +{
  return elem->last_instr_sz;
+}

+#else

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  /* Assume a 4-byte instruction size - will be wrong for T32 */
  WARN_ONCE(elem->isa == ocsd_isa_thumb2,
            "Instruction counts not available for T32. Please upgrade to OpenCSD >= 0.9.0\n");
In this case, should the program report a failure and exit immediately?

Why not also check for the A32 case here? Did versions up to 0.8.x support A32 instructions? I understand that A32 and A64 both have a 4-byte instruction width, but I don't know whether the old OpenCSD library can decode both of them.

The old version of OpenCSD did support A32 and T32 - samples from A32 trace will work with the current version of perf as the assumption of 4 byte instruction sizes for A64 also applies to A32.
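
To make the limitation concrete, here is a minimal sketch (not part of the patch) of the arithmetic the fallback path relies on. The function names are hypothetical; only the formulas `(en_addr - st_addr) / 4` and `end_addr - last_instr_size` are taken from the diff.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the #else branch of the patch: with no
 * per-range instruction count from the decoder, every instruction is
 * assumed to be 4 bytes wide.
 */
static uint32_t sketch_fallback_instr_count(uint64_t st_addr, uint64_t en_addr)
{
	return (uint32_t)((en_addr - st_addr) / 4);
}

/*
 * The end address of a range is exclusive, so the last executed
 * instruction starts last_instr_size bytes before it.
 */
static uint64_t sketch_last_executed_instr(uint64_t en_addr,
					   uint8_t last_instr_size)
{
	return en_addr - last_instr_size;
}
```

For A32 and A64 both results are exact. For an 8-byte T32 range at 0x1000 holding a 16-bit, a 32-bit and a 16-bit instruction, the fallback reports 2 instructions rather than 3, and with an assumed size of 4 it places the last instruction at 0x1004 rather than 0x1006.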

...

...
...
  return (elem->en_addr - elem->st_addr) / 4;
+}

+static +int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem __maybe_unused) +{
  return 4;
+}

+#endif
The alternate (#else) implementation is not needed if you decide not to build against an insufficient version.
Just curious: is the ocsd_get_version() API provided by all OpenCSD versions?

It seems to me that this patch is meant to keep backwards compatibility with v0.8.x. I think we can still support building perf with an old OpenCSD version: at runtime the perf cs_etm decoder can get the OpenCSD version number with ocsd_get_version() and check whether it is sufficient for decoding the current trace data; if not, it reports an error and exits immediately.

ocsd_get_version() is provided by all versions of OpenCSD. The features added in 0.9.x to provide the instruction counts add a new member to the struct that describes each range of executed instructions - so the new code doesn't compile if the old library is installed. Calling ocsd_get_version() could be used to check for the unlikely case that perf has been compiled against 0.9.x, but is now being run against an older version - is this worth doing?
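
If the runtime check were added, it might look roughly like the sketch below. This is an illustration only: the helper and macro names are hypothetical, and the layout of the version word (major in bits 23:16, minor in bits 15:8, patch in bits 7:0) is an assumption that should be checked against the OpenCSD headers before use. The minimum of 0.9.0 is taken from the WARN_ONCE text in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: the library version word is packed as
 * (major << 16) | (minor << 8) | patch. Verify against the OpenCSD
 * headers before relying on this layout.
 */
#define SKETCH_VER(maj, min, pat) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(pat))

/* Minimum version named in the patch's warning message. */
#define SKETCH_MIN_VER SKETCH_VER(0, 9, 0)

/* Hypothetical helper: true if the running library is new enough. */
static bool sketch_opencsd_has_instr_info(uint32_t runtime_ver)
{
	return runtime_ver >= SKETCH_MIN_VER;
}
```

In perf this would be called once during decoder setup with the value returned by ocsd_get_version(), so that a binary built against 0.9.x but run against an older library fails early instead of reading a struct member the library never fills in.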

...

...
...
static ocsd_datapath_resp_t cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, const ocsd_generic_trace_elem *elem, @@ -321,8 +357,28 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
     packet = &decoder->packet_buffer[decoder->tail];
  switch (elem->isa) {
  case ocsd_isa_aarch64:
          packet->isa = CS_ETM_ISA_A64;
          break;
  case ocsd_isa_arm:
          packet->isa = CS_ETM_ISA_A32;
          break;
  case ocsd_isa_thumb2:
          packet->isa = CS_ETM_ISA_T32;
          break;
  case ocsd_isa_tee:
  case ocsd_isa_jazelle:
  case ocsd_isa_custom:
  case ocsd_isa_unknown:
  default:
          packet->isa = CS_ETM_ISA_UNKNOWN;
  }
   packet->start_addr = elem->st_addr;
   packet->end_addr = elem->en_addr;
  packet->instr_count = cs_etm_decoder__instr_count(elem);
   switch (elem->last_i_type) {
   case OCSD_INSTR_BR:
   case OCSD_INSTR_BR_INDIRECT:
@@ -336,6 +392,8 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, break; }
  packet->last_instr_size = cs_etm_decoder__last_instr_size(elem);
   return ret;
}
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h index 612b575..9351bd1 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -28,11 +28,21 @@ enum cs_etm_sample_type { CS_ETM_TRACE_ON = 1 << 1, };

+enum cs_etm_isa {
  CS_ETM_ISA_UNKNOWN,
  CS_ETM_ISA_A64,
  CS_ETM_ISA_A32,
  CS_ETM_ISA_T32,
+};
struct cs_etm_packet { enum cs_etm_sample_type sample_type;
  enum cs_etm_isa isa;
   u64 start_addr;
   u64 end_addr;
  u32 instr_count;
   u8 last_instr_taken_branch;
  u8 last_instr_size;
   u8 exc;
   u8 exc_ret;
   int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 2ae6402..fcaa73f 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -31,14 +31,6 @@

#define MAX_TIMESTAMP (~0ULL)

-/*

A64 instructions are always 4 bytes

Only A64 is supported, so can use this constant for converting between

addresses and instruction counts, calculting offsets etc

*/

-#define A64_INSTR_SIZE 4

struct cs_etm_auxtrace { struct auxtrace auxtrace; struct auxtrace_queues queues;

@@ -492,21 +484,16 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq) etmq->last_branch_rb->nr = 0; }

-static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet) -{
  /* Returns 0 for the CS_ETM_TRACE_ON packet */
  if (packet->sample_type == CS_ETM_TRACE_ON)
          return 0;
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
                                   u64 addr) {
  u8 instrBytes[2];
  /*
   * The packet records the execution range with an exclusive end address
   *
   * A64 instructions are constant size, so the last executed
   * instruction is A64_INSTR_SIZE before the end address
   * Will need to do instruction level decode for T32 instructions as
   * they can be variable size (not yet supported).
  cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);
  /* T32 instruction size is indicated by bits[15:11] of the first
   * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
   * denote a 32-bit instruction.
    */
  return packet->end_addr - A64_INSTR_SIZE;
  return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
Could a macro replace 0xF8/0xE8 to make this more readable?

I don't think adding a macro will make this any more readable - there's a detailed comment about the encoding above.
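
For readers who want to see the encoding rule from that comment in isolation, here is a small standalone sketch. The function names are hypothetical and are not in the patch; the byte-wise variant assumes little-endian instruction memory, as the patch's use of instrBytes[1] does.

```c
#include <stdint.h>

/*
 * Classify a T32 instruction from its first 16-bit halfword.
 * Bits [15:11] equal to 0b11101, 0b11110 or 0b11111 mark a 32-bit
 * encoding; any other value is a 16-bit encoding.
 */
static int sketch_t32_size_from_halfword(uint16_t first_hw)
{
	unsigned int top5 = first_hw >> 11;

	return (top5 >= 0x1D) ? 4 : 2;
}

/*
 * The same test on raw bytes, as written in the patch. With
 * little-endian storage the high byte of the first halfword is
 * bytes[1]; masking with 0xF8 keeps bits [15:11] and 0xE8 is
 * 0b11101 shifted into that position.
 */
static int sketch_t32_size_from_bytes(const uint8_t bytes[2])
{
	return ((bytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
}
```

Both forms give the same answer; the halfword form just makes the five-bit comparison explicit.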

...

}

-static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet) +static inline +u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet) {

  /*
   * Only A64 instructions are currently supported, so can get
   * instruction count by dividing.
   * Will need to do instruction level decode for T32 instructions as
   * they can be variable size (not yet supported).
   */
  return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
  /* Returns 0 for the CS_ETM_TRACE_ON packet */
  if (packet->sample_type == CS_ETM_TRACE_ON)
          return 0;
  return packet->end_addr - packet->last_instr_size;
}

-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet, +static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,

                               const struct cs_etm_packet *packet,
                                u64 offset)
{
  /*
   * Only A64 instructions are currently supported, so can get
   * instruction address by muliplying.
   * Will need to do instruction level decode for T32 instructions as
   * they can be variable size (not yet supported).
   */
  return packet->start_addr + offset * A64_INSTR_SIZE;
  if (packet->isa == CS_ETM_ISA_T32) {
          u64 addr = packet->start_addr;
          while (offset > 0) {
                  addr += cs_etm__t32_instr_size(etmq, addr);
                  offset--;

Should be: offset -= cs_etm__t32_instr_size(etmq, addr)?

No - offset is the number of the instruction we want to find the address of - e.g. the decoder will tell us that the block has 100 instructions and we want to generate a sample on the 57th, so we step 57 instructions through the block.
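
The distinction between an instruction index and a byte offset can be shown with a standalone sketch. It is not the perf code: the names are hypothetical, the instruction bytes come from a plain buffer rather than cs_etm__mem_access(), and a bounds check is added so the walk cannot run past the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the T32 instruction at mem, assuming little-endian storage. */
static int sketch_t32_instr_size(const uint8_t *mem)
{
	return ((mem[1] & 0xF8) >= 0xE8) ? 4 : 2;
}

/*
 * Return the byte offset of instruction number `index` (0-based)
 * within a block of T32 code. `index` counts instructions, so it is
 * decremented by one per step while the byte offset advances by the
 * size of each instruction, either 2 or 4.
 */
static size_t sketch_t32_instr_offset(const uint8_t *block, size_t block_len,
				      uint64_t index)
{
	size_t off = 0;

	while (index > 0 && off + 2 <= block_len) {
		off += (size_t)sketch_t32_instr_size(block + off);
		index--;
	}
	return off;
}
```

With a block made of a 16-bit, a 32-bit and a 16-bit instruction, instruction 2 starts at byte offset 6, not at 4 (two 2-byte steps) or 8 (two 4-byte steps), which is why the loop has to inspect each instruction in turn.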

...

...
...
          }
          return addr;
  }
  /* Assume a 4 byte instruction size (A32/A64) */
  return packet->start_addr + offset * 4;
}

static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
@@ -867,9 +859,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet *tmp; int ret;
  u64 instrs_executed;
  u64 instrs_executed = etmq->packet->instr_count;
  instrs_executed = cs_etm__instr_count(etmq->packet);
   etmq->period_instructions += instrs_executed;

   /*
@@ -899,7 +890,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) * executed, but PC has not advanced to next instruction) */ u64 offset = (instrs_executed - instrs_over - 1);
          u64 addr = cs_etm__instr_addr(etmq->packet, offset);
          u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset);
If you are passing etmq, do you also need to pass etmq->packet?

Mike

...
             ret = cs_etm__synth_instruction_sample(
                     etmq, addr, etm->instructions_sample_period);
-- 2.7.4

-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK

leo.yan@linaro.org

11:33 a.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

On Mon, Oct 01, 2018 at 09:19:33AM +0100, Robert Walker wrote:

[...]

...

...
...
...
+#ifdef HAVE_CSTRACE_INSTR_INFO

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  return elem->num_instr_range;
+}

+static int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem) +{
  return elem->last_instr_sz;
+}

+#else

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  /* Assume a 4-byte instruction size - will be wrong for T32 */
  WARN_ONCE(elem->isa == ocsd_isa_thumb2,
            "Instruction counts not available for T32. Please upgrade to OpenCSD >= 0.9.0\n");
In this case, should the program report a failure and exit immediately?

Why not also check for the A32 case here? Did versions up to 0.8.x support A32 instructions? I understand that A32 and A64 both have a 4-byte instruction width, but I don't know whether the old OpenCSD library can decode both of them.
The old version of OpenCSD did support A32 and T32 - samples from A32 trace will work with the current version of perf as the assumption of 4 byte instruction sizes for A64 also applies to A32.

Thanks for the explanation.

...

...
...
...
  return (elem->en_addr - elem->st_addr) / 4;
+}

+static +int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem __maybe_unused) +{
  return 4;
+}

+#endif
The alternate (#else) implementation is not needed if you decide not to build against an insufficient version.
Just curious: is the ocsd_get_version() API provided by all OpenCSD versions?

It seems to me that this patch is meant to keep backwards compatibility with v0.8.x. I think we can still support building perf with an old OpenCSD version: at runtime the perf cs_etm decoder can get the OpenCSD version number with ocsd_get_version() and check whether it is sufficient for decoding the current trace data; if not, it reports an error and exits immediately.
ocsd_get_version() is provided by all versions of OpenCSD. The features added in 0.9.x to provide the instruction counts add a new member to the struct that describes each range of executed instructions - so the new code doesn't compile if the old library is installed. Calling ocsd_get_version() could be used to check for the unlikely case that perf has been compiled against 0.9.x, but is now being run against an older version - is this worth doing?

Ah, I missed the build failure when using the new kernel with the old library.

...

From a developer's point of view, I lean towards Mike's suggestion to report a failure directly when building the new kernel with an old version of the library; the code can be kept as simple as possible, and we can easily build the latest OpenCSD library ourselves.

To be honest, I am not confident about how this works with distros; this concerns end users rather than developers. E.g. Debian / Ubuntu have officially released OpenCSD v0.8.x; should the new kernel support these old OpenCSD libraries? I am not sure whether this is the case; if the OpenCSD libraries have not been released in Debian/Ubuntu, then this is not a concern at all.

[...]

...

...
...
...
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
                                   u64 addr) {

Nitpick: put the '{' on a new line.

...

  u8 instrBytes[2];
  /*
   * The packet records the execution range with an exclusive end address
   *
   * A64 instructions are constant size, so the last executed
   * instruction is A64_INSTR_SIZE before the end address
   * Will need to do instruction level decode for T32 instructions as
   * they can be variable size (not yet supported).
  cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);
  /* T32 instruction size is indicated by bits[15:11] of the first
   * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
   * denote a 32-bit instruction.
    */
  return packet->end_addr - A64_INSTR_SIZE;
  return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;

Could a macro replace 0xF8/0xE8 to make this more readable?

I don't think adding a macro will make this any more readable - there's a detailed comment about the encoding above.

Yeah, after reading the comments I understand the logic :)

Another question: looking at the flow shown below, T32 depends on the DSO to calculate the instruction address; just curious whether we can rely on the decoder to determine the T32 instruction address and remove the dependency on the DSO?

cs_etm__instr_addr()
  `-> cs_etm__t32_instr_size()
        `-> cs_etm__mem_access()
              `-> dso__data_read_offset()

[...]

...

...
...
...
-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet, +static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
                               const struct cs_etm_packet *packet,
                                u64 offset)
{
  /*
   * Only A64 instructions are currently supported, so can get
   * instruction address by muliplying.
   * Will need to do instruction level decode for T32 instructions as
   * they can be variable size (not yet supported).
   */
  return packet->start_addr + offset * A64_INSTR_SIZE;
  if (packet->isa == CS_ETM_ISA_T32) {
          u64 addr = packet->start_addr;
          while (offset > 0) {
                  addr += cs_etm__t32_instr_size(etmq, addr);
                  offset--;
Should be: offset -= cs_etm__t32_instr_size(etmq, addr)?
No - offset is the number of the instruction we want to find the address of - e.g. the decoder will tell us that the block has 100 instructions and we want to generate a sample on the 57th, so we step 57 instructions through the block.

Understood now, sorry for the noise.

[...]

Thanks, Leo Yan

Mike Leach

12:52 p.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

Hi, On Mon, 1 Oct 2018 at 12:33, leo.yan@linaro.org wrote:

...

On Mon, Oct 01, 2018 at 09:19:33AM +0100, Robert Walker wrote:

[...]

...
...
...
...
+#ifdef HAVE_CSTRACE_INSTR_INFO

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  return elem->num_instr_range;
+}

+static int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem) +{
  return elem->last_instr_sz;
+}

+#else

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  /* Assume a 4-byte instruction size - will be wrong for T32 */
  WARN_ONCE(elem->isa == ocsd_isa_thumb2,
            "Instruction counts not available for T32. Please upgrade to OpenCSD >= 0.9.0\n");
In this case, should the program report a failure and exit immediately?

Why not also check for the A32 case here? Did versions up to 0.8.x support A32 instructions? I understand that A32 and A64 both have a 4-byte instruction width, but I don't know whether the old OpenCSD library can decode both of them.
The old version of OpenCSD did support A32 and T32 - samples from A32 trace will work with the current version of perf as the assumption of 4 byte instruction sizes for A64 also applies to A32.
Thanks for the explanation.

...
...
...
...
  return (elem->en_addr - elem->st_addr) / 4;
+}

+static +int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem __maybe_unused) +{
  return 4;
+}

+#endif
The alternate (#else) implementation is not needed if you decide not to build against an insufficient version.
Just curious: is the ocsd_get_version() API provided by all OpenCSD versions?

It seems to me that this patch is meant to keep backwards compatibility with v0.8.x. I think we can still support building perf with an old OpenCSD version: at runtime the perf cs_etm decoder can get the OpenCSD version number with ocsd_get_version() and check whether it is sufficient for decoding the current trace data; if not, it reports an error and exits immediately.
ocsd_get_version() is provided by all versions of OpenCSD. The features added in 0.9.x to provide the instruction counts add a new member to the struct that describes each range of executed instructions - so the new code doesn't compile if the old library is installed. Calling ocsd_get_version() could be used to check for the unlikely case that perf has been compiled against 0.9.x, but is now being run against an older version - is this worth doing?
Ah, I missed the build failure when using the new kernel with the old library.

From a developer's point of view, I lean towards Mike's suggestion to report a failure directly when building the new kernel with an old version of the library; the code can be kept as simple as possible, and we can easily build the latest OpenCSD library ourselves.

To be honest, I am not confident about how this works with distros; this concerns end users rather than developers. E.g. Debian / Ubuntu have officially released OpenCSD v0.8.x; should the new kernel support these old OpenCSD libraries? I am not sure whether this is the case; if the OpenCSD libraries have not been released in Debian/Ubuntu, then this is not a concern at all.

The first Debian distro to use OpenCSD used 0.8.4. They have picked up later versions as they have been released. Don't know if Ubuntu are picking this up yet.

Any user that is skilled enough to get / use / build an updated version of perf that is not in their current kernel revision, will have the necessary skills to find the latest version of the OpenCSD library should they need it.

Mike

...

[...]

...
...
...
...
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
                                   u64 addr) {
Nitpick: put the '{' on a new line.

...
...
...
...
+        u8 instrBytes[2];

-        /*
-         * The packet records the execution range with an exclusive end address
-         *
-         * A64 instructions are constant size, so the last executed
-         * instruction is A64_INSTR_SIZE before the end address
-         * Will need to do instruction level decode for T32 instructions as
-         * they can be variable size (not yet supported).
+        cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);
+        /* T32 instruction size is indicated by bits[15:11] of the first
+         * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
+         * denote a 32-bit instruction.
          */
-        return packet->end_addr - A64_INSTR_SIZE;
+        return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
Could we use macros to replace 0xF8/0xE8 to make this more readable?
I don't think adding a macro will make this any more readable - there's a detailed comment about the encoding above.
Yeah, after reading the comments I understand the logic :)

Another question: looking at the flow shown below, T32 must depend on the DSO to calculate the instruction address; just curious whether we can rely on the decoder to determine the T32 instruction address and remove the dependency on the DSO?

cs_etm__instr_addr()
  `-> cs_etm__t32_instr_size()
        `-> cs_etm__mem_access()
              `-> dso__data_read_offset()

[...]

...
...
...
...
-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
+                                     const struct cs_etm_packet *packet,
                                      u64 offset)
 {
-        /*
-         * Only A64 instructions are currently supported, so can get
-         * instruction address by muliplying.
-         * Will need to do instruction level decode for T32 instructions as
-         * they can be variable size (not yet supported).
-         */
-        return packet->start_addr + offset * A64_INSTR_SIZE;
+        if (packet->isa == CS_ETM_ISA_T32) {
+                u64 addr = packet->start_addr;
+                while (offset > 0) {
+                        addr += cs_etm__t32_instr_size(etmq, addr);
+                        offset--;
Should be: offset -= cs_etm__t32_instr_size(etmq, addr)?
No - offset is the number of the instruction we want to find the address of - e.g. the decoder will tell us that the block has 100 instructions and we want to generate a sample on the 57th, so we step 57 instructions through the block.
Understood now, sorry for the noise.

[...]

Thanks, Leo Yan

-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK

Mathieu Poirier

5:47 p.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

On Mon, 1 Oct 2018 at 06:52, Mike Leach mike.leach@linaro.org wrote:

...

Hi, On Mon, 1 Oct 2018 at 12:33, leo.yan@linaro.org wrote:

...
On Mon, Oct 01, 2018 at 09:19:33AM +0100, Robert Walker wrote:

[...]

...
...
...
...
+#ifdef HAVE_CSTRACE_INSTR_INFO

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  return elem->num_instr_range;
+}

+static int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem) +{
  return elem->last_instr_sz;
+}

+#else

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  /* Assume a 4-byte instruction size - will be wrong for T32 */
  WARN_ONCE(elem->isa == ocsd_isa_thumb2,
            "Instruction counts not available for T32. Please upgrade to OpenCSD >= 0.9.0\n");
In this case, should the program report a failure and exit directly?

Why not also check for the A32 case here? Did the versions up to 0.8.x already support A32 instructions? I understand that A32 and A64 both have a 4-byte instruction width, but I don't know whether the old OpenCSD library supports decoding both of them.
The old version of OpenCSD did support A32 and T32 - samples from A32 trace will work with the current version of perf as the assumption of 4 byte instruction sizes for A64 also applies to A32.
Thanks for the explanation.

...
...
...
...
  return (elem->en_addr - elem->st_addr) / 4;
+}

+static +int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem __maybe_unused) +{
  return 4;
+}

+#endif
The alternative is not needed if you decide not to build against an insufficient version.
Just curious: is the ocsd_get_version() API provided by all OpenCSD versions?

It seems to me that this patch is meant to keep backwards compatibility with v0.8.x. I think we can still support building perf against an old version of OpenCSD; at runtime the perf cs_etm decoder can get the OpenCSD version number using ocsd_get_version() and check whether that version is sufficient for decoding the current trace data. If not, it reports an error and exits directly.
ocsd_get_version() is provided by all versions of OpenCSD. The features added in 0.9.x to provide the instruction counts add a new member to the struct that describes each range of executed instructions - so the new code doesn't compile if the old library is installed. Calling ocsd_get_version() could be used to check for the unlikely case that perf has been compiled against 0.9.x, but is now being run against an older version - is this worth doing?
Ah, I missed the build failure when using the new kernel with the old library.

From a developer's point of view, I lean towards Mike's suggestion to report a failure directly when building a new kernel against an old version of the library; the code can be kept as simple as possible, and we can easily build the latest version of the OpenCSD library ourselves.

To be honest, I am not confident about how this works with distros; this is not about developers but about end users. E.g. Debian / Ubuntu have officially released OpenCSD v0.8.x; should the new kernel support these old versions of the OpenCSD library? I am not sure whether this is the case; if the OpenCSD libraries have not been released in Debian/Ubuntu, then this is not a concern at all.
The first Debian distro to use OpenCSD used 0.8.4. They have picked up later versions as they have been released. Don't know if Ubuntu are picking this up yet.

Any user that is skilled enough to get / use / build an updated version of perf that is not in their current kernel revision, will have the necessary skills to find the latest version of the OpenCSD library should they need it.

I totally agree with the opinion above and have been since the very first release of this patchset. But I asked around at Linaro and was told that most utilities will support building against an older library (yielding a reduced feature set of course). People will typically use autoconf/automake to provide this kind of functionality but I think it is way overkill, hence the idea of introducing a #define in the openCSD library to conditionally compile things.

I am willing to try to do the right thing. As such I suggest we accommodate minor library revision but mandate to move to the latest for major releases. This is something we can talk about in Manchester if we haven't come up with an agreement by that time.

Mathieu

...

Mike

...
[...]

...
...
...
...
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
                                   u64 addr) {
Nitpick: add a new line for '{'.

...
...
...
...
+        u8 instrBytes[2];

-        /*
-         * The packet records the execution range with an exclusive end address
-         *
-         * A64 instructions are constant size, so the last executed
-         * instruction is A64_INSTR_SIZE before the end address
-         * Will need to do instruction level decode for T32 instructions as
-         * they can be variable size (not yet supported).
+        cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);
+        /* T32 instruction size is indicated by bits[15:11] of the first
+         * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
+         * denote a 32-bit instruction.
          */
-        return packet->end_addr - A64_INSTR_SIZE;
+        return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
Could we use macros to replace 0xF8/0xE8 to make this more readable?
I don't think adding a macro will make this any more readable - there's a detailed comment about the encoding above.
Yeah, after reading the comments I understand the logic :)

Another question: looking at the flow shown below, T32 must depend on the DSO to calculate the instruction address; just curious whether we can rely on the decoder to determine the T32 instruction address and remove the dependency on the DSO?

cs_etm__instr_addr()
  `-> cs_etm__t32_instr_size()
        `-> cs_etm__mem_access()
              `-> dso__data_read_offset()

[...]

...
...
...
...
-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
+                                     const struct cs_etm_packet *packet,
                                      u64 offset)
 {
-        /*
-         * Only A64 instructions are currently supported, so can get
-         * instruction address by muliplying.
-         * Will need to do instruction level decode for T32 instructions as
-         * they can be variable size (not yet supported).
-         */
-        return packet->start_addr + offset * A64_INSTR_SIZE;
+        if (packet->isa == CS_ETM_ISA_T32) {
+                u64 addr = packet->start_addr;
+                while (offset > 0) {
+                        addr += cs_etm__t32_instr_size(etmq, addr);
+                        offset--;
Should be: offset -= cs_etm__t32_instr_size(etmq, addr)?
No - offset is the number of the instruction we want to find the address of - e.g. the decoder will tell us that the block has 100 instructions and we want to generate a sample on the 57th, so we step 57 instructions through the block.
Understood now, sorry for the noise.

[...]

Thanks, Leo Yan
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK _______________________________________________ CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight

Mathieu Poirier

5:30 p.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

On Mon, 1 Oct 2018 at 02:19, Robert Walker robert.walker@arm.com wrote:

...

On 29/09/18 09:35, leo.yan@linaro.org wrote:

...
On Thu, Sep 27, 2018 at 03:03:03PM +0100, Mike Leach wrote:

...
Hi Rob, On Thu, 27 Sep 2018 at 11:24, Robert Walker robert.walker@arm.com wrote:

...
This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.0 of OpenCSD. A check for the new version member has been added to the feature check for OpenCSD. Where only the older version of OpenCSD is available, the previous behaviour of assuming 4 byte instruction size is used.

Signed-off-by: Robert Walker robert.walker@arm.com

tools/build/Makefile.feature | 3 +- tools/build/feature/Makefile | 4 ++ tools/build/feature/test-libopencsd-numinstr.c | 15 ++++++ tools/perf/Makefile.config | 3 ++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 58 ++++++++++++++++++++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++ tools/perf/util/cs-etm.c | 71 +++++++++++-------------- 7 files changed, 123 insertions(+), 41 deletions(-) create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index f216b2f..5e8d108 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -68,7 +68,8 @@ FEATURE_TESTS_BASIC := \
         sched_getcpu \
         sdt \
         setns \
-        libopencsd
+        libopencsd                     \
+        libopencsd-numinstr
 # FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list
 # of all feature tests
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile index 0516259..2cb7ef6 100644 --- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -56,6 +56,7 @@ FILES= \ test-sched_getcpu.bin \ test-setns.bin \ test-libopencsd.bin \
    test-libopencsd-numinstr.bin          \
     test-clang.bin                                \
     test-llvm.bin                         \
     test-llvm-version.bin
@@ -114,6 +115,9 @@ $(OUTPUT)test-libopencsd.bin: $(BUILD) # -lopencsd_c_api -lopencsd provided by # $(FEATURE_CHECK_LDFLAGS-libopencsd)

+$(OUTPUT)test-libopencsd-numinstr.bin:
  $(BUILD) -lopencsd_c_api -lopencsd
DWARFLIBS := -ldw ifeq ($(findstring -static,${LDFLAGS}),-static) DWARFLIBS += -lelf -lebl -lz -llzma -lbz2
Per my previous comments:

- if no opencsd lib is found, then we don't build the feature in - as before - report lib not found etc.
- if opencsd is found, but it is the wrong version - don't build the feature in, but output a version info message - e.g. need 0.9.x or later.

...
diff --git a/tools/build/feature/test-libopencsd-numinstr.c b/tools/build/feature/test-libopencsd-numinstr.c
new file mode 100644
index 0000000..606de62
--- /dev/null
+++ b/tools/build/feature/test-libopencsd-numinstr.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <opencsd/c_api/opencsd_c_api.h>
+
+int main(void)
+{
+        /*
+         * Requires ocsd_generic_trace_elem.num_instr_range introduced in
+         * OpenCSD 0.9.0
+         */
+        ocsd_generic_trace_elem elem;
+        (void)elem.num_instr_range;
+        (void)ocsd_get_version();
+        return 0;
+}
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index f6d1a03..b0ba3ca 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -383,6 +383,9 @@ ifndef NO_CORESIGHT
 endif
 endif
 endif
+ifeq ($(feature-libopencsd-numinstr), 1)
+CFLAGS += -DHAVE_CSTRACE_INSTR_INFO
+endif
 endif

 ifndef NO_LIBELF

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 938def6..260f3b3a 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -17,6 +17,7 @@ #include "cs-etm-decoder.h" #include "intlist.h" #include "util.h" +#include "asm/bug.h"

#define MAX_BUFFER 1024

@@ -263,9 +264,12 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder) decoder->tail = 0; decoder->packet_count = 0; for (i = 0; i < MAX_BUFFER; i++) {
          decoder->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN;
           decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
           decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
          decoder->packet_buffer[i].instr_count = 0;
           decoder->packet_buffer[i].last_instr_taken_branch = false;
          decoder->packet_buffer[i].last_instr_size = 0;
           decoder->packet_buffer[i].exc = false;
           decoder->packet_buffer[i].exc_ret = false;
           decoder->packet_buffer[i].cpu = INT_MIN;
@@ -294,11 +298,13 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, decoder->packet_count++;
     decoder->packet_buffer[et].sample_type = sample_type;
  decoder->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN;
   decoder->packet_buffer[et].exc = false;
   decoder->packet_buffer[et].exc_ret = false;
   decoder->packet_buffer[et].cpu = *((int *)inode->priv);
   decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
   decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
  decoder->packet_buffer[et].instr_count = 0;
Is the initialization of decoder->packet_buffer[et].last_instr_size to 0 missing here?
And also last_instr_taken_branch - these fields are only used in cs-etm.c if the packet type is CS_ETM_RANGE - so they will be set by cs_etm_decoder__buffer_range(). However, I will add initializers for these.

...
...
...
     if (decoder->packet_count == MAX_BUFFER - 1)
             return OCSD_RESP_WAIT;
@@ -306,6 +312,36 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, return OCSD_RESP_CONT; }

+#ifdef HAVE_CSTRACE_INSTR_INFO

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  return elem->num_instr_range;
+}

+static int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem) +{
  return elem->last_instr_sz;
+}

+#else

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  /* Assume a 4-byte instruction size - will be wrong for T32 */
  WARN_ONCE(elem->isa == ocsd_isa_thumb2,
            "Instruction counts not available for T32. Please upgrade to OpenCSD >= 0.9.0\n");
In this case, should the program report a failure and exit directly?

Why not also check for the A32 case here? Did the versions up to 0.8.x already support A32 instructions? I understand that A32 and A64 both have a 4-byte instruction width, but I don't know whether the old OpenCSD library supports decoding both of them.
The old version of OpenCSD did support A32 and T32 - samples from A32 trace will work with the current version of perf as the assumption of 4 byte instruction sizes for A64 also applies to A32.
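
To make the limitation of the fallback concrete, here is a minimal standalone sketch (not perf code) that compares the 4-byte estimate with the true count for a range. Both helper names and the per-instruction size array are hypothetical, introduced only for illustration; in perf the real count would come from the decoder's num_instr_range.

```c
#include <stddef.h>
#include <stdint.h>

/* Fallback estimate: assumes every instruction in the range is 4 bytes. */
static uint32_t instr_count_assume_4(uint64_t st_addr, uint64_t en_addr)
{
    return (uint32_t)((en_addr - st_addr) / 4);
}

/*
 * Hypothetical helper: true count for [st_addr, en_addr), given the size
 * in bytes of each instruction in execution order.
 */
static uint32_t instr_count_from_sizes(const uint8_t *sizes, size_t n,
                                       uint64_t st_addr, uint64_t en_addr)
{
    uint64_t addr = st_addr;
    uint32_t count = 0;
    size_t i;

    for (i = 0; i < n && addr < en_addr; i++) {
        addr += sizes[i];   /* step over one instruction */
        count++;
    }
    return count;
}
```

For a 12-byte T32 range holding instructions of 2, 2, 4 and 4 bytes, the fallback reports 3 instructions where 4 were executed; for A32 or A64, where every size is 4, the two counts agree.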

...
...
...
  return (elem->en_addr - elem->st_addr) / 4;
+}

+static +int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem __maybe_unused) +{
  return 4;
+}

+#endif
The alternative is not needed if you decide not to build against an insufficient version.
Just curious: is the ocsd_get_version() API provided by all OpenCSD versions?

It seems to me that this patch is meant to keep backwards compatibility with v0.8.x. I think we can still support building perf against an old version of OpenCSD; at runtime the perf cs_etm decoder can get the OpenCSD version number using ocsd_get_version() and check whether that version is sufficient for decoding the current trace data. If not, it reports an error and exits directly.
ocsd_get_version() is provided by all versions of OpenCSD. The features added in 0.9.x to provide the instruction counts add a new member to the struct that describes each range of executed instructions - so the new code doesn't compile if the old library is installed. Calling ocsd_get_version() could be used to check for the unlikely case that perf has been compiled against 0.9.x, but is now being run against an older version - is this worth doing?

I don't think so - if someone is going to do these kind of things they deserve to see the SW explode on them.
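
For reference, the runtime check being discussed could look roughly like the sketch below. It is only an illustration: the packing of the version word as (major << 16) | (minor << 8) | patch is an assumption that would need checking against the OpenCSD headers, and pack_version() and opencsd_version_ok() are hypothetical helper names, not library functions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: the library version is packed as
 * (major << 16) | (minor << 8) | patch. Verify before relying on it.
 */
static uint32_t pack_version(uint32_t major, uint32_t minor, uint32_t patch)
{
    return (major << 16) | (minor << 8) | patch;
}

/* True if the version found at runtime is at least the required one. */
static bool opencsd_version_ok(uint32_t runtime_ver, uint32_t required_ver)
{
    return runtime_ver >= required_ver;
}
```

In perf the first argument would be the value returned by ocsd_get_version(), compared against pack_version(0, 9, 0); the helpers take plain integers here so the sketch builds without the library installed.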

...

...
...
...
static ocsd_datapath_resp_t cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, const ocsd_generic_trace_elem *elem, @@ -321,8 +357,28 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
     packet = &decoder->packet_buffer[decoder->tail];
  switch (elem->isa) {
  case ocsd_isa_aarch64:
          packet->isa = CS_ETM_ISA_A64;
          break;
  case ocsd_isa_arm:
          packet->isa = CS_ETM_ISA_A32;
          break;
  case ocsd_isa_thumb2:
          packet->isa = CS_ETM_ISA_T32;
          break;
  case ocsd_isa_tee:
  case ocsd_isa_jazelle:
  case ocsd_isa_custom:
  case ocsd_isa_unknown:
  default:
          packet->isa = CS_ETM_ISA_UNKNOWN;
  }
   packet->start_addr = elem->st_addr;
   packet->end_addr = elem->en_addr;
  packet->instr_count = cs_etm_decoder__instr_count(elem);
   switch (elem->last_i_type) {
   case OCSD_INSTR_BR:
   case OCSD_INSTR_BR_INDIRECT:
@@ -336,6 +392,8 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, break; }
  packet->last_instr_size = cs_etm_decoder__last_instr_size(elem);
   return ret;
}
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h index 612b575..9351bd1 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -28,11 +28,21 @@ enum cs_etm_sample_type { CS_ETM_TRACE_ON = 1 << 1, };

+enum cs_etm_isa {
  CS_ETM_ISA_UNKNOWN,
  CS_ETM_ISA_A64,
  CS_ETM_ISA_A32,
  CS_ETM_ISA_T32,
+};
struct cs_etm_packet { enum cs_etm_sample_type sample_type;
  enum cs_etm_isa isa;
   u64 start_addr;
   u64 end_addr;
  u32 instr_count;
   u8 last_instr_taken_branch;
  u8 last_instr_size;
   u8 exc;
   u8 exc_ret;
   int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 2ae6402..fcaa73f 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -31,14 +31,6 @@

#define MAX_TIMESTAMP (~0ULL)

-/*
- * A64 instructions are always 4 bytes
- *
- * Only A64 is supported, so can use this constant for converting between
- * addresses and instruction counts, calculting offsets etc
- */
-#define A64_INSTR_SIZE 4

struct cs_etm_auxtrace { struct auxtrace auxtrace; struct auxtrace_queues queues;

@@ -492,21 +484,16 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq) etmq->last_branch_rb->nr = 0; }

-static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
-{
-        /* Returns 0 for the CS_ETM_TRACE_ON packet */
-        if (packet->sample_type == CS_ETM_TRACE_ON)
-                return 0;
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
+                                         u64 addr) {
+        u8 instrBytes[2];

-        /*
-         * The packet records the execution range with an exclusive end address
-         *
-         * A64 instructions are constant size, so the last executed
-         * instruction is A64_INSTR_SIZE before the end address
-         * Will need to do instruction level decode for T32 instructions as
-         * they can be variable size (not yet supported).
+        cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);
+        /* T32 instruction size is indicated by bits[15:11] of the first
+         * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
+         * denote a 32-bit instruction.
          */
-        return packet->end_addr - A64_INSTR_SIZE;
+        return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
Could we use macros to replace 0xF8/0xE8 to make this more readable?
I don't think adding a macro will make this any more readable - there's a detailed comment about the encoding above.

I agree with Robert. In this case all the information is there to understand what is happening.
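
For readers outside the thread, the size check can be tried in isolation with the standalone sketch below. It restates the same test on a caller-supplied two-byte buffer rather than going through cs_etm__mem_access(), and it assumes a little-endian instruction stream, so bytes[1] carries bits[15:8] of the first halfword. The two macro names are hypothetical and only illustrate what the suggested macros might have looked like; the patch itself keeps the literals.

```c
#include <stdint.h>

/* Hypothetical names for the two literals discussed in the review. */
#define T32_SIZE_FIELD_MASK   0xF8  /* bits[15:11], as seen in the high byte */
#define T32_MIN_32BIT_PREFIX  0xE8  /* 0b11101 shifted into the high byte */

/*
 * Returns 4 if the first halfword starts a 32-bit T32 instruction
 * (bits[15:11] are 0b11101, 0b11110 or 0b11111), otherwise 2.
 * bytes[] holds the first halfword in little-endian order.
 */
static int t32_instr_size(const uint8_t bytes[2])
{
    return ((bytes[1] & T32_SIZE_FIELD_MASK) >= T32_MIN_32BIT_PREFIX) ? 4 : 2;
}
```

A first halfword of 0x4770 or 0xE7FE (top five bits 0b01000 and 0b11100) gives 2, while 0xE92D or 0xF000 (top five bits 0b11101 and 0b11110) gives 4.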

...

...
...
...
}

static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet) @@ -518,27 +505,32 @@ static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet) return packet->start_addr; }

-static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
+static inline
+u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
 {
-        /*
-         * Only A64 instructions are currently supported, so can get
-         * instruction count by dividing.
-         * Will need to do instruction level decode for T32 instructions as
-         * they can be variable size (not yet supported).
-         */
-        return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
+        /* Returns 0 for the CS_ETM_TRACE_ON packet */
+        if (packet->sample_type == CS_ETM_TRACE_ON)
+                return 0;
+        return packet->end_addr - packet->last_instr_size;
 }
-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
+                                     const struct cs_etm_packet *packet,
                                      u64 offset)
 {
-        /*
-         * Only A64 instructions are currently supported, so can get
-         * instruction address by muliplying.
-         * Will need to do instruction level decode for T32 instructions as
-         * they can be variable size (not yet supported).
-         */
-        return packet->start_addr + offset * A64_INSTR_SIZE;
+        if (packet->isa == CS_ETM_ISA_T32) {
+                u64 addr = packet->start_addr;
+                while (offset > 0) {
+                        addr += cs_etm__t32_instr_size(etmq, addr);
+                        offset--;
Should be: offset -= cs_etm__t32_instr_size(etmq, addr)?
No - offset is the number of the instruction we want to find the address of - e.g. the decoder will tell us that the block has 100 instructions and we want to generate a sample on the 57th, so we step 57 instructions through the block.
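
The point about offset being an instruction index rather than a byte count can be seen in a small standalone sketch. It walks an in-memory buffer instead of reading through the DSO, assumes little-endian T32 code, and uses hypothetical helper names (t32_size_at, t32_instr_addr) that are not part of perf.

```c
#include <stdint.h>

/* Size of the T32 instruction starting at byte offset off in mem. */
static int t32_size_at(const uint8_t *mem, uint64_t off)
{
    return ((mem[off + 1] & 0xF8) >= 0xE8) ? 4 : 2;
}

/*
 * Address of the instruction with the given 0-based index in a block
 * that starts at start_addr. mem[0] is the byte at start_addr.
 */
static uint64_t t32_instr_addr(const uint8_t *mem, uint64_t start_addr,
                               uint64_t index)
{
    uint64_t addr = start_addr;

    while (index > 0) {
        addr += t32_size_at(mem, addr - start_addr);  /* skip one instruction */
        index--;                                      /* one fewer to skip */
    }
    return addr;
}
```

With a block made of a 2-byte, a 4-byte, a 2-byte and a 4-byte instruction starting at 0x8000, index 2 lands on 0x8006: the loop runs once per instruction skipped, whatever its size, which is why the counter is decremented by one rather than by the instruction size.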

...
...
...
+                }
+                return addr;
+        }
+        /* Assume a 4 byte instruction size (A32/A64) */
+        return packet->start_addr + offset * 4;
 }

static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
@@ -867,9 +859,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet *tmp; int ret;
-        u64 instrs_executed;
+        u64 instrs_executed = etmq->packet->instr_count;

-        instrs_executed = cs_etm__instr_count(etmq->packet);
   etmq->period_instructions += instrs_executed;

   /*
@@ -899,7 +890,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) * executed, but PC has not advanced to next instruction) */ u64 offset = (instrs_executed - instrs_over - 1);
-                u64 addr = cs_etm__instr_addr(etmq->packet, offset);
+                u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset);
If you are passing etmq, do you also need to pass etmq->packet?

Mike

...
             ret = cs_etm__synth_instruction_sample(
                     etmq, addr, etm->instructions_sample_period);
-- 2.7.4


Mathieu Poirier

6 p.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

Hi Robert and thanks for this second revision.

On Thu, 27 Sep 2018 at 04:24, Robert Walker robert.walker@arm.com wrote:

...

This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.0 of OpenCSD. A check for the new version member has been added to the feature check for OpenCSD. Where only the older version of OpenCSD is available, the previous behaviour of assuming 4 byte instruction size is used.

Signed-off-by: Robert Walker robert.walker@arm.com

tools/build/Makefile.feature | 3 +- tools/build/feature/Makefile | 4 ++ tools/build/feature/test-libopencsd-numinstr.c | 15 ++++++ tools/perf/Makefile.config | 3 ++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 58 ++++++++++++++++++++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++ tools/perf/util/cs-etm.c | 71 +++++++++++-------------- 7 files changed, 123 insertions(+), 41 deletions(-) create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index f216b2f..5e8d108 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -68,7 +68,8 @@ FEATURE_TESTS_BASIC := \
         sched_getcpu \
         sdt \
         setns \
-        libopencsd
+        libopencsd                     \
+        libopencsd-numinstr

I understand what you're doing here but it won't fly with the upstream guy. The new #define in the openCSD library needs to come before this patchset. That way functionality that needs the newest version of the library can be conditionally compiled without this extra step and a new HAVE_CSTRACE_INSTR_INFO.
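
As a rough illustration of that approach, the sketch below selects the implementation from a version macro instead of a separate feature test. Everything in it is an assumption for illustration: OCSD_VER_NUM stands in for whatever macro the library would export, the encoding (major << 16 | minor << 8 | patch) is assumed, and struct range_elem is a cut-down stand-in for ocsd_generic_trace_elem so that the example builds without the library.

```c
#include <stdint.h>

/* Hypothetical: normally provided by a library header, defined here as a stand-in. */
#ifndef OCSD_VER_NUM
#define OCSD_VER_NUM 0x000900   /* assumed encoding of version 0.9.0 */
#endif

/* Cut-down stand-in for the decoder's range element. */
struct range_elem {
    uint64_t st_addr;
    uint64_t en_addr;
    uint32_t num_instr_range;   /* only present in newer library versions */
};

static uint32_t range_instr_count(const struct range_elem *elem)
{
#if OCSD_VER_NUM >= 0x000900
    /* Newer library: the decoder reports the count directly. */
    return elem->num_instr_range;
#else
    /* Older library: assume 4-byte instructions (wrong for T32). */
    return (uint32_t)((elem->en_addr - elem->st_addr) / 4);
#endif
}
```

The trade-off raised earlier in the thread still applies: such a macro only helps for library releases that ship it, so installations of 0.9.0 to 0.9.2 would not be recognised this way.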

Also, please run your patches through checkpatch.pl before sending them out.

Regards, Mathieu

...

# FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list # of all feature tests diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile index 0516259..2cb7ef6 100644 --- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -56,6 +56,7 @@ FILES= \ test-sched_getcpu.bin \ test-setns.bin \ test-libopencsd.bin \
    test-libopencsd-numinstr.bin          \
    test-clang.bin                                \
    test-llvm.bin                         \
    test-llvm-version.bin
@@ -114,6 +115,9 @@ $(OUTPUT)test-libopencsd.bin: $(BUILD) # -lopencsd_c_api -lopencsd provided by # $(FEATURE_CHECK_LDFLAGS-libopencsd)

+$(OUTPUT)test-libopencsd-numinstr.bin:
  $(BUILD) -lopencsd_c_api -lopencsd
DWARFLIBS := -ldw ifeq ($(findstring -static,${LDFLAGS}),-static) DWARFLIBS += -lelf -lebl -lz -llzma -lbz2 diff --git a/tools/build/feature/test-libopencsd-numinstr.c b/tools/build/feature/test-libopencsd-numinstr.c new file mode 100644 index 0000000..606de62 --- /dev/null +++ b/tools/build/feature/test-libopencsd-numinstr.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <opencsd/c_api/opencsd_c_api.h>

+int main(void) +{
  /*
   * Requires ocsd_generic_trace_elem.num_instr_range introduced in
   * OpenCSD 0.9.0
   */
  ocsd_generic_trace_elem elem;
  (void)elem.num_instr_range;
  (void)ocsd_get_version();
  return 0;
+} diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index f6d1a03..b0ba3ca 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -383,6 +383,9 @@ ifndef NO_CORESIGHT endif endif endif

ifeq ($(feature-libopencsd-numinstr), 1)

CFLAGS += -DHAVE_CSTRACE_INSTR_INFO

endif

endif

ifndef NO_LIBELF diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 938def6..260f3b3a 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -17,6 +17,7 @@ #include "cs-etm-decoder.h" #include "intlist.h" #include "util.h" +#include "asm/bug.h"

#define MAX_BUFFER 1024

@@ -263,9 +264,12 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder) decoder->tail = 0; decoder->packet_count = 0; for (i = 0; i < MAX_BUFFER; i++) {
          decoder->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN;
          decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
          decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
          decoder->packet_buffer[i].instr_count = 0;
          decoder->packet_buffer[i].last_instr_taken_branch = false;
          decoder->packet_buffer[i].last_instr_size = 0;
          decoder->packet_buffer[i].exc = false;
          decoder->packet_buffer[i].exc_ret = false;
          decoder->packet_buffer[i].cpu = INT_MIN;
@@ -294,11 +298,13 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, decoder->packet_count++;
    decoder->packet_buffer[et].sample_type = sample_type;
  decoder->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN;
  decoder->packet_buffer[et].exc = false;
  decoder->packet_buffer[et].exc_ret = false;
  decoder->packet_buffer[et].cpu = *((int *)inode->priv);
  decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
  decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
  decoder->packet_buffer[et].instr_count = 0;

  if (decoder->packet_count == MAX_BUFFER - 1)
          return OCSD_RESP_WAIT;
@@ -306,6 +312,36 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, return OCSD_RESP_CONT; }

+#ifdef HAVE_CSTRACE_INSTR_INFO

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  return elem->num_instr_range;
+}

+static int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem) +{
  return elem->last_instr_sz;
+}

+#else

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  /* Assume a 4-byte instruction size - will be wrong for T32 */
  WARN_ONCE(elem->isa == ocsd_isa_thumb2,
            "Instruction counts not available for T32. Please upgrade to OpenCSD >= 0.9.0\n");
  return (elem->en_addr - elem->st_addr) / 4;
+}

+static +int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem __maybe_unused) +{
  return 4;
+}

+#endif

static ocsd_datapath_resp_t cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, const ocsd_generic_trace_elem *elem, @@ -321,8 +357,28 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
    packet = &decoder->packet_buffer[decoder->tail];
  switch (elem->isa) {
  case ocsd_isa_aarch64:
          packet->isa = CS_ETM_ISA_A64;
          break;
  case ocsd_isa_arm:
          packet->isa = CS_ETM_ISA_A32;
          break;
  case ocsd_isa_thumb2:
          packet->isa = CS_ETM_ISA_T32;
          break;
  case ocsd_isa_tee:
  case ocsd_isa_jazelle:
  case ocsd_isa_custom:
  case ocsd_isa_unknown:
  default:
          packet->isa = CS_ETM_ISA_UNKNOWN;
  }
  packet->start_addr = elem->st_addr;
  packet->end_addr = elem->en_addr;
  packet->instr_count = cs_etm_decoder__instr_count(elem);
  switch (elem->last_i_type) {
  case OCSD_INSTR_BR:
  case OCSD_INSTR_BR_INDIRECT:
@@ -336,6 +392,8 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, break; }
  packet->last_instr_size = cs_etm_decoder__last_instr_size(elem);
  return ret;
}

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 612b575..9351bd1 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -28,11 +28,21 @@ enum cs_etm_sample_type {
 	CS_ETM_TRACE_ON = 1 << 1,
 };

+enum cs_etm_isa {
+	CS_ETM_ISA_UNKNOWN,
+	CS_ETM_ISA_A64,
+	CS_ETM_ISA_A32,
+	CS_ETM_ISA_T32,
+};
+
 struct cs_etm_packet {
 	enum cs_etm_sample_type sample_type;
+	enum cs_etm_isa isa;
 	u64 start_addr;
 	u64 end_addr;
+	u32 instr_count;
 	u8 last_instr_taken_branch;
+	u8 last_instr_size;
 	u8 exc;
 	u8 exc_ret;
 	int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 2ae6402..fcaa73f 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -31,14 +31,6 @@

 #define MAX_TIMESTAMP (~0ULL)

-/*
- * A64 instructions are always 4 bytes
- *
- * Only A64 is supported, so can use this constant for converting between
- * addresses and instruction counts, calculting offsets etc
- */
-#define A64_INSTR_SIZE 4
-
 struct cs_etm_auxtrace {
 	struct auxtrace auxtrace;
 	struct auxtrace_queues queues;
@@ -492,21 +484,16 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
 	etmq->last_branch_rb->nr = 0;
 }

-static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
-{
-	/* Returns 0 for the CS_ETM_TRACE_ON packet */
-	if (packet->sample_type == CS_ETM_TRACE_ON)
-		return 0;
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
+					 u64 addr) {
+	u8 instrBytes[2];

-	/*
-	 * The packet records the execution range with an exclusive end address
-	 *
-	 * A64 instructions are constant size, so the last executed
-	 * instruction is A64_INSTR_SIZE before the end address
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
+	cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);
+	/* T32 instruction size is indicated by bits[15:11] of the first
+	 * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
+	 * denote a 32-bit instruction.
 	 */
-	return packet->end_addr - A64_INSTR_SIZE;
+	return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
 }
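
The size test in cs_etm__t32_instr_size() can be tried in isolation. What follows is a minimal sketch, not the perf code: the helper name t32_size_from_bytes and the sample encodings are chosen here for illustration, and it assumes the two bytes are the first halfword of the instruction in little-endian order, so that bits[15:11] are the top five bits of the second byte.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone helper (not part of perf): returns the size in
 * bytes of the T32 instruction whose first halfword is stored
 * little-endian in bytes[0..1].
 */
static int t32_size_from_bytes(const uint8_t bytes[2])
{
	/* bits[15:11] of the halfword are the top five bits of bytes[1] */
	uint8_t top5 = bytes[1] & 0xF8;

	/* 0b11101, 0b11110 and 0b11111 mark a 32-bit instruction */
	return (top5 >= 0xE8) ? 4 : 2;
}

/* Sample first halfwords, written in memory (little-endian) order */
static const uint8_t T32_BX_LR[2]  = { 0x70, 0x47 }; /* 0x4770: 16-bit BX LR */
static const uint8_t T32_B_SELF[2] = { 0xFE, 0xE7 }; /* 0xE7FE: 16-bit B (top bits 0b11100) */
static const uint8_t T32_PUSH_W[2] = { 0x2D, 0xE9 }; /* 0xE92D: first half of a 32-bit STMDB/PUSH.W */
static const uint8_t T32_BL[2]     = { 0x00, 0xF0 }; /* 0xF000: first half of a 32-bit BL */
```

Under these assumptions the first two samples are reported as 2 bytes and the last two as 4 bytes. The 0xE7FE case is the interesting one: its top five bits are 0b11100, just below the 0xE8 threshold, so it is a 16-bit instruction.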

 static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
@@ -518,27 +505,32 @@ static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
 	return packet->start_addr;
 }

-static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
+static inline
+u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
 {
-	/*
-	 * Only A64 instructions are currently supported, so can get
-	 * instruction count by dividing.
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
-	 */
-	return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
+	/* Returns 0 for the CS_ETM_TRACE_ON packet */
+	if (packet->sample_type == CS_ETM_TRACE_ON)
+		return 0;
+
+	return packet->end_addr - packet->last_instr_size;
 }

-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
+				     const struct cs_etm_packet *packet,
 				     u64 offset)
 {
-	/*
-	 * Only A64 instructions are currently supported, so can get
-	 * instruction address by muliplying.
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
-	 */
-	return packet->start_addr + offset * A64_INSTR_SIZE;
+	if (packet->isa == CS_ETM_ISA_T32) {
+		u64 addr = packet->start_addr;
+
+		while (offset > 0) {
+			addr += cs_etm__t32_instr_size(etmq, addr);
+			offset--;
+		}
+		return addr;
+	}
+
+	/* Assume a 4 byte instruction size (A32/A64) */
+	return packet->start_addr + offset * 4;
 }
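
To see why the T32 branch of cs_etm__instr_addr() has to walk the block rather than multiply, here is a small sketch over an in-memory buffer. It is an illustration under stated assumptions, not the perf implementation: t32_size_at, t32_offset_of_instr and sample_block are hypothetical names, the buffer stands in for the memory that cs_etm__mem_access() would read, and the len bound is an extra safety check that the patch itself does not have.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the T32 instruction whose first halfword starts at code[off]. */
static int t32_size_at(const uint8_t *code, size_t off)
{
	/* Little-endian halfword: bits[15:11] live in the second byte */
	return ((code[off + 1] & 0xF8) >= 0xE8) ? 4 : 2;
}

/*
 * Hypothetical helper: byte offset of the instruction with 0-based index
 * `n` inside a T32 block of `len` bytes. Mirrors the while loop in the
 * patch, with a bound so the walk never reads past the buffer.
 */
static size_t t32_offset_of_instr(const uint8_t *code, size_t len, size_t n)
{
	size_t off = 0;

	while (n > 0 && off + 1 < len) {
		off += t32_size_at(code, off);
		n--;
	}
	return off;
}

/* Illustrative block mixing 32-bit and 16-bit T32 encodings */
static const uint8_t sample_block[] = {
	0x2D, 0xE9, 0xF0, 0x41,	/* 0xE92D 0x41F0: 32-bit PUSH.W {r4-r8, lr} */
	0x00, 0x20,		/* 0x2000:        16-bit MOVS r0, #0 */
	0x00, 0xF0, 0x00, 0xF8,	/* 0xF000 0xF800: 32-bit BL */
	0x70, 0x47,		/* 0x4770:        16-bit BX LR */
};
```

With this sample block, instruction index 2 starts at byte offset 6, whereas a fixed 4-byte assumption would place it at offset 8, in the middle of the BL. The cost is one memory read per instruction stepped over, which is consistent with the slowdown mentioned in the cover letter for the walk-based approach.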

 static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
@@ -867,9 +859,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct cs_etm_packet *tmp;
 	int ret;
-	u64 instrs_executed;
+	u64 instrs_executed = etmq->packet->instr_count;

-	instrs_executed = cs_etm__instr_count(etmq->packet);
 	etmq->period_instructions += instrs_executed;

 	/*
@@ -899,7 +890,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 		 * executed, but PC has not advanced to next instruction)
 		 */
 		u64 offset = (instrs_executed - instrs_over - 1);
-		u64 addr = cs_etm__instr_addr(etmq->packet, offset);
+		u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset);

 		ret = cs_etm__synth_instruction_sample(
 			etmq, addr, etm->instructions_sample_period);
-- 
2.7.4

CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight

Robert Walker

7:24 p.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

Hi Mathieu,

On 01/10/18 19:00, Mathieu Poirier wrote:

...

Hi Robert and thanks for this second revision.

On Thu, 27 Sep 2018 at 04:24, Robert Walker robert.walker@arm.com wrote:

...
This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.0 of OpenCSD. A check for the new version member has been added to the feature check for OpenCSD. Where only the older version of OpenCSD is available, the previous behaviour of assuming 4 byte instruction size is used.

Signed-off-by: Robert Walker robert.walker@arm.com

 tools/build/Makefile.feature                    |  3 +-
 tools/build/feature/Makefile                    |  4 ++
 tools/build/feature/test-libopencsd-numinstr.c  | 15 ++++++
 tools/perf/Makefile.config                      |  3 ++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 58 ++++++++++++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++
 tools/perf/util/cs-etm.c                        | 71 +++++++++++--------------
 7 files changed, 123 insertions(+), 41 deletions(-)
 create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index f216b2f..5e8d108 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -68,7 +68,8 @@ FEATURE_TESTS_BASIC :=                  \
         sched_getcpu                    \
         sdt                             \
         setns                           \
-        libopencsd
+        libopencsd                      \
+        libopencsd-numinstr

I understand what you're doing here but it won't fly with the upstream guy. The new #define in the openCSD library needs to come before this patchset. That way functionality that needs the newest version of the library can be conditionally compiled without this extra step and a new HAVE_CSTRACE_INSTR_INFO.

So just to be clear:

* We do a new release of OpenCSD (0.9.3?) that exports some macros describing its version. You previously suggested including the ocsd_if_version.h header in the OpenCSD install - we can't do a '#include "opencsd/ocsd_if_version.h"' directly from the perf code without breaking compilation when using the older library versions without the header (0.9.2 or older), so maybe opencsd/c_api/opencsd_c_api.h would be a good place to put the macros (or include ocsd_if_version.h from).
* We don't need to make any changes to the feature detection in perf.
* util/cs-etm-decoder/cs-etm-decoder.c can use the version macros to determine whether to set HAVE_CSTRACE_INSTR_INFO within that file and hence whether to use the extra info available from the decoder.

This sounds OK to me - we get rid of the extra overhead of feature checks in the perf makefiles. The only downside is that it requires a new version of OpenCSD, so anyone with 0.9.x will have to update to enable the feature even though their version already has the necessary support. But if it makes adding new features easier in future then it's worth doing.

...

Also, please run your patches through checkpatch.pl before sending them out.

Sorry - I think there were a few long lines left to fix - will be fixed in the next version.

Regards

Rob

...

Regards, Mathieu

...


Mathieu Poirier

8:17 p.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

On Mon, 1 Oct 2018 at 13:24, Robert Walker robert.walker@arm.com wrote:

...

Hi Mathieu,

On 01/10/18 19:00, Mathieu Poirier wrote:

...
Hi Robert and thanks for this second revision.

On Thu, 27 Sep 2018 at 04:24, Robert Walker robert.walker@arm.com wrote:

...
This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.0 of OpenCSD. A check for the new version member has been added to the feature check for OpenCSD. Where only the older version of OpenCSD is available, the previous behaviour of assuming 4 byte instruction size is used.

Signed-off-by: Robert Walker robert.walker@arm.com

 tools/build/Makefile.feature                    |  3 +-
 tools/build/feature/Makefile                    |  4 ++
 tools/build/feature/test-libopencsd-numinstr.c  | 15 ++++++
 tools/perf/Makefile.config                      |  3 ++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 58 ++++++++++++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++
 tools/perf/util/cs-etm.c                        | 71 +++++++++++--------------
 7 files changed, 123 insertions(+), 41 deletions(-)
 create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index f216b2f..5e8d108 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -68,7 +68,8 @@ FEATURE_TESTS_BASIC :=                  \
         sched_getcpu                    \
         sdt                             \
         setns                           \
-        libopencsd
+        libopencsd                      \
+        libopencsd-numinstr

I understand what you're doing here but it won't fly with the upstream guy. The new #define in the openCSD library needs to come before this patchset. That way functionality that needs the newest version of the library can be conditionally compiled without this extra step and a new HAVE_CSTRACE_INSTR_INFO.

So just to be clear:

* We do a new release of OpenCSD (0.9.3?) that exports some macros describing its version. You previously suggested including the ocsd_if_version.h header in the OpenCSD install - we can't do a '#include "opencsd/ocsd_if_version.h"' directly from the perf code without breaking compilation when using the older library versions without the header (0.9.2 or older), so maybe opencsd/c_api/opencsd_c_api.h would be a good place to put the macros (or include ocsd_if_version.h from).

Yes, that makes sense to me.

...

* We don't need to make any changes to the feature detection in perf

Correct.

...

* util/cs-etm-decoder/cs-etm-decoder.c can use the version macros to determine whether to set HAVE_CSTRACE_INSTR_INFO within that file and hence whether to use the extra info available from the decoder.

I was hoping for something like this instead of a new define:

#if OCSD_VER_MINOR >= 9 && OCSD_VER_PATCH > 2
....
#else
....
#endif

That way we know exactly what code belongs to what library version. Note that this will require moving things around in the library a little.
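
As a rough illustration of the version-conditional approach discussed here, the sketch below packs the version into a single number before comparing. Everything in it is an assumption for the sake of the example: OCSD_VER_MAJOR, VER_PACK, OCSD_VER_PACKED and USE_DECODER_INSTR_INFO are hypothetical names, and the version values are stand-ins for whatever the library header would eventually export. Packing is used because a per-field test such as "MINOR >= 9 && PATCH > 2" would evaluate false for a later release like 0.10.0.

```c
/*
 * Stand-ins for the version macros the library header might export.
 * The values are illustrative (0.10.0); a real build would take them
 * from the installed OpenCSD headers instead.
 */
#ifndef OCSD_VER_MAJOR
#define OCSD_VER_MAJOR 0
#define OCSD_VER_MINOR 10
#define OCSD_VER_PATCH 0
#endif

/* Hypothetical helper: pack major.minor.patch into one comparable number */
#define VER_PACK(maj, min, pat) (((maj) << 16) | ((min) << 8) | (pat))

#define OCSD_VER_PACKED \
	VER_PACK(OCSD_VER_MAJOR, OCSD_VER_MINOR, OCSD_VER_PATCH)

#if OCSD_VER_PACKED >= VER_PACK(0, 9, 3)
/* Library is new enough: use the decoder-supplied instruction info */
#define USE_DECODER_INSTR_INFO 1
#else
/* Older library: fall back to the fixed 4-byte instruction assumption */
#define USE_DECODER_INSTR_INFO 0
#endif

/* Small accessor so the compile-time choice can be inspected at run time */
static int uses_decoder_instr_info(void)
{
	return USE_DECODER_INSTR_INFO;
}
```

If the header carrying the macros is missing altogether (0.9.2 or older, as noted above), the #ifndef guard is where a fallback would go; in this sketch it only supplies the sample values.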

...

This sounds OK to me - we get rid of the extra overhead of feature checks in the perf makefiles. The only downside is that it requires a new version of OpenCSD, so anyone with 0.9.x will have to update to enable the feature even though their version already has the necessary support. But if it makes adding new features easier in future then it's worth doing.

Yes, that's what I'm thinking.

...

...
Also, please run your patches through checkpatch.pl before sending them out.

Sorry - I think there were a few long lines left to fix - will be fixed in the next version.

We have a deal.

...

Regards

Rob

...
Regards, Mathieu

...

CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight

Mike Leach

2 Oct

9:17 a.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

On Mon, 1 Oct 2018 at 21:17, Mathieu Poirier mathieu.poirier@linaro.org wrote:

...

On Mon, 1 Oct 2018 at 13:24, Robert Walker robert.walker@arm.com wrote:

...
Hi Mathieu,

On 01/10/18 19:00, Mathieu Poirier wrote:

...
Hi Robert and thanks for this second revision.

On Thu, 27 Sep 2018 at 04:24, Robert Walker robert.walker@arm.com wrote:

...
This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.0 of OpenCSD. A check for the new version member has been added to the feature check for OpenCSD. Where only the older version of OpenCSD is available, the previous behaviour of assuming 4 byte instruction size is used.

Signed-off-by: Robert Walker robert.walker@arm.com

tools/build/Makefile.feature | 3 +- tools/build/feature/Makefile | 4 ++ tools/build/feature/test-libopencsd-numinstr.c | 15 ++++++ tools/perf/Makefile.config | 3 ++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 58 ++++++++++++++++++++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++ tools/perf/util/cs-etm.c | 71 +++++++++++-------------- 7 files changed, 123 insertions(+), 41 deletions(-) create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature index f216b2f..5e8d108 100644 --- a/tools/build/Makefile.feature +++ b/tools/build/Makefile.feature @@ -68,7 +68,8 @@ FEATURE_TESTS_BASIC := \ sched_getcpu \ sdt \ setns \
   libopencsd
   libopencsd                     \
   libopencsd-numinstr
I understand what you're doing here but it won't fly with the upstream guy. The new #define in the openCSD library needs to come before this patchset. That way functionality that needs the newest version of the library can be conditionally compiled without this extra step and a new HAVE_CSTRACE_INSTR_INFO.

I note the following from the perf feature makefile....

ifndef NO_LIBDW_DWARF_UNWIND ifneq ($(feature-libdw-dwarf-unwind),1) NO_LIBDW_DWARF_UNWIND := 1 msg := $(warning No libdw DWARF unwind found, Please install elfutils-devel/libdw-dev >= 0.158 and/or set LIBDW_DIR); endif endif ifneq ($(feature-dwarf), 1) ifndef NO_DWARF msg := $(warning No libdw.h found or old libdw.h found or elfutils is older than 0.138, disables dwarf support. Please install new elfutils-devel/libdw-dev); NO_DWARF := 1 endif else ifneq ($(feature-dwarf_getlocations), 1) msg := $(warning Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157); else CFLAGS += -DHAVE_DWARF_GETLOCATIONS_SUPPORT

Looks a lot like what we are currently attempting for opencsd - two test bins to check for lib + add features + warnings about versioning - and disable support on insufficient version.

...

...
So just to be clear:

We do a new release of OpenCSD (0.9.3?) that exports some macros describing its version. You previously suggested including the ocsd_if_version.h header in the OpenCSD install - we can't do a '#include "opencsd/ocsd_if_version.h"' directly from the perf code without breaking compilation when using the older library versions without the header (0.9.2 or older), so maybe opencsd/c_api/opencsd_c_api.h would be a good place to put the macros (or to include ocsd_if_version.h from).

Yes, that makes sense to me.

I am currently looking at a new opencsd release to fix an issue with snapshots and dumpfile 'offset' parameters, so I can move the version file as part of this.

...

...

We don't need to make any changes to the feature detection in perf

Correct.

...

util/cs-etm-decoder/cs-etm-decoder.c can use the version macros to determine whether to set HAVE_CSTRACE_INSTR_INFO within that file and hence whether to use the extra info available from the decoder.

I was hoping for something like this instead of a new define:

# if OCSD_VER_MINOR >=9 && OCSD_VER_PATCH > 2 ....

There's a combined macro in the version file, so something along the lines of

#if OCSD_VER_NUM >= 0x000903

Remember, VER_MINOR and VER_PATCH will reset to 0 when VER_MAJOR increments!
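For illustration, a minimal sketch of why the combined number is the safer test. The EX_* names are hypothetical stand-ins rather than the macros OpenCSD actually exports, and the one-byte-per-component packing is assumed from the 0x000903 example above. With a pretend 1.0.0 release, the component-wise test rejects a library that is new enough, while the packed comparison accepts it:

```c
/* Hypothetical stand-ins for version macros exported by a library header. */
#define EX_VER_MAJOR 1
#define EX_VER_MINOR 0
#define EX_VER_PATCH 0

/* Pack major/minor/patch into one integer, one byte per component (assumed). */
#define EX_VER_PACK(maj, min, pat) (((maj) << 16) | ((min) << 8) | (pat))
#define EX_VER_NUM EX_VER_PACK(EX_VER_MAJOR, EX_VER_MINOR, EX_VER_PATCH)

/* Component-wise test: fails for 1.0.0 because minor and patch reset to 0. */
#if EX_VER_MINOR >= 9 && EX_VER_PATCH > 2
#define EX_COMPONENT_CHECK 1
#else
#define EX_COMPONENT_CHECK 0
#endif

/* Packed test: 1.0.0 is 0x010000, which compares greater than 0x000903. */
#if EX_VER_NUM >= 0x000903
#define EX_PACKED_CHECK 1
#else
#define EX_PACKED_CHECK 0
#endif

/* Same packing as a function, e.g. for a version number read at run time. */
static inline unsigned int ex_ver_pack(unsigned int maj, unsigned int min,
				       unsigned int pat)
{
	return (maj << 16) | (min << 8) | pat;
}
```

Here EX_COMPONENT_CHECK evaluates to 0 and EX_PACKED_CHECK to 1, which is the major-version rollover case described above.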

Regards

Mike

...

#else ....

#endif

That way we know exactly what code belongs to what library version. Note that this will require to move things around in the library a little.

...
This sounds OK to me - we get rid of the extra overhead of feature checks in the perf makefiles. The only downside is that it requires a new version of OpenCSD, so anyone with 0.9.x will have to update to enable the feature even though their version already has the necessary support. But if it makes adding new features easier in future then it's worth doing.

Yes, that's what I'm thinking.

...
...
Also, please run your patches through checkpatch.pl before sending them out.

Sorry - I think there were a few long lines left to fix - will be fixed in the next version.

We have a deal.

...
Regards

Rob

...
Regards, Mathieu

...
# FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list # of all feature tests diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile index 0516259..2cb7ef6 100644 --- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -56,6 +56,7 @@ FILES= \ test-sched_getcpu.bin \ test-setns.bin \ test-libopencsd.bin \
    test-libopencsd-numinstr.bin          \
     test-clang.bin                                \
     test-llvm.bin                         \
     test-llvm-version.bin
@@ -114,6 +115,9 @@ $(OUTPUT)test-libopencsd.bin: $(BUILD) # -lopencsd_c_api -lopencsd provided by # $(FEATURE_CHECK_LDFLAGS-libopencsd)

+$(OUTPUT)test-libopencsd-numinstr.bin:
  $(BUILD) -lopencsd_c_api -lopencsd
DWARFLIBS := -ldw ifeq ($(findstring -static,${LDFLAGS}),-static) DWARFLIBS += -lelf -lebl -lz -llzma -lbz2
diff --git a/tools/build/feature/test-libopencsd-numinstr.c b/tools/build/feature/test-libopencsd-numinstr.c new file mode 100644 index 0000000..606de62 --- /dev/null +++ b/tools/build/feature/test-libopencsd-numinstr.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <opencsd/c_api/opencsd_c_api.h>

+int main(void) +{
  /*
   * Requires ocsd_generic_trace_elem.num_instr_range introduced in
   * OpenCSD 0.9.0
   */
  ocsd_generic_trace_elem elem;
  (void)elem.num_instr_range;
  (void)ocsd_get_version();
  return 0;
+} diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index f6d1a03..b0ba3ca 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -383,6 +383,9 @@ ifndef NO_CORESIGHT endif endif endif

ifeq ($(feature-libopencsd-numinstr), 1)

CFLAGS += -DHAVE_CSTRACE_INSTR_INFO

endif endif

ifndef NO_LIBELF

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 938def6..260f3b3a 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -17,6 +17,7 @@ #include "cs-etm-decoder.h" #include "intlist.h" #include "util.h" +#include "asm/bug.h"

#define MAX_BUFFER 1024

@@ -263,9 +264,12 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder) decoder->tail = 0; decoder->packet_count = 0; for (i = 0; i < MAX_BUFFER; i++) {
          decoder->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN;
           decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
           decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
          decoder->packet_buffer[i].instr_count = 0;
           decoder->packet_buffer[i].last_instr_taken_branch = false;
          decoder->packet_buffer[i].last_instr_size = 0;
           decoder->packet_buffer[i].exc = false;
           decoder->packet_buffer[i].exc_ret = false;
           decoder->packet_buffer[i].cpu = INT_MIN;
@@ -294,11 +298,13 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, decoder->packet_count++;
     decoder->packet_buffer[et].sample_type = sample_type;
  decoder->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN;
   decoder->packet_buffer[et].exc = false;
   decoder->packet_buffer[et].exc_ret = false;
   decoder->packet_buffer[et].cpu = *((int *)inode->priv);
   decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
   decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
  decoder->packet_buffer[et].instr_count = 0;

   if (decoder->packet_count == MAX_BUFFER - 1)
           return OCSD_RESP_WAIT;
@@ -306,6 +312,36 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder, return OCSD_RESP_CONT; }

+#ifdef HAVE_CSTRACE_INSTR_INFO

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  return elem->num_instr_range;
+}

+static int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem) +{
  return elem->last_instr_sz;
+}

+#else

+static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem) +{
  /* Assume a 4-byte instruction size - will be wrong for T32 */
  WARN_ONCE(elem->isa == ocsd_isa_thumb2,
            "Instruction counts not available for T32. Please upgrade to OpenCSD >= 0.9.0\n");
  return (elem->en_addr - elem->st_addr) / 4;
+}

+static +int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem __maybe_unused) +{
  return 4;
+}

+#endif

static ocsd_datapath_resp_t cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, const ocsd_generic_trace_elem *elem,

@@ -321,8 +357,28 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
     packet = &decoder->packet_buffer[decoder->tail];
  switch (elem->isa) {
  case ocsd_isa_aarch64:
          packet->isa = CS_ETM_ISA_A64;
          break;
  case ocsd_isa_arm:
          packet->isa = CS_ETM_ISA_A32;
          break;
  case ocsd_isa_thumb2:
          packet->isa = CS_ETM_ISA_T32;
          break;
  case ocsd_isa_tee:
  case ocsd_isa_jazelle:
  case ocsd_isa_custom:
  case ocsd_isa_unknown:
  default:
          packet->isa = CS_ETM_ISA_UNKNOWN;
  }
   packet->start_addr = elem->st_addr;
   packet->end_addr = elem->en_addr;
  packet->instr_count = cs_etm_decoder__instr_count(elem);
   switch (elem->last_i_type) {
   case OCSD_INSTR_BR:
   case OCSD_INSTR_BR_INDIRECT:
@@ -336,6 +392,8 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder, break; }
  packet->last_instr_size = cs_etm_decoder__last_instr_size(elem);
   return ret;
}
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h index 612b575..9351bd1 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h @@ -28,11 +28,21 @@ enum cs_etm_sample_type { CS_ETM_TRACE_ON = 1 << 1, };

+enum cs_etm_isa {
  CS_ETM_ISA_UNKNOWN,
  CS_ETM_ISA_A64,
  CS_ETM_ISA_A32,
  CS_ETM_ISA_T32,
+};
struct cs_etm_packet { enum cs_etm_sample_type sample_type;
  enum cs_etm_isa isa;
   u64 start_addr;
   u64 end_addr;
  u32 instr_count;
   u8 last_instr_taken_branch;
  u8 last_instr_size;
   u8 exc;
   u8 exc_ret;
   int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 2ae6402..fcaa73f 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -31,14 +31,6 @@

#define MAX_TIMESTAMP (~0ULL)

-/*
- * A64 instructions are always 4 bytes
- *
- * Only A64 is supported, so can use this constant for converting between
- * addresses and instruction counts, calculting offsets etc
- */
-#define A64_INSTR_SIZE 4

struct cs_etm_auxtrace { struct auxtrace auxtrace; struct auxtrace_queues queues;

@@ -492,21 +484,16 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq) etmq->last_branch_rb->nr = 0; }

-static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet) -{
  /* Returns 0 for the CS_ETM_TRACE_ON packet */
  if (packet->sample_type == CS_ETM_TRACE_ON)
          return 0;
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
                                   u64 addr) {
  u8 instrBytes[2];
  /*
   * The packet records the execution range with an exclusive end address
   *
   * A64 instructions are constant size, so the last executed
   * instruction is A64_INSTR_SIZE before the end address
   * Will need to do instruction level decode for T32 instructions as
   * they can be variable size (not yet supported).
  cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);
  /* T32 instruction size is indicated by bits[15:11] of the first
   * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
   * denote a 32-bit instruction.
    */
  return packet->end_addr - A64_INSTR_SIZE;
  return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
}

static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
@@ -518,27 +505,32 @@ static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet) return packet->start_addr; }

-static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet) +static inline +u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet) {
  /*
   * Only A64 instructions are currently supported, so can get
   * instruction count by dividing.
   * Will need to do instruction level decode for T32 instructions as
   * they can be variable size (not yet supported).
   */
  return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
  /* Returns 0 for the CS_ETM_TRACE_ON packet */
  if (packet->sample_type == CS_ETM_TRACE_ON)
          return 0;
  return packet->end_addr - packet->last_instr_size;
}
-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet, +static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
                               const struct cs_etm_packet *packet,
                                u64 offset)
{
  /*
   * Only A64 instructions are currently supported, so can get
   * instruction address by muliplying.
   * Will need to do instruction level decode for T32 instructions as
   * they can be variable size (not yet supported).
   */
  return packet->start_addr + offset * A64_INSTR_SIZE;
  if (packet->isa == CS_ETM_ISA_T32) {
          u64 addr = packet->start_addr;
          while (offset > 0) {
                  addr += cs_etm__t32_instr_size(etmq, addr);
                  offset--;
          }
          return addr;
  }
  /* Assume a 4 byte instruction size (A32/A64) */
  return packet->start_addr + offset * 4;
}

static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
@@ -867,9 +859,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) struct cs_etm_auxtrace *etm = etmq->etm; struct cs_etm_packet *tmp; int ret;
  u64 instrs_executed;
  u64 instrs_executed = etmq->packet->instr_count;
  instrs_executed = cs_etm__instr_count(etmq->packet);
   etmq->period_instructions += instrs_executed;

   /*
@@ -899,7 +890,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq) * executed, but PC has not advanced to next instruction) */ u64 offset = (instrs_executed - instrs_over - 1);
          u64 addr = cs_etm__instr_addr(etmq->packet, offset);
          u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset);

           ret = cs_etm__synth_instruction_sample(
                   etmq, addr, etm->instructions_sample_period);
-- 2.7.4
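As a standalone illustration of the T32 size rule quoted in the patch comment (bits[15:11] of the first halfword equal to 0b11101, 0b11110 or 0b11111 denote a 32-bit instruction), here is a minimal sketch that walks a buffer of T32 code to find the byte offset of the Nth instruction. The ex_* names are hypothetical, and the buffer is assumed to hold little-endian code; the patch itself reads target memory through cs_etm__mem_access() instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of the T32 instruction whose first halfword is hw1. */
static inline int ex_t32_instr_size(uint16_t hw1)
{
	/* bits[15:11] of 0b11101 (0x1D) or above mark a 32-bit encoding */
	return ((hw1 >> 11) >= 0x1D) ? 4 : 2;
}

/*
 * Byte offset of instruction number `index` (0-based) in a little-endian
 * T32 code buffer, or (size_t)-1 if the walk would read past the buffer.
 */
static inline size_t ex_t32_instr_offset(const uint8_t *code, size_t len,
					 size_t index)
{
	size_t off = 0;

	while (index > 0) {
		if (off + 2 > len)
			return (size_t)-1;

		/* Assemble the first halfword from two little-endian bytes */
		uint16_t hw1 = (uint16_t)(code[off] | (code[off + 1] << 8));

		off += (size_t)ex_t32_instr_size(hw1);
		index--;
	}
	return off;
}
```

The walk is linear in the instruction index, so it has to be repeated for every address lookup, whereas a count reported by the decoder avoids decoding each instruction.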

CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight

-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK

Mathieu Poirier

3 Oct

5:50 p.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

Hi Robert,

On Mon, 1 Oct 2018 at 14:17, Mathieu Poirier mathieu.poirier@linaro.org wrote:

...

On Mon, 1 Oct 2018 at 13:24, Robert Walker robert.walker@arm.com wrote:

...
Hi Mathieu,

On 01/10/18 19:00, Mathieu Poirier wrote:

...
Hi Robert and thanks for this second revision.

On Thu, 27 Sep 2018 at 04:24, Robert Walker robert.walker@arm.com wrote:

...
This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.0 of OpenCSD. A check for the new version member has been added to the feature check for OpenCSD. Where only the older version of OpenCSD is available, the previous behaviour of assuming 4 byte instruction size is used.

Signed-off-by: Robert Walker robert.walker@arm.com

tools/build/Makefile.feature | 3 +- tools/build/feature/Makefile | 4 ++ tools/build/feature/test-libopencsd-numinstr.c | 15 ++++++ tools/perf/Makefile.config | 3 ++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 58 ++++++++++++++++++++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++ tools/perf/util/cs-etm.c | 71 +++++++++++-------------- 7 files changed, 123 insertions(+), 41 deletions(-) create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature index f216b2f..5e8d108 100644 --- a/tools/build/Makefile.feature +++ b/tools/build/Makefile.feature @@ -68,7 +68,8 @@ FEATURE_TESTS_BASIC := \ sched_getcpu \ sdt \ setns \
   libopencsd
   libopencsd                     \
   libopencsd-numinstr
I understand what you're doing here but it won't fly with the upstream guy. The new #define in the openCSD library needs to come before this patchset. That way functionality that needs the newest version of the library can be conditionally compiled without this extra step and a new HAVE_CSTRACE_INSTR_INFO.
So just to be clear:

We do a new release of OpenCSD (0.9.3?) that exports some macros describing its version. You previously suggested including the ocsd_if_version.h header in the OpenCSD install - we can't do a '#include "opencsd/ocsd_if_version.h"' directly from the perf code without breaking compilation when using the older library versions without the header (0.9.2 or older), so maybe opencsd/c_api/opencsd_c_api.h would be a good place to put the macros (or to include ocsd_if_version.h from).
Yes, that makes sense to me.

...

We don't need to make any changes to the feature detection in perf

Correct.

...

util/cs-etm-decoder/cs-etm-decoder.c can use the version macros to determine whether to set HAVE_CSTRACE_INSTR_INFO within that file and hence whether to use the extra info available from the decoder.

I was hoping for something like this instead of a new define:

# if OCSD_VER_MINOR >=9 && OCSD_VER_PATCH > 2 ....

#else ....

#endif

In light of more discussion on IRC and Leo Yan's work on perf sampling, the consensus is that too many features are still being added to the openCSD library to be concerned with backward compatibility. We can revisit that line of thought at a later time, but for the moment asking that the perf tools be compiled with the latest code should be the way to go.

As such your original patchset was close to the final solution. Just checking for the right library version in test-libopencsd.c should do the trick.

Apologies for the change in direction, doing the right thing isn't always an exact science.

Regards, Mathieu

Mike Leach

9:20 p.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

The version macros are now in the header file hierarchy as of the released v0.9.3.

Mike

On Wed, 3 Oct 2018 at 18:51, Mathieu Poirier mathieu.poirier@linaro.org wrote:

...

Hi Robert,

On Mon, 1 Oct 2018 at 14:17, Mathieu Poirier mathieu.poirier@linaro.org wrote:

...
On Mon, 1 Oct 2018 at 13:24, Robert Walker robert.walker@arm.com wrote:

...
Hi Mathieu,

On 01/10/18 19:00, Mathieu Poirier wrote:

...
Hi Robert and thanks for this second revision.

On Thu, 27 Sep 2018 at 04:24, Robert Walker robert.walker@arm.com wrote:

...
This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.0 of OpenCSD. A check for the new version member has been added to the feature check for OpenCSD. Where only the older version of OpenCSD is available, the previous behaviour of assuming 4 byte instruction size is used.

Signed-off-by: Robert Walker robert.walker@arm.com

tools/build/Makefile.feature | 3 +- tools/build/feature/Makefile | 4 ++ tools/build/feature/test-libopencsd-numinstr.c | 15 ++++++ tools/perf/Makefile.config | 3 ++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 58 ++++++++++++++++++++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++ tools/perf/util/cs-etm.c | 71 +++++++++++-------------- 7 files changed, 123 insertions(+), 41 deletions(-) create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature index f216b2f..5e8d108 100644 --- a/tools/build/Makefile.feature +++ b/tools/build/Makefile.feature @@ -68,7 +68,8 @@ FEATURE_TESTS_BASIC := \ sched_getcpu \ sdt \ setns \
   libopencsd
   libopencsd                     \
   libopencsd-numinstr
I understand what you're doing here but it won't fly with the upstream guy. The new #define in the openCSD library needs to come before this patchset. That way functionality that needs the newest version of the library can be conditionally compiled without this extra step and a new HAVE_CSTRACE_INSTR_INFO.
So just to be clear:

We do a new release of OpenCSD (0.9.3?) that exports some macros describing its version. You previously suggested including the ocsd_if_version.h header in the OpenCSD install - we can't do a '#include "opencsd/ocsd_if_version.h"' directly from the perf code without breaking compilation when using the older library versions without the header (0.9.2 or older), so maybe opencsd/c_api/opencsd_c_api.h would be a good place to put the macros (or to include ocsd_if_version.h from).
Yes, that makes sense to me.

...

We don't need to make any changes to the feature detection in perf

Correct.

...

util/cs-etm-decoder/cs-etm-decoder.c can use the version macros to determine whether to set HAVE_CSTRACE_INSTR_INFO within that file and hence whether to use the extra info available from the decoder.

I was hoping for something like this instead of a new define:

# if OCSD_VER_MINOR >=9 && OCSD_VER_PATCH > 2 ....

#else ....

#endif
In light of more discussion on IRC and Leo Yan's work on perf sampling, the consensus is that too many features are still being added to the openCSD library to be concerned with backward compatibility. We can revisit that line of thought at a later time, but for the moment asking that the perf tools be compiled with the latest code should be the way to go.

As such your original patchset was close to the final solution. Just checking for the right library version in test-libopencsd.c should do the trick.

Apologies for the change in direction, doing the right thing isn't always an exact science.

Regards, Mathieu

-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK

Robert Walker

8 Oct

11:16 a.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

Signed-off-by: Robert Walker robert.walker@arm.com ---

Hi,

I think we need another release of OpenCSD with a fix for the encoding of the OCSD_VER_NUM macro (it had casts to uint32_t which can't be used in the pre-processor). Mike's already working on a release to add support for the function call / return information, so I'll wait until that's released before submitting this patch (with the correct OpenCSD version) to the perf maintainers.
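To illustrate the problem with the casts: a `#if` expression may only contain integer constants and operators, so a version number built with a cast such as `(uint32_t)` cannot be evaluated by the preprocessor. A minimal sketch of a preprocessor-safe encoding follows; the EX_* names are hypothetical and do not reproduce the actual OpenCSD header.

```c
/*
 * Not usable in #if: the cast is not a valid preprocessor expression.
 * #define EX_VER_NUM (((uint32_t)EX_MAJOR << 16) | ...)
 */

/* Hypothetical library version, for illustration only. */
#define EX_MAJOR 0
#define EX_MINOR 9
#define EX_PATCH 4

/* Plain integer arithmetic works both in #if and in ordinary C code. */
#define EX_VER_NUM ((EX_MAJOR << 16) | (EX_MINOR << 8) | (EX_PATCH))

/* Minimum version wanted by the consumer, encoded the same way. */
#define EX_MIN_VER ((0 << 16) | (9 << 8) | (4))

#if !defined(EX_VER_NUM) || (EX_VER_NUM < EX_MIN_VER)
#error "version 0.9.4 or later is required"
#endif

/* The same macro is still usable as a value at run time. */
static inline unsigned long ex_version(void)
{
	return EX_VER_NUM;
}
```

If an unsigned value is wanted, a `U` suffix on the constants is accepted by the preprocessor where a cast is not.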

Regards

Rob

tools/build/feature/test-libopencsd.c | 8 +++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 29 ++++++++++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++ tools/perf/util/cs-etm.c | 71 +++++++++++-------------- 4 files changed, 78 insertions(+), 40 deletions(-)

diff --git a/tools/build/feature/test-libopencsd.c b/tools/build/feature/test-libopencsd.c
index 5ff1246..8418a52 100644
--- a/tools/build/feature/test-libopencsd.c
+++ b/tools/build/feature/test-libopencsd.c
@@ -1,6 +1,14 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <opencsd/c_api/opencsd_c_api.h>

+/*
+ * Check OpenCSD library version is sufficient to provide required features
+ */
+#define OCSD_MIN_VER ((0 << 16) | (9 << 8) | (4))
+#if !defined(OCSD_VER_NUM) || (OCSD_VER_NUM < OCSD_MIN_VER)
+#error "OpenCSD >= 0.9.4 is required"
+#endif
+
 int main(void)
 {
 	(void)ocsd_get_version();
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 938def6..5efb616 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -263,9 +263,12 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
 	decoder->tail = 0;
 	decoder->packet_count = 0;
 	for (i = 0; i < MAX_BUFFER; i++) {
+		decoder->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN;
 		decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
 		decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
+		decoder->packet_buffer[i].instr_count = 0;
 		decoder->packet_buffer[i].last_instr_taken_branch = false;
+		decoder->packet_buffer[i].last_instr_size = 0;
 		decoder->packet_buffer[i].exc = false;
 		decoder->packet_buffer[i].exc_ret = false;
 		decoder->packet_buffer[i].cpu = INT_MIN;
@@ -294,11 +297,15 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 	decoder->packet_count++;

 	if (decoder->packet_count == MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
@@ -321,8 +328,28 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,

 	packet = &decoder->packet_buffer[decoder->tail];

+	switch (elem->isa) {
+	case ocsd_isa_aarch64:
+		packet->isa = CS_ETM_ISA_A64;
+		break;
+	case ocsd_isa_arm:
+		packet->isa = CS_ETM_ISA_A32;
+		break;
+	case ocsd_isa_thumb2:
+		packet->isa = CS_ETM_ISA_T32;
+		break;
+	case ocsd_isa_tee:
+	case ocsd_isa_jazelle:
+	case ocsd_isa_custom:
+	case ocsd_isa_unknown:
+	default:
+		packet->isa = CS_ETM_ISA_UNKNOWN;
+	}
+
 	packet->start_addr = elem->st_addr;
 	packet->end_addr = elem->en_addr;
+	packet->instr_count = elem->num_instr_range;
+
 	switch (elem->last_i_type) {
 	case OCSD_INSTR_BR:
 	case OCSD_INSTR_BR_INDIRECT:
@@ -336,6 +363,8 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 		break;
 	}

+	packet->last_instr_size = elem->last_instr_sz;
+
 	return ret;
 }

#define MAX_TIMESTAMP (~0ULL)

- instrs_executed = cs_etm__instr_count(etmq->packet); etmq->period_instructions += instrs_executed;

ret = cs_etm__synth_instruction_sample( etmq, addr, etm->instructions_sample_period);

-- 2.7.4

Mathieu Poirier

9 Oct

4:34 p.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

On Mon, 8 Oct 2018 at 05:16, Robert Walker robert.walker@arm.com wrote:

...

This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.4 of OpenCSD. A check for the new struct member has been added to the feature check for OpenCSD.

Signed-off-by: Robert Walker robert.walker@arm.com

Hi,

I think we need another release of OpenCSD with a fix for the encoding of the OCSD_VER_NUM macro (it had casts to uint32_t which can't be used in the pre-processor). Mike's already working on a release to add support for the function call / return information, so I'll wait until that's released before submitting this patch (with the correct OpenCSD version) to the perf maintainers.

Very well - I will wait to see the patches on the mailing list before reviewing.

...

Regards

Rob

tools/build/feature/test-libopencsd.c | 8 +++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 29 ++++++++++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++ tools/perf/util/cs-etm.c | 71 +++++++++++-------------- 4 files changed, 78 insertions(+), 40 deletions(-)

diff --git a/tools/build/feature/test-libopencsd.c b/tools/build/feature/test-libopencsd.c index 5ff1246..8418a52 100644 --- a/tools/build/feature/test-libopencsd.c +++ b/tools/build/feature/test-libopencsd.c @@ -1,6 +1,14 @@ // SPDX-License-Identifier: GPL-2.0 #include <opencsd/c_api/opencsd_c_api.h>

+/*
+ * Check OpenCSD library version is sufficient to provide required features
+ */
+#define OCSD_MIN_VER ((0 << 16) | (9 << 8) | (4))
+#if !defined(OCSD_VER_NUM) || (OCSD_VER_NUM < OCSD_MIN_VER)
+#error "OpenCSD >= 0.9.4 is required"
+#endif
+

int main(void) { (void)ocsd_get_version(); diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c index 938def6..5efb616 100644 --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c @@ -263,9 +263,12 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder) decoder->tail = 0; decoder->packet_count = 0; for (i = 0; i < MAX_BUFFER; i++) {
          decoder->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN;
 		decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
 		decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
+		decoder->packet_buffer[i].instr_count = 0;
 		decoder->packet_buffer[i].last_instr_taken_branch = false;
+		decoder->packet_buffer[i].last_instr_size = 0;
 		decoder->packet_buffer[i].exc = false;
 		decoder->packet_buffer[i].exc_ret = false;
 		decoder->packet_buffer[i].cpu = INT_MIN;
@@ -294,11 +297,15 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 	decoder->packet_count++;
 
 	decoder->packet_buffer[et].sample_type = sample_type;
+	decoder->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN;
 	decoder->packet_buffer[et].exc = false;
 	decoder->packet_buffer[et].exc_ret = false;
 	decoder->packet_buffer[et].cpu = *((int *)inode->priv);
 	decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
 	decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
+	decoder->packet_buffer[et].instr_count = 0;
+	decoder->packet_buffer[et].last_instr_taken_branch = false;
+	decoder->packet_buffer[et].last_instr_size = 0;
 
 	if (decoder->packet_count == MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
@@ -321,8 +328,28 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 
 	packet = &decoder->packet_buffer[decoder->tail];
 
+	switch (elem->isa) {
+	case ocsd_isa_aarch64:
+		packet->isa = CS_ETM_ISA_A64;
+		break;
+	case ocsd_isa_arm:
+		packet->isa = CS_ETM_ISA_A32;
+		break;
+	case ocsd_isa_thumb2:
+		packet->isa = CS_ETM_ISA_T32;
+		break;
+	case ocsd_isa_tee:
+	case ocsd_isa_jazelle:
+	case ocsd_isa_custom:
+	case ocsd_isa_unknown:
+	default:
+		packet->isa = CS_ETM_ISA_UNKNOWN;
+	}
+
 	packet->start_addr = elem->st_addr;
 	packet->end_addr = elem->en_addr;
+	packet->instr_count = elem->num_instr_range;
+
 	switch (elem->last_i_type) {
 	case OCSD_INSTR_BR:
 	case OCSD_INSTR_BR_INDIRECT:
@@ -336,6 +363,8 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 		break;
 	}
 
+	packet->last_instr_size = elem->last_instr_sz;
+
 	return ret;
 }

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 612b575..9351bd1 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -28,11 +28,21 @@ enum cs_etm_sample_type {
 	CS_ETM_TRACE_ON = 1 << 1,
 };
 
+enum cs_etm_isa {
+	CS_ETM_ISA_UNKNOWN,
+	CS_ETM_ISA_A64,
+	CS_ETM_ISA_A32,
+	CS_ETM_ISA_T32,
+};
+
 struct cs_etm_packet {
 	enum cs_etm_sample_type sample_type;
+	enum cs_etm_isa isa;
 	u64 start_addr;
 	u64 end_addr;
+	u32 instr_count;
 	u8 last_instr_taken_branch;
+	u8 last_instr_size;
 	u8 exc;
 	u8 exc_ret;
 	int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 2ae6402..fcaa73f 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -31,14 +31,6 @@
 
 #define MAX_TIMESTAMP (~0ULL)
 
-/*
- * A64 instructions are always 4 bytes
- *
- * Only A64 is supported, so can use this constant for converting between
- * addresses and instruction counts, calculting offsets etc
- */
-#define A64_INSTR_SIZE 4
-
 struct cs_etm_auxtrace {
 	struct auxtrace auxtrace;
 	struct auxtrace_queues queues;
@@ -492,21 +484,16 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
 	etmq->last_branch_rb->nr = 0;
 }
 
-static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
-{
-	/* Returns 0 for the CS_ETM_TRACE_ON packet */
-	if (packet->sample_type == CS_ETM_TRACE_ON)
-		return 0;
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
+					 u64 addr) {
+	u8 instrBytes[2];
 
-	/*
-	 * The packet records the execution range with an exclusive end address
-	 *
-	 * A64 instructions are constant size, so the last executed
-	 * instruction is A64_INSTR_SIZE before the end address
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
+	cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);
+	/* T32 instruction size is indicated by bits[15:11] of the first
+	 * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
+	 * denote a 32-bit instruction.
 	 */
-	return packet->end_addr - A64_INSTR_SIZE;
+	return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
 }
 
 static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
@@ -518,27 +505,32 @@ static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
 	return packet->start_addr;
 }
 
-static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
+static inline
+u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
 {
-	/*
-	 * Only A64 instructions are currently supported, so can get
-	 * instruction count by dividing.
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
-	 */
-	return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
+	/* Returns 0 for the CS_ETM_TRACE_ON packet */
+	if (packet->sample_type == CS_ETM_TRACE_ON)
+		return 0;
+
+	return packet->end_addr - packet->last_instr_size;
 }
 
-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
+				     const struct cs_etm_packet *packet,
 				     u64 offset)
 {
-	/*
-	 * Only A64 instructions are currently supported, so can get
-	 * instruction address by muliplying.
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
-	 */
-	return packet->start_addr + offset * A64_INSTR_SIZE;
+	if (packet->isa == CS_ETM_ISA_T32) {
+		u64 addr = packet->start_addr;
+
+		while (offset > 0) {
+			addr += cs_etm__t32_instr_size(etmq, addr);
+			offset--;
+		}
+		return addr;
+	}
+
+	/* Assume a 4 byte instruction size (A32/A64) */
+	return packet->start_addr + offset * 4;
 }
 
 static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
@@ -867,9 +859,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct cs_etm_packet *tmp;
 	int ret;
-	u64 instrs_executed;
+	u64 instrs_executed = etmq->packet->instr_count;
 
-	instrs_executed = cs_etm__instr_count(etmq->packet);
 	etmq->period_instructions += instrs_executed;
 
 	/*
@@ -899,7 +890,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 		 * executed, but PC has not advanced to next instruction)
 		 */
 		u64 offset = (instrs_executed - instrs_over - 1);
-		u64 addr = cs_etm__instr_addr(etmq->packet, offset);
+		u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset);
 
 		ret = cs_etm__synth_instruction_sample(
 			etmq, addr, etm->instructions_sample_period);
-- 2.7.4

CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight

leo.yan＠linaro.org

19 Oct

6:56 a.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

Hi Rob, Mathieu,

On Mon, Oct 08, 2018 at 12:16:02PM +0100, Robert Walker wrote:

...

This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.4 of OpenCSD. A check for the new struct member has been added to the feature check for OpenCSD.

As I mentioned in another offline email, it seems OpenCSD will have no version 0.9.4 and will jump directly to version 0.10.0.

I did several verifications with this patch and with Mike's OpenCSD patches [1]:

- Tested with an A64 binary; the decoding is the same with and without this patch - Passed
- Tested with A32/T32 instructions with static linkage - Passed
- Tested with A32/T32 instructions with dynamic linkage

For the last case I observed the minor issue below:

--- Disassembly ---
00000440 printf@plt:
 440: e28fc600 add ip, pc, #0, 12
 444: e28cca10 add ip, ip, #16, 20 ; 0x10000
 448: e5bcfbc8 ldr pc, [ip, #3016]! ; 0xbc8

--- Decoding ---
main 8451 1 branches: f7c0e414 coresight_test1+0x2c (/root/coresight_test/libcstest.so) => 41159e main+0x12 (/root/coresight_test/main)
main 8451 1 branches: 4115a8 main+0x1c (/root/coresight_test/main) => 411440 printf@plt+0x0 (/root/coresight_test/main)
main 8451 1 branches: 411442 printf@plt+0x2 (/root/coresight_test/main) => f7b4e21c printf+0x0 (/usr/lib/arm-linux-gnueabihf/libc-2.27.so)

The decoding says execution jumps to "411440 printf@plt+0x0", which is right when compared with the printf@plt disassembly. But the next line of the decoding log says the branch instruction is at "411442 printf@plt+0x2", which is not correct: we should expect the branch to be taken at "448" rather than "442".

I have also enclosed the decoding and disassembly logs for you to check; please let me know if you need more detailed information about the test case and the test commands.

[...]

Thanks, Leo Yan

[1] https://lists.linaro.org/pipermail/coresight/2018-October/001835.html

Robert Walker

9:44 a.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

On 19/10/2018 07:56, leo.yan@linaro.org wrote:

...

Hi Rob, Mathieu,

On Mon, Oct 08, 2018 at 12:16:02PM +0100, Robert Walker wrote:

...
This patch adds support for generating instruction samples from trace of AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between addresses and instruction counts requires extra information from the trace decoder, requiring version 0.9.4 of OpenCSD. A check for the new struct member has been added to the feature check for OpenCSD.

As I mentioned in another offline email, it seems OpenCSD will have no version 0.9.4 and will jump directly to version 0.10.0.

I used 0.9.4 as a temporary version as I didn't know what the next OpenCSD version would be. I'll change this to 0.10.0 when the patch is posted to the perf maintainers (after OpenCSD 0.10.0 is released).

...

I did several verifications with this patch and with Mike's OpenCSD patches [1]:

Tested with A64 binary, the decoding is same between with and without this patch; - Passed

Tested with A32/T32 instructions with static linkage; - Passed

Tested with A32/T32 instructions with dynamic linkage; For this case you could see below minor issue I observed:

--- Disassembly --- 00000440 printf@plt: 440: e28fc600 add ip, pc, #0, 12 444: e28cca10 add ip, ip, #16, 20 ; 0x10000 448: e5bcfbc8 ldr pc, [ip, #3016]! ; 0xbc8

--- Decoding --- main 8451 1 branches: f7c0e414 coresight_test1+0x2c (/root/coresight_test/libcstest.so) => 41159e main+0x12 (/root/coresight_test/main) main 8451 1 branches: 4115a8 main+0x1c (/root/coresight_test/main) => 411440 printf@plt+0x0 (/root/coresight_test/main) main 8451 1 branches: 411442 printf@plt+0x2 (/root/coresight_test/main) => f7b4e21c printf+0x0 (/usr/lib/arm-linux-gnueabihf/libc-2.27.so)

So you could see the decoding says it will jump to "411440 printf@plt+0x0", this is right if we connect with printf@plt disassembly code, but in the next decoding log it says the branch instruction is at "411442 printf@plt+0x2", this is not correct. We should expect the branch is taken at "448" rather than "442".

I also enclosed decoding and disassembly logs for your checking, please let me know if need me to provide more detailed info for testing case and testing commands for this.

[...]

I assume this is a simple printf("Hello world") program - I've been able to reproduce the same issue with a similar program (so the addresses are different):

Decode:

hello 3859 1 branches:u: 10428 __libc_csu_init+0x30 (/home/root/hello) => 103d4 main+0x0 (/home/root/
hello 3859 1 branches:u: 103e8 main+0x14 (/home/root/hello) => 102e4 printf@plt+0x0 (/home/root/hello
hello 3859 1 branches:u: 102e6 printf@plt+0x2 (/home/root/hello) => 102d0 _init+0xc (/home/root/hello
hello 3859 1 branches:u: 102dc _init+0x18 (/home/root/hello) => 0 [unknown] ([unknown])

Disassembly:

000102d0 printf@plt-0x14:
 102d0: e52de004 push {lr} ; (str lr, [sp, #-4]!)
 102d4: e59fe004 ldr lr, [pc, #4] ; 102e0 <_init+0x1c>
 102d8: e08fe00e add lr, pc, lr
 102dc: e5bef008 ldr pc, [lr, #8]!
 102e0: 00010d20 .word 0x00010d20

000102e4 printf@plt:
 102e4: e28fc600 add ip, pc, #0, 12
 102e8: e28cca10 add ip, ip, #16, 20 ; 0x10000
 102ec: e5bcfd20 ldr pc, [ip, #3360]! ; 0xd20

If I log the packets emitted from the OpenCSD decoder we get this:

CS_ETM_RANGE: [0xf747487c-0xf7474880]
CS_ETM_RANGE: [0xf7474880-0xf74748aa] br
CS_ETM_RANGE: [0x103d4-0x103ec] br <<---- entry to main() up to call to printf()
CS_ETM_RANGE: [0x102e4-0x102e8] br <<---- execution of printf@plt here
CS_ETM_RANGE: [0x102d0-0x102e0] br <<---- execution of printf@plt-0x14, jumping to actual printf
CS_ETM_RANGE: [0xf7559be8-0xf7559c00] br
CS_ETM_RANGE: [0xf7555024-0xf7555028]

So the reported samples match the output from the decoder: it appears to have interpreted the instruction at 0x102e4 as a branch. Either this is an error in the decoder, or it is some effect of the dynamic linking (the PLT wrapper functions may be patched with the actual address of the printf function), but I think the decode is based on the unpatched code from the image files.

Regards

Rob

Al Grant

10:10 a.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

...

Decode:
        hello  3859          1 branches:u:             10428
__libc_csu_init+0x30 (/home/root/hello) => 103d4 main+0x0 (/home/root/ hello 3859 1 branches:u: 103e8 main+0x14 (/home/root/hello) => 102e4 printf@plt+0x0 (/home/root/hello hello 3859 1 branches:u: 102e6 printf@plt+0x2 (/home/root/hello) => 102d0 _init+0xc (/home/root/hello hello 3859 1 branches:u: 102dc _init+0x18 (/home/root/hello) => 0 [unknown] ([unknown])

Disassembly:

000102d0 printf@plt-0x14: 102d0: e52de004 push {lr} ; (str lr, [sp, #-4]!) 102d4: e59fe004 ldr lr, [pc, #4] ; 102e0 <_init+0x1c> 102d8: e08fe00e add lr, pc, lr 102dc: e5bef008 ldr pc, [lr, #8]! 102e0: 00010d20 .word 0x00010d20

000102e4 printf@plt: 102e4: e28fc600 add ip, pc, #0, 12 102e8: e28cca10 add ip, ip, #16, 20 ; 0x10000 102ec: e5bcfd20 ldr pc, [ip, #3360]! ; 0xd20

If I log the packets emitted from the OpenCSD decoder we get this:
     CS_ETM_RANGE: [0xf747487c-0xf7474880]
     CS_ETM_RANGE: [0xf7474880-0xf74748aa] br
     CS_ETM_RANGE: [0x103d4-0x103ec] br  <<---- entry to main() up to call to
printf() CS_ETM_RANGE: [0x102e4-0x102e8] br <<---- execution of printf@plt here CS_ETM_RANGE: [0x102d0-0x102e0] br <<---- execution of printf@plt- 0x14, jumping to actual printf CS_ETM_RANGE: [0xf7559be8-0xf7559c00] br CS_ETM_RANGE: [0xf7555024-0xf7555028]

So the reported samples match the output from the decoder: It would appear to have interpreted the instruction at 0x00102e4 as a branch. Either this is an error in the decoder, or it's some effect of the dynamic linking (the PLT wrapper functions may patched with the actual address of the printf function) - but I think the decode is based on the unpatched code from the image files.

It looks like it might not have spotted that this is a T32 BLX to A32 code in the PLT, and is decoding

...

102e4:       e28fc600        add     ip, pc, #0, 12

as if it was T32. This decodes to a random 16-bit STM (0xc600) at 102e4, followed by a random 16-bit direct branch (0xe28f) at 102e6. The decoder tells you that 102e6 is the branch origin:

...

        hello  3859          1 branches:u:             102e6 printf@plt+0x2 (/home/root/hello) => 102d0 _init+0xc (/home/root/hello

and ends the ETM range at 102e8:

...

     CS_ETM_RANGE: [0x102e4-0x102e8] br  <<---- execution of printf@plt here

The LDR PC (in the actual PLT code) will emit an ETM branch address packet which gets the decoder back on track. It just gets the wrong waypoint address for the origin of the branch.

A32/T32 state changes that can be statically inferred from the instructions (e.g. BLX) aren't specially indicated in the ETM packets, you just get an E atom. The decoder has to deduce the state change from the code.

...

Regards

Rob


leo.yan＠linaro.org

20 Oct

7:42 a.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

On Fri, Oct 19, 2018 at 10:10:49AM +0000, Al Grant wrote:

...

...
Decode:
        hello  3859          1 branches:u:             10428
__libc_csu_init+0x30 (/home/root/hello) => 103d4 main+0x0 (/home/root/ hello 3859 1 branches:u: 103e8 main+0x14 (/home/root/hello) => 102e4 printf@plt+0x0 (/home/root/hello hello 3859 1 branches:u: 102e6 printf@plt+0x2 (/home/root/hello) => 102d0 _init+0xc (/home/root/hello hello 3859 1 branches:u: 102dc _init+0x18 (/home/root/hello) => 0 [unknown] ([unknown])

Disassembly:

000102d0 printf@plt-0x14: 102d0: e52de004 push {lr} ; (str lr, [sp, #-4]!) 102d4: e59fe004 ldr lr, [pc, #4] ; 102e0 <_init+0x1c> 102d8: e08fe00e add lr, pc, lr 102dc: e5bef008 ldr pc, [lr, #8]! 102e0: 00010d20 .word 0x00010d20

000102e4 printf@plt: 102e4: e28fc600 add ip, pc, #0, 12 102e8: e28cca10 add ip, ip, #16, 20 ; 0x10000 102ec: e5bcfd20 ldr pc, [ip, #3360]! ; 0xd20

If I log the packets emitted from the OpenCSD decoder we get this:
     CS_ETM_RANGE: [0xf747487c-0xf7474880]
     CS_ETM_RANGE: [0xf7474880-0xf74748aa] br
     CS_ETM_RANGE: [0x103d4-0x103ec] br  <<---- entry to main() up to call to
printf() CS_ETM_RANGE: [0x102e4-0x102e8] br <<---- execution of printf@plt here CS_ETM_RANGE: [0x102d0-0x102e0] br <<---- execution of printf@plt- 0x14, jumping to actual printf CS_ETM_RANGE: [0xf7559be8-0xf7559c00] br CS_ETM_RANGE: [0xf7555024-0xf7555028]

So the reported samples match the output from the decoder: It would appear to have interpreted the instruction at 0x00102e4 as a branch. Either this is an error in the decoder, or it's some effect of the dynamic linking (the PLT wrapper functions may patched with the actual address of the printf function) - but I think

I'm not familiar with the PLT and GOT. After taking some time to look up related information, I think we should assume that the PLT code is not changed and that only the GOT is updated by dynamic linking, right?

If so, this means the PLT code is not patched with other instructions on the fly. I'm just curious whether I have missed something here.

...

...
the decode is based on the unpatched code from the image files.

It looks like it might not have spotted that this is a T32 BLX to A32 code in the PLT, and is decoding

...
102e4:       e28fc600        add     ip, pc, #0, 12
as if it was T32. This decodes to a random 16-bit STM (0xc600) at 102e4, followed by a random 16-bit direct branch (0xe28f) at 102e6. The decoder tells you that 102e6 is the branch origin:

...
        hello  3859          1 branches:u:             102e6
printf@plt+0x2 (/home/root/hello) => 102d0 _init+0xc (/home/root/hello
and ends the ETM range at 102e8:

...
     CS_ETM_RANGE: [0x102e4-0x102e8] br  <<---- execution of printf@plt
here
The LDR PC (in the actual PLT code) will emit an ETM branch address packet which gets the decoder back on track. It just gets the wrong waypoint address for the origin of the branch.

A32/T32 state changes that can be statically inferred from the instructions (e.g. BLX) aren't specially indicated in the ETM packets, you just get an E atom. The decoder has to deduce the state change from the code.

If the decoder fails to change state from T32 to A32, should we expect an updated OpenCSD decoder that fixes this?

Thanks, Leo Yan

Mike Leach

23 Oct

12:13 p.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

Hi,

Looks like the ETMv4 decoder is not tracking the ISA change as Al says. This will need a mod and I'll need to generate a test case unless the raw trace and images can be supplied from any of the above tests.

Mike

-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK

leo.yan＠linaro.org

2:04 p.m.

New subject: [RFC v3 1/2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

Hi Mike,

On Tue, Oct 23, 2018 at 01:13:40PM +0100, Mike Leach wrote:

...

HI,

Looks like the ETMv4 decoder is not tracking the ISA change as Al says. This will need a mod and I'll need to generate a test case unless the raw trace and images can be supplied from any of the above tests.

I have uploaded a tar file which includes the ELF file, the .so library and perf.data; if I have missed something, just let me know. You can fetch it from [1].

Thanks, Leo Yan

[1] http://people.linaro.org/~leo.yan/opencsd_juno/coresight_test.tgz

Robert Walker

27 Sep

10:23 a.m.

New subject: [RFC v3 2/2] perf: Full support for Arm T32 instructions with older version of OpenCSD

To provide accurate instruction sampling when T32 instructions are used, it is necessary to count the number of instructions in each range executed by examining each instruction. With OpenCSD 0.9.0 and later this is performed by the OpenCSD library while decoding the trace. When using older versions of OpenCSD, this patch counts the instructions in each range emitted by the decoder library (which is slower than doing it in the decoder library).

Signed-off-by: Robert Walker <robert.walker@arm.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 64 +++++++++++++++++++++----
 1 file changed, 56 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 260f3b3a..446033e 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -314,29 +314,77 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 
 #ifdef HAVE_CSTRACE_INSTR_INFO
 
-static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem)
+static int cs_etm_decoder__instr_count(struct cs_etm_decoder *decoder __maybe_unused,
+				       const ocsd_generic_trace_elem *elem)
 {
 	return elem->num_instr_range;
 }
 
-static int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem)
+static int cs_etm_decoder__last_instr_size(struct cs_etm_decoder *decoder __maybe_unused,
+					   const ocsd_generic_trace_elem *elem)
 {
 	return elem->last_instr_sz;
 }
 
 #else
 
-static int cs_etm_decoder__instr_count(const ocsd_generic_trace_elem *elem)
+static inline
+int cs_etm_decoder__t32_instr_size(struct cs_etm_decoder *decoder,
+				   u64 addr)
+{
+	u8 instrBytes[2];
+
+	decoder->mem_access(decoder->data,
+			    addr,
+			    ARRAY_SIZE(instrBytes),
+			    instrBytes);
+
+	/* T32 instruction size is indicated by bits[15:11] of the first
+	 * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
+	 * denote a 32-bit instruction.
+	 */
+	return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
+}
+
+static int cs_etm_decoder__instr_count(struct cs_etm_decoder *decoder,
+				       const ocsd_generic_trace_elem *elem)
 {
-	/* Assume a 4-byte instruction size - will be wrong for T32 */
 	WARN_ONCE(elem->isa == ocsd_isa_thumb2,
-		  "Instruction counts not available for T32. Please upgrade to OpenCSD >= 0.9.0\n");
+		  "Using slow method for T32 instruction counts. Please upgrade to OpenCSD >= 0.9.0\n");
+
+	if (elem->isa == ocsd_isa_thumb2) {
+		/* Count each instruction for T32 */
+		u64 addr = elem->st_addr;
+		u64 count = 0;
+
+		while (addr < elem->en_addr) {
+			addr += cs_etm_decoder__t32_instr_size(decoder, addr);
+			count++;
+		}
+		return count;
+	}
+
 	return (elem->en_addr - elem->st_addr) / 4;
 }
 
 static
-int cs_etm_decoder__last_instr_size(const ocsd_generic_trace_elem *elem __maybe_unused)
+int cs_etm_decoder__last_instr_size(struct cs_etm_decoder *decoder,
+				    const ocsd_generic_trace_elem *elem)
 {
+	if (elem->isa == ocsd_isa_thumb2) {
+		/*
+		 * T32 instructions can be either 2 or 4 bytes
+		 */
+		if (elem->en_addr - elem->st_addr == 2)
+			/* Only one 2 byte instruction in packet*/
+			return 2;
+		else if (cs_etm_decoder__t32_instr_size(decoder, elem->en_addr - 4) == 4)
+			return 4;
+		else
+			return 2;
+	}
+
+	/* Otherwise a 4 byte instruction size (A32/A64) */
 	return 4;
 }
 
@@ -377,7 +425,7 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 
 	packet->start_addr = elem->st_addr;
 	packet->end_addr = elem->en_addr;
-	packet->instr_count = cs_etm_decoder__instr_count(elem);
+	packet->instr_count = cs_etm_decoder__instr_count(decoder, elem);
 
 	switch (elem->last_i_type) {
 	case OCSD_INSTR_BR:
@@ -392,7 +440,7 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 		break;
 	}
 
-	packet->last_instr_size = cs_etm_decoder__last_instr_size(elem);
+	packet->last_instr_size = cs_etm_decoder__last_instr_size(decoder, elem);
 
 	return ret;
 }

-- 2.7.4

Mike Leach

1:39 p.m.

Hi Rob,

On Thu, 27 Sep 2018 at 11:24, Robert Walker <robert.walker@arm.com> wrote:

...

Hi,

I'm taking this back to the linaro coresight list so we can get the OpenCSD library versioning sorted out.

The first patch splits the OpenCSD feature check into two parts. The original check is left as is - this just checks for the presence of an OpenCSD library. A new check (libopencsd-numinstr) is added that checks for the new OpenCSD (>0.9.0) that has the num_instr_range member in the ocsd_generic_trace_elem struct. This feature is then used to set a flag used in cs-etm-decoder.c to select which versions of 2 functions are used to get the instruction count / last instruction size of each instruction block - if the flag is not set, then the previous assumptions of a 4 byte instruction size are used. It was suggested that OpenCSD should export a version header - I agree this is a good idea, but this will require a new release of the library, so we would miss support for the instruction sizes when OpenCSD 0.9.{0,1,2} is installed - hence why I've kept the version check using the presence of num_instr_range.

I agree with the version check using num_instr_range for now. But my view is that we should simply fail to build if the version of the library is insufficient for the current set of perf features - with an appropriate warning of course.

...

The second patch adds support for finding the T32 instruction counts when the OpenCSD library doesn't report the instruction counts. As this involves iterating through the block of instructions and examining each instruction, there is a significant performance hit (about 5x slower than using the OpenCSD library to report the instruction counts), so I'm not sure this patch should go into upstream.

I don't think it should - what is the advantage of building a new version of perf against an old version of the OpenCSD library?

If a user builds a version of perf without these patches against the new library then it will work - that's fine. With the patches we should require the correct library version.

The problem we cannot solve at this point is a user taking a perf built against one version (e.g. 0.8.x) and running it against the other (0.9.x). The executable links against libopencsd.0

Once we upgrade the version to 1.0.0 and beyond, we will have to be more careful about breaking changes - these will have to rev the major version number in future.

Mike

...

Regards

Rob

Robert Walker (2): perf: Support for Arm A32/T32 instruction sets in CoreSight trace perf: Full support for Arm T32 instructions with older version of OpenCSD

tools/build/Makefile.feature | 3 +- tools/build/feature/Makefile | 4 + tools/build/feature/test-libopencsd-numinstr.c | 15 ++++ tools/perf/Makefile.config | 3 + tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 106 ++++++++++++++++++++++++ tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 +++ tools/perf/util/cs-etm.c | 71 +++++++--------- 7 files changed, 171 insertions(+), 41 deletions(-) create mode 100644 tools/build/feature/test-libopencsd-numinstr.c

-- 2.7.4


-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK

Robert Walker

1 Oct

8 a.m.

On 27/09/18 14:39, Mike Leach wrote:

...

Hi Rob, On Thu, 27 Sep 2018 at 11:24, Robert Walker robert.walker@arm.com wrote:

...
Hi,

I'm taking this back to the linaro coresight list so we can get the OpenCSD library versioning sorted out.

The first patch splits the OpenCSD feature check into two parts. The original check is left as is - this just checks for the presence of an OpenCSD library. A new check (libopencsd-numinstr) is added that checks for the new OpenCSD (>0.9.0) that has the num_instr_range member in the ocsd_generic_trace_elem struct. This feature is then used to set a flag used in cs-etm-decoder.c to select which versions of 2 functions are used to get the instruction count / last instruction size of each instruction block - if the flag is not set, then the previous assumptions of a 4 byte instruction size are used. It was suggested that OpenCSD should export a version header - I agree this is a good idea, but this will require a new release of the library, so we would miss support for the instruction sizes when OpenCSD 0.9.{0,1,2} is installed - hence why I've kept the version check using the presence of num_instr_range.

I agree with the version check using num_instr_range for now. But my view is that we should simply fail to build if the version of the library is insufficient for the current set of perf features - with an appropriate warning of course.

...
The second patch adds support for finding the T32 instruction counts when the OpenCSD library doesn't report the instruction counts. As this involves iterating through the block of instructions and examining each instruction, there is a significant performance hit (about 5x slower than using the OpenCSD library to report the instruction counts), so I'm not sure this patch should go into upstream.

I don't think it should - what is the advantage of building a new version of perf against an old version of the OpenCSD library?

If a user builds a version of perf without these patches against the new library then it will work - that's fine. With the patches we should require the correct library version.

The problem we cannot solve at this point is a user taking a perf built against one version (e.g. 0.8.x) and running it against the other (0.9.x). The executable links against libopencsd.0

Once we upgrade the version to 1.0.0 and beyond, we will have to be more careful about breaking changes - these will have to rev the major version number in future.

Mike

The previous version of this patch did fail the build if OpenCSD 0.9.x wasn't available - feedback at the time was that we should try to support those users with only the older version of OpenCSD available (although I agree updating OpenCSD is the better option). The configuration checks added here would allow us to fail the build with a more detailed error message ("Old version of OpenCSD detected, but 0.9.x is required")

Regards

Rob


participants (5)

Al Grant
leo.yan＠linaro.org
Mathieu Poirier
Mike Leach
Robert Walker