Fix thread tracking when decoding Coresight trace and add a new test for it.
The new test is added as a Perf test workload instead of a custom binary with its own build system, but this requires a new feature in Perf test to pass in control pipes which can enable and disable events. This scopes the recording to just the workload and helps to reduce the amount of data recorded in tracing tests.
With this new feature we can re-write all of the Coresight tests to make use of it and remove the remaining binaries which fixes the following issues:
* They didn't work in out-of-source builds
* A lot of the tests unnecessarily required root and didn't skip without it
* They were mainly qualitative tests which didn't look for specific behavior
Most importantly, the long build time and runtime have been reduced. On a Radxa Orion O6, unroll_loop_thread.c took 37s to compile, which is longer than the entire Perf build. Now the build time is negligible, and the before and after runtimes for all the Coresight tests are:
        | N1SDP  | Orion O6
-----------------------------------
 Before | 4m 0s  | 14m 49s
 After  | 26s    | 56s
-----------------------------------
Signed-off-by: James Clark james.clark@linaro.org
---
Changes in v3:
- Minor sashiko comments
- Close some more pipes
- Fix warning messages
- Error handling improvements
- Pass packet into cs_etm__synth_instruction_sample()
- Fixup stale comment (Leo)
- Link to v2: https://lore.kernel.org/r/20260602-james-cs-context-tracking-fix-v2-0-85b5ce...
Changes in v2:
- Add --workload-ctl option to Perf test
- Re-write all the Coresight tests and speed them up
- Pass packet to memory access function so frontend can use either the
  previous or current packet's EL
- Link to v1: https://lore.kernel.org/r/20260526-james-cs-context-tracking-fix-v1-0-ebd602...
---
James Clark (19):
      perf cs-etm: Queue context packets for frontend
      perf test: Add workload-ctl option
      perf test: Add a workload that forces context switches
      perf test cs-etm: Test process attribution
      perf test: Add deterministic workload
      perf test cs-etm: Replace unroll loop thread with deterministic decode test
      perf test cs-etm: Remove asm_pure_loop test
      perf test cs-etm: Replace memcpy test with raw dump stress test
      perf test: Add named_threads workload
      perf test cs-etm: Test decoding for concurrent threads test
      perf test cs-etm: Remove duplicate branch tests
      perf test cs-etm: Skip if not root
      perf test cs-etm: Reduce snapshot size
      perf test cs-etm: Speed up basic test
      perf test cs-etm: Remove unused Coresight workloads
      perf test cs-etm: Make disassembly test use kcore
      perf test cs-etm: Add all branch instructions to test
      perf test cs-etm: Speed up disassembly test
      perf test cs-etm: Move existing tests to coresight folder
 Documentation/trace/coresight/coresight-perf.rst   |  78 +------
 MAINTAINERS                                        |   2 -
 tools/perf/Documentation/perf-test.txt             |  18 +-
 tools/perf/Makefile.perf                           |  14 +-
 tools/perf/scripts/python/arm-cs-trace-disasm.py   |  20 +-
 tools/perf/tests/builtin-test.c                    | 187 +++++++++++++++-
 tools/perf/tests/shell/coresight/Makefile          |  29 ---
 .../perf/tests/shell/coresight/Makefile.miniconfig |  14 --
 tools/perf/tests/shell/coresight/asm_pure_loop.sh  |  22 --
 .../tests/shell/coresight/asm_pure_loop/.gitignore |   1 -
 .../tests/shell/coresight/asm_pure_loop/Makefile   |  34 ---
 .../shell/coresight/asm_pure_loop/asm_pure_loop.S  |  30 ---
 .../tests/shell/coresight/concurrent_threads.sh    |  45 ++++
 .../tests/shell/coresight/context_switch_thread.sh |  69 ++++++
 tools/perf/tests/shell/coresight/deterministic.sh  |  71 +++++++
 .../tests/shell/coresight/memcpy_thread/.gitignore |   1 -
 .../tests/shell/coresight/memcpy_thread/Makefile   |  33 ---
 .../shell/coresight/memcpy_thread/memcpy_thread.c  |  80 -------
 .../tests/shell/coresight/memcpy_thread_16k_10.sh  |  22 --
 .../perf/tests/shell/coresight/raw_dump_stress.sh  |  48 +++++
 .../shell/{ => coresight}/test_arm_coresight.sh    |  43 ++--
 .../{ => coresight}/test_arm_coresight_disasm.sh   |  17 +-
 .../tests/shell/coresight/thread_loop/.gitignore   |   1 -
 .../tests/shell/coresight/thread_loop/Makefile     |  33 ---
 .../shell/coresight/thread_loop/thread_loop.c      |  85 --------
 .../shell/coresight/thread_loop_check_tid_10.sh    |  23 --
 .../shell/coresight/thread_loop_check_tid_2.sh     |  23 --
 .../shell/coresight/unroll_loop_thread/.gitignore  |   1 -
 .../shell/coresight/unroll_loop_thread/Makefile    |  33 ---
 .../unroll_loop_thread/unroll_loop_thread.c        |  75 -------
 .../tests/shell/coresight/unroll_loop_thread_10.sh |  22 --
 tools/perf/tests/shell/lib/coresight.sh            | 134 ------------
 tools/perf/tests/tests.h                           |   3 +
 tools/perf/tests/workloads/Build                   |   4 +
 tools/perf/tests/workloads/context_switch_loop.c   | 101 +++++++++
 tools/perf/tests/workloads/deterministic.c         |  39 ++++
 tools/perf/tests/workloads/named_threads.c         | 109 ++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  21 +-
 tools/perf/util/cs-etm.c                           | 234 ++++++++++++---------
 tools/perf/util/cs-etm.h                           |   8 +-
 40 files changed, 889 insertions(+), 938 deletions(-)
---
base-commit: 5f0ca6b80b12bab1ce06839cdffb6148bb650ff4
change-id: 20260515-james-cs-context-tracking-fix-754998bae7ed
PE_CONTEXT elements update the context ID and exception level, but the decoder may still have prior packets cached for frontend processing. Updating the context immediately in the decoder backend can make those cached packets get consumed with the wrong thread or EL state.
Queue a CS_ETM_CONTEXT packet carrying the TID and EL for the frontend. This keeps context changes ordered with the rest of the packet stream and avoids mismatches when synthesizing samples from cached packets.
Separate the memory access function into one for the frontend and one for decoding. The frontend also needs memory access to attach the instruction to samples. Because the frontend does memory access for both previous and current packets, change all the frontend memory access function signatures to take both a tidq and a packet. The backend, however, always uses the current decode EL and thread from the tidq.
Treat context packets as a boundary for branch sample generation and remove tidq->prev_packet_thread because it's not possible to branch to a different thread, so only tracking the current thread is required for sample generation.
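The ordering argument above can be illustrated with a small standalone sketch. It is not the perf code: the types and the `frontend_drain()` helper are hypothetical simplifications, assuming only that range packets and context packets share one queue and that the frontend applies a context change when it reaches the context packet, not when the decoder first sees it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified packet types; not the real cs_etm_packet layout. */
enum pkt_type { PKT_RANGE, PKT_CONTEXT };

struct pkt {
	enum pkt_type type;
	int tid;            /* only meaningful for PKT_CONTEXT */
	unsigned long addr; /* only meaningful for PKT_RANGE */
};

struct sample {
	int tid;
	unsigned long addr;
};

/*
 * Walk the queue in order. A context packet only changes the frontend's
 * current TID once every earlier range packet has been turned into a sample,
 * so cached ranges keep the TID that was current when they were traced.
 */
static size_t frontend_drain(const struct pkt *q, size_t n, int start_tid,
			     struct sample *out)
{
	int frontend_tid = start_tid;
	size_t nr = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].type == PKT_CONTEXT) {
			frontend_tid = q[i].tid;
			continue;
		}
		out[nr].tid = frontend_tid;
		out[nr].addr = q[i].addr;
		nr++;
	}
	return nr;
}
```

With a queue of range, context, range, the first sample keeps the starting TID and only the second picks up the new one. If the context were instead applied as soon as the decoder saw it, both samples would carry the new TID, which is the mismatch this patch avoids.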
Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
Reported-by: Amir Ayupov aaupov@meta.com
Closes: https://lore.kernel.org/linux-perf-users/20260515021135.1729028-1-aaupov@met...
Co-authored-by: James Clark james.clark@linaro.org
Signed-off-by: Leo Yan leo.yan@arm.com
Signed-off-by: James Clark james.clark@linaro.org
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  21 ++-
 tools/perf/util/cs-etm.c                        | 234 ++++++++++++++----------
 tools/perf/util/cs-etm.h                        |   8 +-
 3 files changed, 162 insertions(+), 101 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index dee3020ceaa9..26940f1f1b0b 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -402,6 +402,8 @@ cs_etm_decoder__buffer_packet(struct cs_etm_queue *etmq,
 	packet_queue->packet_buffer[et].flags = 0;
 	packet_queue->packet_buffer[et].exception_number = UINT32_MAX;
 	packet_queue->packet_buffer[et].trace_chan_id = trace_chan_id;
+	packet_queue->packet_buffer[et].el = ocsd_EL_unknown;
+	packet_queue->packet_buffer[et].tid = -1;

 	if (packet_queue->packet_count == CS_ETM_PACKET_MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
@@ -449,6 +451,7 @@ cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq,
 	packet->last_instr_type = elem->last_i_type;
 	packet->last_instr_subtype = elem->last_i_subtype;
 	packet->last_instr_cond = elem->last_instr_cond;
+	packet->el = elem->context.exception_level;

 	if (elem->last_i_type == OCSD_INSTR_BR || elem->last_i_type == OCSD_INSTR_BR_INDIRECT)
 		packet->last_instr_taken_branch = elem->last_instr_exec;
@@ -525,7 +528,9 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
 			const ocsd_generic_trace_elem *elem,
 			const uint8_t trace_chan_id)
 {
+	struct cs_etm_packet *packet;
 	pid_t tid = -1;
+	int ret;

 	/*
 	 * Process the PE_CONTEXT packets if we have a valid contextID or VMID.
@@ -546,12 +551,18 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
 		break;
 	}

-	if (cs_etm__etmq_set_tid_el(etmq, tid, trace_chan_id,
-				    elem->context.exception_level))
+	if (cs_etm__etmq_update_decode_context(etmq, trace_chan_id,
+					       elem->context.exception_level, tid))
 		return OCSD_RESP_FATAL_SYS_ERR;

-	if (tid == -1)
-		return OCSD_RESP_CONT;
+	ret = cs_etm_decoder__buffer_packet(etmq, packet_queue, trace_chan_id,
+					    CS_ETM_CONTEXT);
+	if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+		return ret;
+
+	packet = &packet_queue->packet_buffer[packet_queue->tail];
+	packet->tid = tid;
+	packet->el = elem->context.exception_level;

 	/*
 	 * A timestamp is generated after a PE_CONTEXT element so make sure
@@ -559,7 +570,7 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
 	 */
 	cs_etm_decoder__reset_timestamp(packet_queue);

-	return OCSD_RESP_CONT;
+	return ret;
 }

 static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 40c6ddfa8c8d..ce570913669c 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -85,15 +85,22 @@ struct cs_etm_traceid_queue {
 	u64 period_instructions;
 	size_t last_branch_pos;
 	union perf_event *event_buf;
-	struct thread *thread;
-	struct thread *prev_packet_thread;
-	ocsd_ex_level prev_packet_el;
-	ocsd_ex_level el;
 	struct branch_stack *last_branch;
 	struct branch_stack *last_branch_rb;
 	struct cs_etm_packet *prev_packet;
 	struct cs_etm_packet *packet;
 	struct cs_etm_packet_queue packet_queue;
+
+	struct thread *decode_thread;
+	ocsd_ex_level decode_el;
+
+	/*
+	 * The frontend accesses the EL from '[prev_]packet' because it needs
+	 * previous EL for branch and current EL for instruction samples. It's
+	 * not possible to change thread in a single branch sample so no need to
+	 * store or access the thread through the packet.
+	 */
+	struct thread *frontend_thread;
 };
 enum cs_etm_format {
@@ -614,10 +621,11 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,

 	queue = &etmq->etm->queues.queue_array[etmq->queue_nr];
 	tidq->trace_chan_id = trace_chan_id;
-	tidq->el = tidq->prev_packet_el = ocsd_EL_unknown;
-	tidq->thread = machine__findnew_thread(&etm->session->machines.host, -1,
+	tidq->decode_el = ocsd_EL_unknown;
+	tidq->frontend_thread = machine__findnew_thread(&etm->session->machines.host, -1,
+							queue->tid);
+	tidq->decode_thread = machine__findnew_thread(&etm->session->machines.host, -1,
 					       queue->tid);
-	tidq->prev_packet_thread = machine__idle_thread(&etm->session->machines.host);

 	tidq->packet = zalloc(sizeof(struct cs_etm_packet));
 	if (!tidq->packet)
@@ -750,21 +758,10 @@ static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
 		/*
 		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
 		 * the next incoming packet.
-		 *
-		 * Threads and exception levels are also tracked for both the
-		 * previous and current packets. This is because the previous
-		 * packet is used for the 'from' IP for branch samples, so the
-		 * thread at that time must also be assigned to that sample.
-		 * Across discontinuity packets the thread can change, so by
-		 * tracking the thread for the previous packet the branch sample
-		 * will have the correct info.
 		 */
 		tmp = tidq->packet;
 		tidq->packet = tidq->prev_packet;
 		tidq->prev_packet = tmp;
-		tidq->prev_packet_el = tidq->el;
-		thread__put(tidq->prev_packet_thread);
-		tidq->prev_packet_thread = thread__get(tidq->thread);
 	}
 }

@@ -937,8 +934,8 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)

 		/* Free this traceid_queue from the array */
 		tidq = etmq->traceid_queues[idx];
-		thread__zput(tidq->thread);
-		thread__zput(tidq->prev_packet_thread);
+		thread__zput(tidq->frontend_thread);
+		thread__zput(tidq->decode_thread);
 		zfree(&tidq->event_buf);
 		zfree(&tidq->last_branch);
 		zfree(&tidq->last_branch_rb);
@@ -1083,47 +1080,43 @@ static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address,
 	}
 }

-static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
-			      u64 address, size_t size, u8 *buffer,
-			      const ocsd_mem_space_acc_t mem_space)
+static u32 __cs_etm__mem_access(struct cs_etm_queue *etmq,
+				u64 address, size_t size, u8 *buffer,
+				const ocsd_mem_space_acc_t mem_space,
+				ocsd_ex_level el, struct thread *thread)
 {
 	u8 cpumode;
 	u64 offset;
 	int len;
 	struct addr_location al;
 	struct dso *dso;
-	struct cs_etm_traceid_queue *tidq;
 	int ret = 0;

 	if (!etmq)
 		return 0;

 	addr_location__init(&al);
-	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
-	if (!tidq)
-		goto out;

 	/*
-	 * We've already tracked EL along side the PID in cs_etm__set_thread()
-	 * so double check that it matches what OpenCSD thinks as well. It
-	 * doesn't distinguish between EL0 and EL1 for this mem access callback
-	 * so we had to do the extra tracking. Skip validation if it's any of
-	 * the 'any' values.
+	 * We track EL for the frontend and the backend when receiving context
+	 * and range packets. OpenCSD doesn't distinguish between EL0 and EL1
+	 * for this mem access callback so we had to do the extra tracking. Skip
+	 * validation if it's any of the 'any' values.
 	 */
 	if (!(mem_space == OCSD_MEM_SPACE_ANY ||
 	      mem_space == OCSD_MEM_SPACE_N || mem_space == OCSD_MEM_SPACE_S)) {
 		if (mem_space & OCSD_MEM_SPACE_EL1N) {
 			/* Includes both non secure EL1 and EL0 */
-			assert(tidq->el == ocsd_EL1 || tidq->el == ocsd_EL0);
+			assert(el == ocsd_EL1 || el == ocsd_EL0);
 		} else if (mem_space & OCSD_MEM_SPACE_EL2)
-			assert(tidq->el == ocsd_EL2);
+			assert(el == ocsd_EL2);
 		else if (mem_space & OCSD_MEM_SPACE_EL3)
-			assert(tidq->el == ocsd_EL3);
+			assert(el == ocsd_EL3);
 	}

-	cpumode = cs_etm__cpu_mode(etmq, address, tidq->el);
+	cpumode = cs_etm__cpu_mode(etmq, address, el);

-	if (!thread__find_map(tidq->thread, cpumode, address, &al))
+	if (!thread__find_map(thread, cpumode, address, &al))
 		goto out;

 	dso = map__dso(al.map);
@@ -1138,7 +1131,7 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,

 	map__load(al.map);

-	len = dso__data_read_offset(dso, maps__machine(thread__maps(tidq->thread)),
+	len = dso__data_read_offset(dso, maps__machine(thread__maps(thread)),
 				    offset, buffer, size);

 	if (len <= 0) {
@@ -1158,6 +1151,30 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
 	return ret;
 }
+static u32 cs_etm__frontend_mem_access(struct cs_etm_queue *etmq,
+				       struct cs_etm_traceid_queue *tidq,
+				       struct cs_etm_packet *packet,
+				       u64 address, size_t size, u8 *buffer)
+{
+	return __cs_etm__mem_access(etmq, address, size, buffer, 0, packet->el,
+				    tidq->frontend_thread);
+}
+
+static u32 cs_etm__decoder_mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
+				      u64 address, size_t size, u8 *buffer,
+				      const ocsd_mem_space_acc_t mem_space)
+{
+	struct cs_etm_traceid_queue *tidq;
+
+	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
+	if (!tidq)
+		return 0;
+
+	return __cs_etm__mem_access(etmq, address, size, buffer,
+				    mem_space, tidq->decode_el,
+				    tidq->decode_thread);
+}
+
 static struct cs_etm_queue *cs_etm__alloc_queue(void)
 {
 	struct cs_etm_queue *etmq = zalloc(sizeof(*etmq));
@@ -1333,12 +1350,13 @@ void cs_etm__reset_last_branch_rb(struct cs_etm_traceid_queue *tidq)
 }

 static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
-					 u8 trace_chan_id, u64 addr)
+					 struct cs_etm_traceid_queue *tidq,
+					 struct cs_etm_packet *packet, u64 addr)
 {
 	u8 instrBytes[2];

-	cs_etm__mem_access(etmq, trace_chan_id, addr, ARRAY_SIZE(instrBytes),
-			   instrBytes, 0);
+	cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+				    ARRAY_SIZE(instrBytes), instrBytes);
 	/*
 	 * T32 instruction size is indicated by bits[15:11] of the first
 	 * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
@@ -1371,16 +1389,16 @@ u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
 }

 static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
-				     u64 trace_chan_id,
-				     const struct cs_etm_packet *packet,
+				     struct cs_etm_traceid_queue *tidq,
+				     struct cs_etm_packet *packet,
 				     u64 offset)
 {
 	if (packet->isa == CS_ETM_ISA_T32) {
 		u64 addr = packet->start_addr;

 		while (offset) {
-			addr += cs_etm__t32_instr_size(etmq,
-						       trace_chan_id, addr);
+			addr += cs_etm__t32_instr_size(etmq, tidq, packet,
+						       addr);
 			offset--;
 		}
 		return addr;
@@ -1490,34 +1508,51 @@ cs_etm__get_trace(struct cs_etm_queue *etmq)
 	return etmq->buf_len;
 }

-static void cs_etm__set_thread(struct cs_etm_queue *etmq,
-			       struct cs_etm_traceid_queue *tidq, pid_t tid,
-			       ocsd_ex_level el)
+/*
+ * Convert a raw thread number to a thread struct and assign it to **thread.
+ */
+static int cs_etm__etmq_update_thread(struct cs_etm_queue *etmq,
+				      ocsd_ex_level el, pid_t tid,
+				      struct thread **thread)
 {
 	struct machine *machine = cs_etm__get_machine(etmq, el);

+	if (!machine || !*thread)
+		return -EINVAL;
+
 	if (tid != -1) {
-		thread__zput(tidq->thread);
-		tidq->thread = machine__find_thread(machine, -1, tid);
+		thread__zput(*thread);
+		*thread = machine__find_thread(machine, -1, tid);
 	}

 	/* Couldn't find a known thread */
-	if (!tidq->thread)
-		tidq->thread = machine__idle_thread(machine);
+	if (!*thread)
+		*thread = machine__idle_thread(machine);

-	tidq->el = el;
+	return 0;
 }

-int cs_etm__etmq_set_tid_el(struct cs_etm_queue *etmq, pid_t tid,
-			    u8 trace_chan_id, ocsd_ex_level el)
+/*
+ * Set the thread and EL of the decode context which is ahead in time of the
+ * frontend context.
+ */
+int cs_etm__etmq_update_decode_context(struct cs_etm_queue *etmq,
+				       u8 trace_chan_id,
+				       ocsd_ex_level el, pid_t tid)
 {
 	struct cs_etm_traceid_queue *tidq;
+	int ret;

 	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
 	if (!tidq)
 		return -EINVAL;

-	cs_etm__set_thread(etmq, tidq, tid, el);
+	ret = cs_etm__etmq_update_thread(etmq, el, tid,
+					 &tidq->decode_thread);
+	if (ret)
+		return ret;
+
+	tidq->decode_el = el;
 	return 0;
 }
@@ -1527,8 +1562,8 @@ bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq)
 }

 static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
-			      u64 trace_chan_id,
-			      const struct cs_etm_packet *packet,
+			      struct cs_etm_traceid_queue *tidq,
+			      struct cs_etm_packet *packet,
 			      struct perf_sample *sample)
 {
 	/*
@@ -1545,14 +1580,14 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
 	 * cs_etm__t32_instr_size().
 	 */
 	if (packet->isa == CS_ETM_ISA_T32)
-		sample->insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id,
+		sample->insn_len = cs_etm__t32_instr_size(etmq, tidq, packet,
 							  sample->ip);
 	/* Otherwise, A64 and A32 instruction size are always 32-bit. */
 	else
 		sample->insn_len = 4;

-	cs_etm__mem_access(etmq, trace_chan_id, sample->ip, sample->insn_len,
-			   (void *)sample->insn, 0);
+	cs_etm__frontend_mem_access(etmq, tidq, packet, sample->ip,
+				    sample->insn_len, (void *)sample->insn);
 }

 u64 cs_etm__convert_sample_time(struct cs_etm_queue *etmq, u64 cs_timestamp)
@@ -1579,6 +1614,7 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,

 static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 					    struct cs_etm_traceid_queue *tidq,
+					    struct cs_etm_packet *packet,
 					    u64 addr, u64 period)
 {
 	int ret = 0;
@@ -1588,15 +1624,15 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,

 	perf_sample__init(&sample, /*all=*/true);
 	event->sample.header.type = PERF_RECORD_SAMPLE;
-	event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, tidq->el);
+	event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, packet->el);
 	event->sample.header.size = sizeof(struct perf_event_header);

 	/* Set time field based on etm auxtrace config. */
 	sample.time = cs_etm__resolve_sample_time(etmq, tidq);

 	sample.ip = addr;
-	sample.pid = thread__pid(tidq->thread);
-	sample.tid = thread__tid(tidq->thread);
+	sample.pid = thread__pid(tidq->frontend_thread);
+	sample.tid = thread__tid(tidq->frontend_thread);
 	sample.id = etmq->etm->instructions_id;
 	sample.stream_id = etmq->etm->instructions_id;
 	sample.period = period;
@@ -1604,7 +1640,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 	sample.flags = tidq->prev_packet->flags;
 	sample.cpumode = event->sample.header.misc;

-	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
+	cs_etm__copy_insn(etmq, tidq, tidq->packet, &sample);

 	if (etm->synth_opts.last_branch)
 		sample.branch_stack = tidq->last_branch;
@@ -1649,15 +1685,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,

 	event->sample.header.type = PERF_RECORD_SAMPLE;
 	event->sample.header.misc = cs_etm__cpu_mode(etmq, ip,
-						     tidq->prev_packet_el);
+						     tidq->prev_packet->el);
 	event->sample.header.size = sizeof(struct perf_event_header);

 	/* Set time field based on etm auxtrace config. */
 	sample.time = cs_etm__resolve_sample_time(etmq, tidq);

 	sample.ip = ip;
-	sample.pid = thread__pid(tidq->prev_packet_thread);
-	sample.tid = thread__tid(tidq->prev_packet_thread);
+	sample.pid = thread__pid(tidq->frontend_thread);
+	sample.tid = thread__tid(tidq->frontend_thread);
 	sample.addr = cs_etm__first_executed_instr(tidq->packet);
 	sample.id = etmq->etm->branches_id;
 	sample.stream_id = etmq->etm->branches_id;
@@ -1666,8 +1702,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	sample.flags = tidq->prev_packet->flags;
 	sample.cpumode = event->sample.header.misc;

-	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet,
-			  &sample);
+	cs_etm__copy_insn(etmq, tidq, tidq->prev_packet, &sample);
 	/*
 	 * perf report cannot handle events without a branch stack
@@ -1788,7 +1823,6 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 {
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	int ret;
-	u8 trace_chan_id = tidq->trace_chan_id;
 	u64 instrs_prev;

 	/* Get instructions remainder from previous packet */
@@ -1874,10 +1908,10 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 			 * been executed, but PC has not advanced to next
 			 * instruction)
 			 */
-			addr = cs_etm__instr_addr(etmq, trace_chan_id,
-						  tidq->packet, offset - 1);
+			addr = cs_etm__instr_addr(etmq, tidq, tidq->packet,
+						  offset - 1);
 			ret = cs_etm__synth_instruction_sample(
-				etmq, tidq, addr,
+				etmq, tidq, tidq->packet, addr,
 				etm->instructions_sample_period);
 			if (ret)
 				return ret;
@@ -1959,7 +1993,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 		addr = cs_etm__last_executed_instr(tidq->prev_packet);

 		err = cs_etm__synth_instruction_sample(
-			etmq, tidq, addr,
+			etmq, tidq, tidq->prev_packet, addr,
 			tidq->period_instructions);
 		if (err)
 			return err;
@@ -2014,7 +2048,7 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq,
 		addr = cs_etm__last_executed_instr(tidq->prev_packet);

 		err = cs_etm__synth_instruction_sample(
-			etmq, tidq, addr,
+			etmq, tidq, tidq->prev_packet, addr,
 			tidq->period_instructions);
 		if (err)
 			return err;
@@ -2051,9 +2085,9 @@ static int cs_etm__get_data_block(struct cs_etm_queue *etmq)
 	return etmq->buf_len;
 }

-static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
-				 struct cs_etm_packet *packet,
-				 u64 end_addr)
+static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq,
+				 struct cs_etm_traceid_queue *tidq,
+				 struct cs_etm_packet *packet, u64 end_addr)
 {
 	/* Initialise to keep compiler happy */
 	u16 instr16 = 0;
@@ -2075,8 +2109,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 		 * so below only read 2 bytes as instruction size for T32.
 		 */
 		addr = end_addr - 2;
-		cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr16),
-				   (u8 *)&instr16, 0);
+		cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+					    sizeof(instr16), (u8 *)&instr16);
 		if ((instr16 & 0xFF00) == 0xDF00)
 			return true;

@@ -2091,8 +2125,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 		 * +---------+---------+-------------------------+
 		 */
 		addr = end_addr - 4;
-		cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr32),
-				   (u8 *)&instr32, 0);
+		cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+					    sizeof(instr32), (u8 *)&instr32);
 		if ((instr32 & 0x0F000000) == 0x0F000000 &&
 		    (instr32 & 0xF0000000) != 0xF0000000)
 			return true;
@@ -2108,8 +2142,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 		 * +-----------------------+---------+-----------+
 		 */
 		addr = end_addr - 4;
-		cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr32),
-				   (u8 *)&instr32, 0);
+		cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+					    sizeof(instr32), (u8 *)&instr32);
 		if ((instr32 & 0xFFE0001F) == 0xd4000001)
 			return true;

@@ -2125,7 +2159,6 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
 			       struct cs_etm_traceid_queue *tidq, u64 magic)
 {
-	u8 trace_chan_id = tidq->trace_chan_id;
 	struct cs_etm_packet *packet = tidq->packet;
 	struct cs_etm_packet *prev_packet = tidq->prev_packet;

@@ -2140,7 +2173,7 @@ static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
 	 */
 	if (magic == __perf_cs_etmv4_magic) {
 		if (packet->exception_number == CS_ETMV4_EXC_CALL &&
-		    cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet,
+		    cs_etm__is_svc_instr(etmq, tidq, prev_packet,
 					 prev_packet->end_addr))
 			return true;
 	}
@@ -2178,7 +2211,6 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
 				      struct cs_etm_traceid_queue *tidq,
 				      u64 magic)
 {
-	u8 trace_chan_id = tidq->trace_chan_id;
 	struct cs_etm_packet *packet = tidq->packet;
 	struct cs_etm_packet *prev_packet = tidq->prev_packet;

@@ -2204,7 +2236,7 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
 		 * (SMC, HVC) are taken as sync exceptions.
 		 */
 		if (packet->exception_number == CS_ETMV4_EXC_CALL &&
-		    !cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet,
+		    !cs_etm__is_svc_instr(etmq, tidq, prev_packet,
 					  prev_packet->end_addr))
 			return true;

@@ -2228,7 +2260,6 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
 {
 	struct cs_etm_packet *packet = tidq->packet;
 	struct cs_etm_packet *prev_packet = tidq->prev_packet;
-	u8 trace_chan_id = tidq->trace_chan_id;
 	u64 magic;
 	int ret;

@@ -2309,11 +2340,11 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
 		if (prev_packet->flags == (PERF_IP_FLAG_BRANCH |
 					   PERF_IP_FLAG_RETURN |
 					   PERF_IP_FLAG_INTERRUPT) &&
-		    cs_etm__is_svc_instr(etmq, trace_chan_id,
-					 packet, packet->start_addr))
+		    cs_etm__is_svc_instr(etmq, tidq, packet, packet->start_addr)) {
 			prev_packet->flags = PERF_IP_FLAG_BRANCH |
 					     PERF_IP_FLAG_RETURN |
 					     PERF_IP_FLAG_SYSCALLRET;
+		}
 		break;
 	case CS_ETM_DISCONTINUITY:
 		/*
@@ -2394,6 +2425,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
 					     PERF_IP_FLAG_RETURN |
 					     PERF_IP_FLAG_INTERRUPT;
 		break;
+	case CS_ETM_CONTEXT:
 	case CS_ETM_EMPTY:
 	default:
 		break;
@@ -2469,6 +2501,19 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
 			 */
 			cs_etm__sample(etmq, tidq);
 			break;
+		case CS_ETM_CONTEXT:
+			/*
+			 * Update context but don't swap packet. Keep the
+			 * previous one for branch source address info, if
+			 * tracing the kernel the context packet will be emitted
+			 * between two ranges.
+			 */
+			ret = cs_etm__etmq_update_thread(etmq, tidq->packet->el,
+							 tidq->packet->tid,
+							 &tidq->frontend_thread);
+			if (ret)
+				goto out;
+			break;
 		case CS_ETM_EXCEPTION:
 		case CS_ETM_EXCEPTION_RET:
 			/*
@@ -2497,6 +2542,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
 		}
 	}

+out:
 	return ret;
 }

@@ -2620,7 +2666,7 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 			if (!tidq)
 				continue;

-			if (tid == -1 || thread__tid(tidq->thread) == tid)
+			if (tid == -1 || thread__tid(tidq->frontend_thread) == tid)
 				cs_etm__run_per_thread_timeless_decoder(etmq);
 		} else
 			cs_etm__run_per_cpu_timeless_decoder(etmq);
@@ -3328,7 +3374,7 @@ static int cs_etm__create_queue_decoders(struct cs_etm_queue *etmq)
 	 */
 	if (cs_etm_decoder__add_mem_access_cb(etmq->decoder,
 					      0x0L, ((u64) -1L),
-					      cs_etm__mem_access))
+					      cs_etm__decoder_mem_access))
 		goto out_free_decoder;

 	zfree(&t_params);
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index aa9bb4a32eca..b81099c2b301 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -158,6 +158,7 @@ enum cs_etm_sample_type {
 	CS_ETM_DISCONTINUITY,
 	CS_ETM_EXCEPTION,
 	CS_ETM_EXCEPTION_RET,
+	CS_ETM_CONTEXT,
 };

 enum cs_etm_isa {
@@ -184,6 +185,8 @@ struct cs_etm_packet {
 	u8 last_instr_size;
 	u8 trace_chan_id;
 	int cpu;
+	int el;
+	pid_t tid;
 };

 #define CS_ETM_PACKET_MAX_BUFFER 1024
@@ -259,8 +262,9 @@ enum cs_etm_pid_fmt {
 #include <opencsd/ocsd_if_types.h>
 int cs_etm__get_cpu(struct cs_etm_queue *etmq, u8 trace_chan_id, int *cpu);
 enum cs_etm_pid_fmt cs_etm__get_pid_fmt(struct cs_etm_queue *etmq);
-int cs_etm__etmq_set_tid_el(struct cs_etm_queue *etmq, pid_t tid,
-			    u8 trace_chan_id, ocsd_ex_level el);
+int cs_etm__etmq_update_decode_context(struct cs_etm_queue *etmq,
+				       u8 trace_chan_id, ocsd_ex_level el,
+				       pid_t tid);
 bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq);
 void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
 					      u8 trace_chan_id);
Add a --workload-ctl=fifo:ctl-fifo[,ack-fifo] option for 'perf test -w'. When set, run_workload() opens the named FIFO, writes enable before invoking the builtin workload, writes disable before returning, and waits for ack responses when an ack FIFO is provided to ensure that the workload doesn't run until the events are enabled.
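The option format can be illustrated with a small standalone parser. This is only a sketch of the `fifo:ctl-fifo[,ack-fifo]` syntax described above; `parse_ctl_spec()` is a hypothetical helper, and unlike the patch it copies the paths into caller buffers rather than opening the FIFOs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "fifo:ctl-fifo[,ack-fifo]" into two paths.
 * ack is set to "" when no ack FIFO is given. Returns 0 on success, -1 if
 * the spec is malformed or a path does not fit in sz bytes.
 */
static int parse_ctl_spec(const char *spec, char *ctl, char *ack, size_t sz)
{
	const char *comma;
	size_t ctl_len;

	/* Only the "fifo:" form is accepted, and the ctl path must be non-empty */
	if (strncmp(spec, "fifo:", 5))
		return -1;
	spec += 5;
	if (!*spec || *spec == ',')
		return -1;

	comma = strchr(spec, ',');
	ctl_len = comma ? (size_t)(comma - spec) : strlen(spec);
	if (ctl_len >= sz)
		return -1;
	memcpy(ctl, spec, ctl_len);
	ctl[ctl_len] = '\0';

	/* The ack path is optional; a trailing comma means "no ack FIFO" */
	ack[0] = '\0';
	if (comma && comma[1]) {
		if (strlen(comma + 1) >= sz)
			return -1;
		strcpy(ack, comma + 1);
	}
	return 0;
}
```

A spec such as `fifo:/tmp/ctl,/tmp/ack` yields both paths, `fifo:/tmp/ctl` yields an empty ack path, and anything not starting with `fifo:` is rejected. The paths used here are placeholders.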
This can be used to limit the scope of the recording to only the workload execution and avoid recording Perf setup and teardown code if Perf record is started with events disabled (-D 1).
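The enable/ack handshake can be sketched in isolation as follows. This is a simplified restatement, not the patch code: the `ctl_*` names are hypothetical, error reporting is omitted, and it assumes the peer answers each command with `ack` followed by a newline, as described for the ack FIFO above.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Write the whole command, retrying on EINTR and on partial writes. */
static int ctl_write_all(int fd, const char *cmd)
{
	size_t len = strlen(cmd);

	while (len) {
		ssize_t ret = write(fd, cmd, len);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (ret == 0)
			return -1;
		cmd += ret;
		len -= (size_t)ret;
	}
	return 0;
}

/* Read one reply and accept it only if it is "ack" (newline optional). */
static int ctl_read_ack(int fd)
{
	char buf[16];
	ssize_t ret;

	do {
		ret = read(fd, buf, sizeof(buf) - 1);
	} while (ret < 0 && errno == EINTR);

	if (ret <= 0)
		return -1;

	buf[ret] = '\0';
	buf[strcspn(buf, "\n")] = '\0';
	return strcmp(buf, "ack") ? -1 : 0;
}

/* Send a command and, when an ack fd is given, block until it is acked. */
static int ctl_send(int ctl_fd, int ack_fd, const char *cmd)
{
	if (ctl_write_all(ctl_fd, cmd))
		return -1;
	if (ack_fd >= 0 && ctl_read_ack(ack_fd))
		return -1;
	return 0;
}
```

Calling `ctl_send(ctl_fd, ack_fd, "enable\n")` before the workload and `ctl_send(ctl_fd, ack_fd, "disable\n")` after it brackets the recording. Waiting for the ack is what guarantees the workload does not start until the events are enabled.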
Assisted-by: Codex:GPT-5.5
Signed-off-by: James Clark james.clark@linaro.org
---
 tools/perf/Documentation/perf-test.txt |   6 ++
 tools/perf/tests/builtin-test.c        | 184 ++++++++++++++++++++++++++++++++-
 2 files changed, 188 insertions(+), 2 deletions(-)
diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt index 32da0d1fa86a..1faf30d4a7be 100644 --- a/tools/perf/Documentation/perf-test.txt +++ b/tools/perf/Documentation/perf-test.txt @@ -69,3 +69,9 @@ OPTIONS
--list-workloads:: List the available workloads to use with -w/--workload. + +--workload-ctl=fifo:ctl-fifo[,ack-fifo]:: + Write 'enable' to ctl-fifo before running the workload and 'disable' + before returning. If ack-fifo is provided, the workload runner waits for + an 'ack' response after each command. This scopes the recording to only + the workload if used with 'perf record -D 1 --control ...'. diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index f2c135891477..a9e67d7da700 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -50,6 +50,7 @@ static bool sequential; static unsigned int runs_per_test = 1; const char *dso_to_test; const char *test_objdump_path = "objdump"; +static const char *workload_control;
/* * List of architecture specific tests. Not a weak symbol as the array length is @@ -161,6 +162,11 @@ static struct test_workload *workloads[] = { #endif };
+struct workload_control { + int ctl_fd; + int ack_fd; +}; + #define workloads__for_each(workload) \ for (unsigned i = 0; i < ARRAY_SIZE(workloads) && ({ workload = workloads[i]; 1; }); i++)
@@ -711,13 +717,185 @@ static int workloads__fprintf_list(FILE *fp) return printed; }
+static int perf_control_open_fifo(struct workload_control *ctl, const char *str) +{ + char *s, *p; + int ret; + + if (strncmp(str, "fifo:", 5)) + return -EINVAL; + + str += 5; + if (!*str || *str == ',') + return -EINVAL; + + s = strdup(str); + if (!s) + return -ENOMEM; + + p = strchr(s, ','); + if (p) + *p = '\0'; + + ctl->ctl_fd = open(s, O_WRONLY | O_CLOEXEC); + if (ctl->ctl_fd < 0) { + ret = -errno; + pr_err("Failed to open workload control FIFO '%s': %m\n", s); + free(s); + return ret; + } + + if (p && *++p) { + ctl->ack_fd = open(p, O_RDONLY | O_CLOEXEC); + if (ctl->ack_fd < 0) { + ret = -errno; + pr_err("Failed to open workload control ack FIFO '%s': %m\n", p); + close(ctl->ctl_fd); + ctl->ctl_fd = -1; + free(s); + return ret; + } + } + + free(s); + return 0; +} + +static int perf_control_open(struct workload_control *ctl) +{ + int ret; + + if (!workload_control) + return 0; + + ret = perf_control_open_fifo(ctl, workload_control); + + if (ret == -EINVAL) { + pr_err("Unsupported workload control spec '%s', expected fifo:ctl-fifo[,ack-fifo]\n", + workload_control); + } + + return ret; +} + +static void perf_control_close(struct workload_control *ctl) +{ + if (ctl->ctl_fd >= 0) { + close(ctl->ctl_fd); + ctl->ctl_fd = -1; + } + if (ctl->ack_fd >= 0) { + close(ctl->ack_fd); + ctl->ack_fd = -1; + } +} + +static int perf_control_write_cmd(int fd, const char *cmd) +{ + size_t len = strlen(cmd); + ssize_t ret; + + while (len) { + ret = write(fd, cmd, len); + if (ret < 0) { + if (errno == EINTR) + continue; + pr_err("Failed to write perf control command: %m\n"); + return -1; + } + + if (!ret) { + pr_err("Failed to write perf control command: short write\n"); + return -1; + } + + cmd += ret; + len -= ret; + } + + return 0; +} + +static int perf_control_read_ack(int fd) +{ + char buf[16]; + ssize_t ret; + + do { + ret = read(fd, buf, sizeof(buf) - 1); + } while (ret < 0 && errno == EINTR); + + if (ret < 0) { + pr_err("Failed to read perf control ack: %m\n"); + return 
-1; + } + + if (!ret) { + pr_err("Unexpected EOF while reading perf control ack\n"); + return -1; + } + + buf[ret] = '\0'; + for (ssize_t i = 0; i < ret; i++) { + if (buf[i] == '\n' || buf[i] == '\0') { + buf[i] = '\0'; + break; + } + } + + if (strcmp(buf, "ack")) { + pr_err("Unexpected perf control ack: %s\n", buf); + return -1; + } + + return 0; +} + +static int perf_control_send(struct workload_control *ctl, const char *cmd) +{ + if (ctl->ctl_fd < 0) + return 0; + + if (perf_control_write_cmd(ctl->ctl_fd, cmd)) + return -1; + + if (ctl->ack_fd >= 0 && perf_control_read_ack(ctl->ack_fd)) + return -1; + + return 0; +} + static int run_workload(const char *work, int argc, const char **argv) { struct test_workload *twl;
workloads__for_each(twl) { - if (!strcmp(twl->name, work)) - return twl->func(argc, argv); + struct workload_control ctl = { + .ctl_fd = -1, + .ack_fd = -1, + }; + int control_ret, ret; + + if (strcmp(twl->name, work)) + continue; + + ret = perf_control_open(&ctl); + if (ret) + return ret; + + if (perf_control_send(&ctl, "enable\n")) { + perf_control_close(&ctl); + return -1; + } + + ret = twl->func(argc, argv); + + control_ret = perf_control_send(&ctl, "disable\n"); + perf_control_close(&ctl); + if (control_ret) + return -1; + + return ret; }
pr_info("No workload found: %s\n", work); @@ -799,6 +977,8 @@ int cmd_test(int argc, const char **argv) OPT_UINTEGER('r', "runs-per-test", &runs_per_test, "Run each test the given number of times, default 1"), OPT_STRING('w', "workload", &workload, "work", "workload to run for testing, use '--list-workloads' to list the available ones."), + OPT_STRING(0, "workload-ctl", &workload_control, "fifo:ctl-fifo[,ack-fifo]", + "Write enable to the fifo just before running the workload and disable after, with optional ack from ack-fifo"), OPT_BOOLEAN(0, "list-workloads", &list_workloads, "List the available builtin workloads to use with -w/--workload"), OPT_STRING(0, "dso", &dso_to_test, "dso", "dso to test"), OPT_STRING(0, "objdump", &test_objdump_path, "path",
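The enable/ack exchange that run_workload() performs around the workload can be sketched on its own. The block below is a minimal illustration, not the code from the patch: ctl_write_all() and ctl_send() are hypothetical names, and it only assumes what the diff above shows, namely that a newline-terminated command is written to the control descriptor and that the peer answers "ack" on the optional ack descriptor.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the whole command, retrying on EINTR. */
static int ctl_write_all(int fd, const char *cmd)
{
	size_t len = strlen(cmd);

	while (len) {
		ssize_t n = write(fd, cmd, len);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		cmd += n;
		len -= n;
	}
	return 0;
}

/*
 * Hypothetical helper: send one command and, if ack_fd >= 0, block until
 * the peer replies. Anything other than "ack" is treated as a failure.
 */
static int ctl_send(int ctl_fd, int ack_fd, const char *cmd)
{
	char buf[16];
	ssize_t n;

	if (ctl_write_all(ctl_fd, cmd))
		return -1;
	if (ack_fd < 0)
		return 0;

	do {
		n = read(ack_fd, buf, sizeof(buf) - 1);
	} while (n < 0 && errno == EINTR);
	if (n <= 0)
		return -1;

	buf[n] = '\0';
	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return strcmp(buf, "ack") ? -1 : 0;
}
```

Waiting for the ack before starting the workload is what keeps the first instructions of the workload inside the recorded window. Without an ack FIFO the enable is only a request, and the workload may start before the recorder has acted on it.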
This workload launches two processes that block when reading and writing to each other, forcing the other process to be scheduled for each read/write pair.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/Documentation/perf-test.txt           |   7 +-
 tools/perf/tests/builtin-test.c                  |   1 +
 tools/perf/tests/tests.h                         |   1 +
 tools/perf/tests/workloads/Build                 |   1 +
 tools/perf/tests/workloads/context_switch_loop.c | 101 +++++++++++++++++++++++
 5 files changed, 108 insertions(+), 3 deletions(-)
diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt index 1faf30d4a7be..9c0d7ac2bc64 100644 --- a/tools/perf/Documentation/perf-test.txt +++ b/tools/perf/Documentation/perf-test.txt @@ -55,15 +55,16 @@ OPTIONS
-w:: --workload=:: - Run a built-in workload, to list them use '--list-workloads', current ones include: - noploop, thloop, leafloop, sqrtloop, brstack, datasym and landlock. + Run a built-in workload, to list them use '--list-workloads', current + ones include: noploop, thloop, leafloop, sqrtloop, brstack, datasym, + context_switch_loop and landlock.
Used with the shell script regression tests.
Some accept an extra parameter:
seconds: leafloop, noploop, sqrtloop, thloop - nrloops: brstack + nrloops: brstack, context_switch_loop
The datasym and landlock workloads don't accept any.
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index a9e67d7da700..2830a431771f 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -156,6 +156,7 @@ static struct test_workload *workloads[] = { &workload__landlock, &workload__traploop, &workload__inlineloop, + &workload__context_switch_loop,
#ifdef HAVE_RUST_SUPPORT &workload__code_with_type, diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h index ee00518bf36f..79f50bacfc94 100644 --- a/tools/perf/tests/tests.h +++ b/tools/perf/tests/tests.h @@ -242,6 +242,7 @@ DECLARE_WORKLOAD(datasym); DECLARE_WORKLOAD(landlock); DECLARE_WORKLOAD(traploop); DECLARE_WORKLOAD(inlineloop); +DECLARE_WORKLOAD(context_switch_loop);
#ifdef HAVE_RUST_SUPPORT DECLARE_WORKLOAD(code_with_type); diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build index 2ef97f7affce..3bda6da04a35 100644 --- a/tools/perf/tests/workloads/Build +++ b/tools/perf/tests/workloads/Build @@ -9,6 +9,7 @@ perf-test-y += datasym.o perf-test-y += landlock.o perf-test-y += traploop.o perf-test-y += inlineloop.o +perf-test-y += context_switch_loop.o
ifeq ($(CONFIG_RUST_SUPPORT),y) perf-test-y += code_with_type.o diff --git a/tools/perf/tests/workloads/context_switch_loop.c b/tools/perf/tests/workloads/context_switch_loop.c new file mode 100644 index 000000000000..173d770ae619 --- /dev/null +++ b/tools/perf/tests/workloads/context_switch_loop.c @@ -0,0 +1,101 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/compiler.h> +#include <stdio.h> +#include <stdlib.h> +#include <sys/prctl.h> +#include <sys/wait.h> +#include <unistd.h> + +#include "../tests.h" + +static int loops = 100; +static char buf; +int context_switch_loop_work = 1234; + +#define write_block(fd) \ + do { \ + if (write(fd, &buf, 1) <= 0) \ + exit(1); \ + } while (0) + +#define read_block(fd) \ + do { \ + if (read(fd, &buf, 1) <= 0) \ + exit(1); \ + } while (0) + +/* Not static to avoid LTO clobbering the function name */ +void context_switch_loop_proc1(int in_fd, int out_fd); +noinline void context_switch_loop_proc1(int in_fd, int out_fd) +{ + for (int i = 0; i < loops; i++) { + read_block(in_fd); + context_switch_loop_work += i * 3; + write_block(out_fd); + } +} + +void context_switch_loop_proc2(int in_fd, int out_fd); +noinline void context_switch_loop_proc2(int in_fd, int out_fd) +{ + for (int i = 0; i < loops; i++) { + write_block(out_fd); + context_switch_loop_work += i * 7; + read_block(in_fd); + } +} + +/* + * Launches two processes that take turns to execute a multiplication N times + */ +static int context_switch_loop(int argc, const char **argv) +{ + int a_to_b[2], b_to_a[2]; + pid_t proc1_pid; + int status; + + if (argc > 0) { + loops = atoi(argv[0]); + if (loops < 0) { + fprintf(stderr, "Invalid number of loops: %s\n", argv[0]); + return 1; + } + } + + if (pipe(a_to_b) || pipe(b_to_a)) { + perror("Pipe error"); + return 1; + } + + proc1_pid = fork(); + if (proc1_pid < 0) { + perror("Fork error"); + return 1; + } + + if (!proc1_pid) { + close(a_to_b[0]); + close(b_to_a[1]); + prctl(PR_SET_NAME, "proc1", 0, 0, 0); + 
context_switch_loop_proc1(b_to_a[0], a_to_b[1]); + close(a_to_b[1]); + close(b_to_a[0]); + exit(0); + } + + close(a_to_b[1]); + close(b_to_a[0]); + prctl(PR_SET_NAME, "proc2", 0, 0, 0); + context_switch_loop_proc2(a_to_b[0], b_to_a[1]); + close(a_to_b[0]); + close(b_to_a[1]); + + if (waitpid(proc1_pid, &status, 0) != proc1_pid || !WIFEXITED(status) || + WEXITSTATUS(status)) + return 1; + + return 0; +} + +DEFINE_WORKLOAD(context_switch_loop);
Run the context switch workload on one CPU and trace it to test that symbols are attributed to the correct process and that the attribution changes at the exact point that the context switch happened.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 .../tests/shell/coresight/context_switch_thread.sh | 69 ++++++++++++++++++++++
 1 file changed, 69 insertions(+)
diff --git a/tools/perf/tests/shell/coresight/context_switch_thread.sh b/tools/perf/tests/shell/coresight/context_switch_thread.sh new file mode 100755 index 000000000000..0992c35a329d --- /dev/null +++ b/tools/perf/tests/shell/coresight/context_switch_thread.sh @@ -0,0 +1,69 @@ +#!/bin/bash -e +# Coresight context switch thread attribution (exclusive) + +# SPDX-License-Identifier: GPL-2.0 + +# If Coresight is not available, skip the test +perf list pmu | grep -q cs_etm || exit 2 + +if [ "$(id -u)" != 0 ]; then + # Requires root for "-C 0" in record command + echo "[Skip] No root permission" + exit 2 +fi + +tmpdir=$(mktemp -d /tmp/__perf_test.coresight_context_switch.XXXXX) + +cleanup() { + rm -rf "${tmpdir}" + trap - EXIT TERM INT +} + +trap_cleanup() { + cleanup + exit 1 +} +trap trap_cleanup EXIT TERM INT + +check_samples() { + owner_samples=$(grep -c "proc1.*context_switch_loop_proc1" "$tmpdir/script" || true) + next_samples=$(grep -c "proc2.*context_switch_loop_proc2" "$tmpdir/script" || true) + + if [ "$owner_samples" -eq 0 ] || [ "$next_samples" -eq 0 ]; then + echo "No samples found" + cleanup + exit 1 + fi + + if grep "proc2.*context_switch_loop_proc1" "$tmpdir/script"; then + echo "Thread1 symbol was attributed to proc2" + cleanup + exit 1 + fi + + if grep "proc1.*context_switch_loop_proc2" "$tmpdir/script"; then + echo "Thread2 symbol was attributed to proc1" + cleanup + exit 1 + fi +} + +cf="$tmpdir/ctl" +af="$tmpdir/ack" +mkfifo "$cf" "$af" + +# Pin to one CPU so the two threads alternate running but record into the same +# trace buffer. Start disabled and use the control FIFO to only record the +# workload and not startup. +perf record -o "$tmpdir/data" -e cs_etm/timestamp=0/u -C 0 -D -1 --control fifo:"$cf","$af" -- \ + taskset --cpu-list 0 perf test --workload-ctl fifo:"$cf","$af" \ + -w context_switch_loop > /dev/null 2>&1 + +# Test both instruction and branch sample generation modes. 
+perf script -i "$tmpdir/data" --itrace=i4 -F comm,pid,tid,ip,sym > "$tmpdir/script" 2>/dev/null +check_samples +perf script -i "$tmpdir/data" --itrace=b -F comm,pid,tid,ip,sym > "$tmpdir/script" 2>/dev/null +check_samples + +cleanup +exit 0
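The grep patterns in check_samples() amount to a small classification of each output line: a sample is fine when the comm and the symbol belong to the same process, and a failure when they cross over. A minimal sketch of that rule in C follows. classify_sample() and the enum are hypothetical names used only for illustration, and the input is assumed to be one line of "comm pid/tid ip sym" text as requested by the -F option in the script.

```c
#include <string.h>

enum attribution { ATTR_OTHER, ATTR_OK, ATTR_WRONG };

/*
 * Hypothetical helper: classify one line of perf script output.
 * Mirrors the shell patterns "procN.*context_switch_loop_procM":
 * the symbol is only searched for after the comm.
 */
static enum attribution classify_sample(const char *line)
{
	static const char * const comms[] = { "proc1", "proc2" };
	static const char * const syms[] = {
		"context_switch_loop_proc1",
		"context_switch_loop_proc2",
	};

	for (int c = 0; c < 2; c++) {
		const char *comm = strstr(line, comms[c]);

		if (!comm)
			continue;

		for (int s = 0; s < 2; s++) {
			if (strstr(comm + strlen(comms[c]), syms[s]))
				return c == s ? ATTR_OK : ATTR_WRONG;
		}
	}

	return ATTR_OTHER;
}
```

The test passes only if both processes produce at least one ATTR_OK line and no line is ATTR_WRONG, which is what makes it sensitive to the exact point where the decoder switches threads.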
Add a workload that does the same thing every time for testing CPU trace decoding.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/Documentation/perf-test.txt     |  4 +--
 tools/perf/tests/builtin-test.c            |  1 +
 tools/perf/tests/tests.h                   |  1 +
 tools/perf/tests/workloads/Build           |  2 ++
 tools/perf/tests/workloads/deterministic.c | 39 ++++++++++++++++++++++++++++++
 5 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt index 9c0d7ac2bc64..7ec70c054cac 100644 --- a/tools/perf/Documentation/perf-test.txt +++ b/tools/perf/Documentation/perf-test.txt @@ -57,7 +57,7 @@ OPTIONS --workload=:: Run a built-in workload, to list them use '--list-workloads', current ones include: noploop, thloop, leafloop, sqrtloop, brstack, datasym, - context_switch_loop and landlock. + context_switch_loop, deterministic and landlock.
Used with the shell script regression tests.
@@ -66,7 +66,7 @@ OPTIONS seconds: leafloop, noploop, sqrtloop, thloop nrloops: brstack, context_switch_loop
- The datasym and landlock workloads don't accept any. + The datasym, landlock and deterministic workloads don't accept any.
--list-workloads:: List the available workloads to use with -w/--workload. diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index 2830a431771f..5a2ab67cd85d 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -157,6 +157,7 @@ static struct test_workload *workloads[] = { &workload__traploop, &workload__inlineloop, &workload__context_switch_loop, + &workload__deterministic,
#ifdef HAVE_RUST_SUPPORT &workload__code_with_type, diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h index 79f50bacfc94..f8bba2d68769 100644 --- a/tools/perf/tests/tests.h +++ b/tools/perf/tests/tests.h @@ -243,6 +243,7 @@ DECLARE_WORKLOAD(landlock); DECLARE_WORKLOAD(traploop); DECLARE_WORKLOAD(inlineloop); DECLARE_WORKLOAD(context_switch_loop); +DECLARE_WORKLOAD(deterministic);
#ifdef HAVE_RUST_SUPPORT DECLARE_WORKLOAD(code_with_type); diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build index 3bda6da04a35..cca7ad354227 100644 --- a/tools/perf/tests/workloads/Build +++ b/tools/perf/tests/workloads/Build @@ -10,6 +10,7 @@ perf-test-y += landlock.o perf-test-y += traploop.o perf-test-y += inlineloop.o perf-test-y += context_switch_loop.o +perf-test-y += deterministic.o
ifeq ($(CONFIG_RUST_SUPPORT),y) perf-test-y += code_with_type.o @@ -22,3 +23,4 @@ CFLAGS_brstack.o = -g -O0 -fno-inline -U_FORTIFY_SOURCE CFLAGS_datasym.o = -g -O0 -fno-inline -U_FORTIFY_SOURCE CFLAGS_traploop.o = -g -O0 -fno-inline -U_FORTIFY_SOURCE CFLAGS_inlineloop.o = -g -O2 +CFLAGS_deterministic.o = -g -O0 -U_FORTIFY_SOURCE diff --git a/tools/perf/tests/workloads/deterministic.c b/tools/perf/tests/workloads/deterministic.c new file mode 100644 index 000000000000..3caea8564043 --- /dev/null +++ b/tools/perf/tests/workloads/deterministic.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/compiler.h> +#include "../tests.h" + +int dt_work = 1234; + +static noinline void function1(void) +{ + dt_work *= 7; + dt_work *= 7; + dt_work *= 7; +} + +static noinline void function2(void) +{ + dt_work *= 7; + dt_work *= 7; + dt_work *= 7; +} + +static int deterministic(int argc __maybe_unused, + const char **argv __maybe_unused) +{ + dt_work *= 7; + dt_work *= 7; + dt_work *= 7; + + function1(); + + dt_work *= 7; + dt_work *= 7; + dt_work *= 7; + + function2(); + + return 0; +} + +DEFINE_WORKLOAD(deterministic);
Testing a long sequence without branches seems better suited to a decoder unit test, and the unroll_loop_thread test doesn't decode the trace anyway, so it's not clear what bugs it is trying to catch.
The new deterministic workload has somewhat long sequences when built unoptimized, and we can always lengthen them later if we want to. The new test also checks that decoding always gives the same result for the same sequence of code, which we've never tested before.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/coresight/deterministic.sh  | 71 ++++++++++++++++++++++
 .../tests/shell/coresight/unroll_loop_thread_10.sh | 22 -------
 2 files changed, 71 insertions(+), 22 deletions(-)
diff --git a/tools/perf/tests/shell/coresight/deterministic.sh b/tools/perf/tests/shell/coresight/deterministic.sh new file mode 100755 index 000000000000..52e033fd6b82 --- /dev/null +++ b/tools/perf/tests/shell/coresight/deterministic.sh @@ -0,0 +1,71 @@ +#!/bin/bash -e +# Coresight deterministic workload decode (exclusive) + +# SPDX-License-Identifier: GPL-2.0 + +# If Coresight is not available, skip the test +perf list pmu | grep -q cs_etm || exit 2 + +tmpdir=$(mktemp -d /tmp/__perf_test.coresight_deterministic.XXXXX) + +cleanup() { + rm -rf "${tmpdir}" + trap - EXIT TERM INT +} + +trap_cleanup() { + cleanup + exit 1 +} +trap trap_cleanup EXIT TERM INT + +cf="$tmpdir/ctl" +af="$tmpdir/ack" +mkfifo "$cf" "$af" + +# Start disabled and use the control FIFO to only record the workload and not +# startup. +perf record -o "$tmpdir/data" -e cs_etm//u -D -1 --control fifo:"$cf","$af" -- \ + perf test --workload-ctl fifo:"$cf","$af" -w deterministic > /dev/null 2>&1 + +perf script -i "$tmpdir/data" --itrace=i1i -F ip,srcline | \ + grep "deterministic.c" | uniq > "$tmpdir/script" 2>/dev/null + + +# Remove open brace lines as they may not be hit depending on the compiler +sed -i \ + -e '/deterministic.c:8$/d' \ + -e '/deterministic.c:15$/d' \ + -e '/deterministic.c:23$/d' \ + "$tmpdir/script" + +cat > "$tmpdir/expected" << EOF + deterministic.c:24 + deterministic.c:25 + deterministic.c:26 + deterministic.c:28 + deterministic.c:9 + deterministic.c:10 + deterministic.c:11 + deterministic.c:12 + deterministic.c:30 + deterministic.c:31 + deterministic.c:32 + deterministic.c:34 + deterministic.c:16 + deterministic.c:17 + deterministic.c:18 + deterministic.c:19 + deterministic.c:36 + deterministic.c:37 +EOF + +if ! 
diff -q "$tmpdir/script" "$tmpdir/expected"; then + echo "FAIL: line numbers don't match expected: " + head -n 100 "$tmpdir/script" + cleanup + exit 1 +fi + +cleanup +exit 0 diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread_10.sh b/tools/perf/tests/shell/coresight/unroll_loop_thread_10.sh deleted file mode 100755 index cb3e97a0a89f..000000000000 --- a/tools/perf/tests/shell/coresight/unroll_loop_thread_10.sh +++ /dev/null @@ -1,22 +0,0 @@ -#!/bin/bash -e -# CoreSight / Unroll Loop Thread 10 (exclusive) - -# SPDX-License-Identifier: GPL-2.0 -# Carsten Haitzler carsten.haitzler@arm.com, 2021 - -TEST="unroll_loop_thread" - -# shellcheck source=../lib/coresight.sh -. "$(dirname $0)"/../lib/coresight.sh - -ARGS="10" -DATV="10" -# shellcheck disable=SC2153 -DATA="$DATD/perf-$TEST-$DATV.data" - -perf record $PERFRECOPT -o "$DATA" "$BIN" $ARGS - -perf_dump_aux_verify "$DATA" 10 10 10 - -err=$? -exit $err
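The comparison in deterministic.sh (pipe the source lines through uniq, then diff against a fixed list) can be expressed as a short C sketch. This is only an illustration of the check, assuming each sample has already been reduced to its line number in deterministic.c. collapse_runs() and matches_expected() are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical helper: drop adjacent duplicates in place, like uniq(1). */
static size_t collapse_runs(int *lines, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out == 0 || lines[out - 1] != lines[i])
			lines[out++] = lines[i];
	}

	return out;
}

/*
 * Hypothetical helper: returns 1 if the sampled line numbers, with runs
 * collapsed, are exactly the expected sequence, and 0 otherwise.
 */
static int matches_expected(int *lines, size_t n,
			    const int *expected, size_t n_expected)
{
	n = collapse_runs(lines, n);
	if (n != n_expected)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (lines[i] != expected[i])
			return 0;
	}

	return 1;
}
```

Collapsing runs matters because one source line compiles to several instructions, so instruction samples repeat the same line. Only the order of distinct lines is stable from run to run.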
It's not obvious what this test is for so remove it. It's not a stress test because it doesn't output lots of data and it's not a functional test because it only looks for raw trace output. It seems to imply that a program written in assembly influences whether trace would be generated by the CPU or not, but the CPU doesn't know what language the program is written in.
We already have lots of Coresight tests that test the full pipeline including decoding, and in many more modes of operation than this one, so if no trace was collected they will already fail, leaving this one redundant.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/coresight/asm_pure_loop.sh | 22 ----------------------
 1 file changed, 22 deletions(-)
diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop.sh b/tools/perf/tests/shell/coresight/asm_pure_loop.sh deleted file mode 100755 index 0301904b9637..000000000000 --- a/tools/perf/tests/shell/coresight/asm_pure_loop.sh +++ /dev/null @@ -1,22 +0,0 @@ -#!/bin/bash -e -# CoreSight / ASM Pure Loop (exclusive) - -# SPDX-License-Identifier: GPL-2.0 -# Carsten Haitzler carsten.haitzler@arm.com, 2021 - -TEST="asm_pure_loop" - -# shellcheck source=../lib/coresight.sh -. "$(dirname $0)"/../lib/coresight.sh - -ARGS="" -DATV="out" -# shellcheck disable=SC2153 -DATA="$DATD/perf-$TEST-$DATV.data" - -perf record $PERFRECOPT -o "$DATA" "$BIN" $ARGS - -perf_dump_aux_verify "$DATA" 10 10 10 - -err=$? -exit $err
Like asm_pure_loop, this memcpy test only checks that 10 of each of a few trace packet types occur after recording a lot of trace, which isn't more specific than other existing Coresight tests.
Assume it was supposed to be a stress test for dumping and replace it with one that doesn't require a custom binary and checks for a specific amount of raw output. Don't bother checking for packets because the other tests that test decoding will catch issues with malformed data.
This also adds coverage for exit snapshot mode which was missing.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 .../tests/shell/coresight/memcpy_thread_16k_10.sh  | 22 ----------
 .../perf/tests/shell/coresight/raw_dump_stress.sh  | 48 ++++++++++++++++++++++
 2 files changed, 48 insertions(+), 22 deletions(-)
diff --git a/tools/perf/tests/shell/coresight/memcpy_thread_16k_10.sh b/tools/perf/tests/shell/coresight/memcpy_thread_16k_10.sh deleted file mode 100755 index 1f765d69acc3..000000000000 --- a/tools/perf/tests/shell/coresight/memcpy_thread_16k_10.sh +++ /dev/null @@ -1,22 +0,0 @@ -#!/bin/bash -e -# CoreSight / Memcpy 16k 10 Threads (exclusive) - -# SPDX-License-Identifier: GPL-2.0 -# Carsten Haitzler carsten.haitzler@arm.com, 2021 - -TEST="memcpy_thread" - -# shellcheck source=../lib/coresight.sh -. "$(dirname $0)"/../lib/coresight.sh - -ARGS="16 10 1" -DATV="16k_10" -# shellcheck disable=SC2153 -DATA="$DATD/perf-$TEST-$DATV.data" - -perf record $PERFRECOPT -o "$DATA" "$BIN" $ARGS - -perf_dump_aux_verify "$DATA" 10 10 10 - -err=$? -exit $err diff --git a/tools/perf/tests/shell/coresight/raw_dump_stress.sh b/tools/perf/tests/shell/coresight/raw_dump_stress.sh new file mode 100755 index 000000000000..025584472513 --- /dev/null +++ b/tools/perf/tests/shell/coresight/raw_dump_stress.sh @@ -0,0 +1,48 @@ +#!/bin/bash -e +# Coresight raw dump stress (exclusive) + +# SPDX-License-Identifier: GPL-2.0 + +if [ "$(id -u)" != 0 ]; then + # Requires root for larger buffer size + echo "[Skip] No root permission" + exit 2 +fi + +# If Coresight is not available, skip the test +perf list pmu | grep -q cs_etm || exit 2 + +tmpdir=$(mktemp -d /tmp/__perf_test.coresight_raw_dump_stress.XXXXX) + +cleanup() { + rm -r "${tmpdir}" + trap - EXIT TERM INT +} + +trap_cleanup() { + cleanup + exit 1 +} +trap trap_cleanup EXIT TERM INT + +# Use exit snapshot to record 2M of trace to make about 80MB of raw dump data. +echo "Recording..." +perf record -e cs_etm/timestamp=0/u -m,2M -Se -o "$tmpdir/data" -- \ + perf test -w brstack 20000 > /dev/null 2>&1 + +# Test raw dump runs to completion but don't decode because that's too slow for +# a test +echo "Dumping raw trace..." +perf report --dump-raw-trace -i "$tmpdir/data" 2>/dev/null > "$tmpdir/rawdump" +err=$? 
+ +size=$(stat -c%s "$tmpdir/rawdump") +if [ $size -gt $((50 * 1024 * 1024)) ]; then + echo "PASS: Raw dump file is larger than 50MB" + cleanup + exit 0 +fi + +echo "FAIL: Got less than 50MB (${size} bytes)" +cleanup +exit 1
Add a workload that runs X threads, each running its own function named "named_threads_thread[x]" that performs a multiplication in a loop Y times. Each thread sets its name to "thread[x]".
This can be used to test that processor trace decoding handles concurrent threads correctly and the correct symbols and thread names are assigned to samples.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/Documentation/perf-test.txt     |   5 +-
 tools/perf/tests/builtin-test.c            |   1 +
 tools/perf/tests/tests.h                   |   1 +
 tools/perf/tests/workloads/Build           |   1 +
 tools/perf/tests/workloads/named_threads.c | 109 +++++++++++++++++++++++++++++
 5 files changed, 116 insertions(+), 1 deletion(-)
diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt index 7ec70c054cac..778c37f6efdb 100644 --- a/tools/perf/Documentation/perf-test.txt +++ b/tools/perf/Documentation/perf-test.txt @@ -57,7 +57,7 @@ OPTIONS --workload=:: Run a built-in workload, to list them use '--list-workloads', current ones include: noploop, thloop, leafloop, sqrtloop, brstack, datasym, - context_switch_loop, deterministic and landlock. + context_switch_loop, deterministic, named_threads and landlock.
Used with the shell script regression tests.
@@ -66,6 +66,9 @@ OPTIONS seconds: leafloop, noploop, sqrtloop, thloop nrloops: brstack, context_switch_loop
+ 'named_threads' accepts the number of threads and the number of loops to + do in each thread. + The datasym, landlock and deterministic workloads don't accept any.
--list-workloads:: diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index 5a2ab67cd85d..2fee93858c86 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -149,6 +149,7 @@ static struct test_suite *generic_tests[] = { static struct test_workload *workloads[] = { &workload__noploop, &workload__thloop, + &workload__named_threads, &workload__leafloop, &workload__sqrtloop, &workload__brstack, diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h index f8bba2d68769..ef3c3a269132 100644 --- a/tools/perf/tests/tests.h +++ b/tools/perf/tests/tests.h @@ -235,6 +235,7 @@ struct test_workload workload__##work = { \ /* The list of test workloads */ DECLARE_WORKLOAD(noploop); DECLARE_WORKLOAD(thloop); +DECLARE_WORKLOAD(named_threads); DECLARE_WORKLOAD(leafloop); DECLARE_WORKLOAD(sqrtloop); DECLARE_WORKLOAD(brstack); diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build index cca7ad354227..7db5eea713a3 100644 --- a/tools/perf/tests/workloads/Build +++ b/tools/perf/tests/workloads/Build @@ -2,6 +2,7 @@
perf-test-y += noploop.o perf-test-y += thloop.o +perf-test-y += named_threads.o perf-test-y += leafloop.o perf-test-y += sqrtloop.o perf-test-y += brstack.o diff --git a/tools/perf/tests/workloads/named_threads.c b/tools/perf/tests/workloads/named_threads.c new file mode 100644 index 000000000000..dc8070a98df4 --- /dev/null +++ b/tools/perf/tests/workloads/named_threads.c @@ -0,0 +1,109 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <errno.h> +#include <limits.h> +#include <pthread.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <linux/compiler.h> +#include "../tests.h" + +#define MAX_THREADS 25 + +static int iterations = 500; +int named_threads_work = 1234; + +typedef void *(*thread_fn_t)(void *); + +#define DEFINE_THREAD(n) \ +noinline void *named_threads_thread##n(void *arg __maybe_unused) \ +{ \ + pthread_setname_np(pthread_self(), "thread" #n); \ + for (int i = 0; i < iterations; i++) \ + named_threads_work *= 3; \ + \ + return NULL; \ +} + +#define THREAD_LIST(macro) \ + macro(1) \ + macro(2) \ + macro(3) \ + macro(4) \ + macro(5) \ + macro(6) \ + macro(7) \ + macro(8) \ + macro(9) \ + macro(10) \ + macro(11) \ + macro(12) \ + macro(13) \ + macro(14) \ + macro(15) \ + macro(16) \ + macro(17) \ + macro(18) \ + macro(19) \ + macro(20) \ + macro(21) \ + macro(22) \ + macro(23) \ + macro(24) \ + macro(25) + +#define DECLARE_THREAD(n) void *named_threads_thread##n(void *arg); + +THREAD_LIST(DECLARE_THREAD) +THREAD_LIST(DEFINE_THREAD) + +#define THREAD_ENTRY(n) named_threads_thread##n, + +static thread_fn_t thread_fns[MAX_THREADS] = { + THREAD_LIST(THREAD_ENTRY) +}; + +/* + * Creates argv[0] threads that run a unique function named "thread[x]" which performs + * a multiplication in a loop for argv[1] loops. 
+ */ +static int named_threads(int argc, const char **argv) +{ + pthread_t threads[MAX_THREADS]; + int nr_threads = 1; + int err = 0; + + if (argc > 0) + nr_threads = atoi(argv[0]); + + if (nr_threads <= 0 || nr_threads > MAX_THREADS) { + fprintf(stderr, "Error: num threads must be 1 - %d\n", MAX_THREADS); + return 1; + } + + if (argc > 1) + iterations = atoi(argv[1]); + + if (iterations < 0) { + fprintf(stderr, "Error: iterations must be non-negative\n"); + return 1; + } + + for (int i = 0; i < nr_threads; i++) { + int ret; + + ret = pthread_create(&threads[i], NULL, thread_fns[i], NULL); + if (ret) { + fprintf(stderr, "Error: failed to create thread%d: %s\n", + i + 1, strerror(ret)); + return 1; + } + } + + for (int i = 0; i < nr_threads; i++) + pthread_join(threads[i], NULL); + + return err; +} + +DEFINE_WORKLOAD(named_threads);
The thread_loop test only looks for context IDs in the raw trace. There's a lot more that can go wrong when decoding these, so replace it with a test that looks at the final output for matching thread names and symbols.
In the future we might use timestamps and context switch events to track threads, so looking at context IDs in the raw trace wouldn't always work.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 .../tests/shell/coresight/concurrent_threads.sh    | 45 ++++++++++++++++++++++
 .../shell/coresight/thread_loop_check_tid_10.sh    | 23 -----------
 .../shell/coresight/thread_loop_check_tid_2.sh     | 23 -----------
 3 files changed, 45 insertions(+), 46 deletions(-)
diff --git a/tools/perf/tests/shell/coresight/concurrent_threads.sh b/tools/perf/tests/shell/coresight/concurrent_threads.sh new file mode 100755 index 000000000000..bf34d4ee77a6 --- /dev/null +++ b/tools/perf/tests/shell/coresight/concurrent_threads.sh @@ -0,0 +1,45 @@ +#!/bin/bash -e +# Coresight concurrent threads (exclusive) + +# SPDX-License-Identifier: GPL-2.0 + +# If Coresight is not available, skip the test +perf list pmu | grep -q cs_etm || exit 2 + +tmpdir=$(mktemp -d /tmp/__perf_test.coresight_concurrent_threads.XXXXX) + +cleanup() { + rm -rf "${tmpdir}" + trap - EXIT TERM INT +} + +trap_cleanup() { + cleanup + exit 1 +} +trap trap_cleanup EXIT TERM INT + +cf="$tmpdir/ctl" +af="$tmpdir/ack" +mkfifo "$cf" "$af" + +nthreads=10 + +# Timestamps off to reduce trace size, start disabled and use the control FIFO +# to only record the workload and not startup. +perf record -o "$tmpdir/data" -e cs_etm/timestamp=0/u -D -1 --control fifo:"$cf","$af" \ + -- perf test --workload-ctl fifo:"$cf","$af" -w named_threads $nthreads 1 > /dev/null 2>&1 + +perf script -i "$tmpdir/data" > "$tmpdir/script" 2>/dev/null + +# Check all threads were traced and they have the correct thread name and symbol +for i in $(seq 1 $nthreads); do + if ! grep -q "thread${i} .* named_threads_thread${i}" "$tmpdir/script"; then + echo "Error: thread${i} missing" >&2 + cleanup + exit 1 + fi +done + +cleanup +exit 0 diff --git a/tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh b/tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh deleted file mode 100755 index 7f43a93a2ac2..000000000000 --- a/tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh +++ /dev/null @@ -1,23 +0,0 @@ -#!/bin/bash -e -# CoreSight / Thread Loop 10 Threads - Check TID (exclusive) - -# SPDX-License-Identifier: GPL-2.0 -# Carsten Haitzler carsten.haitzler@arm.com, 2021 - -TEST="thread_loop" - -# shellcheck source=../lib/coresight.sh -. 
"$(dirname $0)"/../lib/coresight.sh - -ARGS="10 1" -DATV="check-tid-10th" -# shellcheck disable=SC2153 -DATA="$DATD/perf-$TEST-$DATV.data" -STDO="$DATD/perf-$TEST-$DATV.stdout" - -SHOW_TID=1 perf record -s $PERFRECOPT -o "$DATA" "$BIN" $ARGS > $STDO - -perf_dump_aux_tid_verify "$DATA" "$STDO" - -err=$? -exit $err diff --git a/tools/perf/tests/shell/coresight/thread_loop_check_tid_2.sh b/tools/perf/tests/shell/coresight/thread_loop_check_tid_2.sh deleted file mode 100755 index a94d2079ed06..000000000000 --- a/tools/perf/tests/shell/coresight/thread_loop_check_tid_2.sh +++ /dev/null @@ -1,23 +0,0 @@ -#!/bin/bash -e -# CoreSight / Thread Loop 2 Threads - Check TID (exclusive) - -# SPDX-License-Identifier: GPL-2.0 -# Carsten Haitzler carsten.haitzler@arm.com, 2021 - -TEST="thread_loop" - -# shellcheck source=../lib/coresight.sh -. "$(dirname $0)"/../lib/coresight.sh - -ARGS="2 20" -DATV="check-tid-2th" -# shellcheck disable=SC2153 -DATA="$DATD/perf-$TEST-$DATV.data" -STDO="$DATD/perf-$TEST-$DATV.stdout" - -SHOW_TID=1 perf record -s $PERFRECOPT -o "$DATA" "$BIN" $ARGS > $STDO - -perf_dump_aux_tid_verify "$DATA" "$STDO" - -err=$? -exit $err
We already test branch output in perf script mode, but then retest it in Perf report mode. This is more of a test of Perf itself than Coresight because Perf uses the same samples to generate both outputs. Also we're already testing instruction output in Perf report mode.
Remove this test for a speedup. In the systemwide test, also remove the Perf report instruction test, because systemwide mode records a lot more data, so running multiple tests on it has a big runtime impact.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight.sh | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index bbf89e944e7b..39553702c1f3 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -52,17 +52,6 @@ perf_script_branch_samples() {
 		grep -E " +$1 +[0-9]+ .* +branches:(.*:)? +" > /dev/null 2>&1
 }

-perf_report_branch_samples() {
-	echo "Looking at perf.data file for reporting branch samples:"
-
-	# Below is an example of the branch samples reporting:
-	# 73.04% 73.04% touch libc-2.27.so [.] _dl_addr
-	# 7.71% 7.71% touch libc-2.27.so [.] getenv
-	# 2.59% 2.59% touch ld-2.27.so [.] strcmp
-	perf report --stdio -i ${perfdata} 2>&1 | \
-		grep -E " +[0-9]+.[0-9]+% +[0-9]+.[0-9]+% +$1 " > /dev/null 2>&1
-}
-
 perf_report_instruction_samples() {
 	echo "Looking at perf.data file for instruction samples:"

@@ -123,7 +112,6 @@ arm_cs_iterate_devices() {

 		record_touch_file $device_name $2 &&
 		perf_script_branch_samples touch &&
-		perf_report_branch_samples touch &&
 		perf_report_instruction_samples touch

 		err=$?
@@ -154,9 +142,7 @@ arm_cs_etm_system_wide_test() {

 	# System-wide mode should include perf samples so test for that
 	# instead of ls
-	perf_script_branch_samples perf &&
-	perf_report_branch_samples perf &&
-	perf_report_instruction_samples perf
+	perf_script_branch_samples perf

 	err=$?
 	arm_cs_report "CoreSight system wide testing" $err
@@ -179,7 +165,6 @@ arm_cs_etm_snapshot_test() {
 	wait $PERFPID

 	perf_script_branch_samples dd &&
-	perf_report_branch_samples dd &&
 	perf_report_instruction_samples dd

 	err=$?
@@ -191,7 +176,6 @@ arm_cs_etm_basic_test() {
 	perf record -o ${perfdata} "$@" -m,8M -- ls > /dev/null 2>&1

 	perf_script_branch_samples ls &&
-	perf_report_branch_samples ls &&
 	perf_report_instruction_samples ls

 	err=$?
Use the common idiom for skipping tests if not running as root, which is required for this test.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight.sh | 6 ++++++
 1 file changed, 6 insertions(+)
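As a rough illustration of the idiom, the check can be wrapped in a function so that it is testable without root. `skip_unless_root` is a hypothetical name and the optional uid argument is an assumption of this sketch; the patch itself uses the check inline.

```shell
#!/bin/sh
# Hypothetical helper: returns the skip code (2) unless the given uid is 0.
# The uid defaults to the caller's, so it can be exercised without root.
skip_unless_root() {
	uid=${1:-$(id -u)}
	if [ "$uid" != 0 ]; then
		echo "[Skip] No root permission"
		return 2
	fi
	return 0
}

# A test script would call this near the top and propagate the skip code:
#   skip_unless_root || exit 2
```

Exiting with 2 rather than 1 matches the existing `skip_if_no_cs_etm_event || exit 2` line, so a non-root run is reported as skipped instead of failed.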
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index 39553702c1f3..8ed2c934c87d 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -20,6 +20,12 @@ skip_if_no_cs_etm_event() {

 skip_if_no_cs_etm_event || exit 2

+if [ "$(id -u)" != 0 ]; then
+	# Requires root for -C and system wide tests
+	echo "[Skip] No root permission"
+	exit 2
+fi
+
 perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
 file=$(mktemp /tmp/temporary_file.XXXXX)
The default buffer size for root is 4MB which is very slow to decode. We only need a few KB to verify that the dd process is hit so reduce the size to 128KB.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index 8ed2c934c87d..da2f599393e2 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -156,7 +156,7 @@ arm_cs_etm_system_wide_test() {

 arm_cs_etm_snapshot_test() {
 	echo "Recording trace with snapshot mode"
-	perf record -o ${perfdata} -e cs_etm// -S \
+	perf record -o ${perfdata} -e cs_etm// -S -m,128K \
 		-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
 	PERFPID=$!

Like the name says, this should be the most basic test possible. Kernel recording is slow and already has coverage on the systemwide test. Perf report output also has coverage elsewhere. 'ls' also produces more trace than 'true'.
We only want to test if the combination of recording options works at all, so record userspace only, drop the buffer size override and the Perf report check, and run 'true' instead of 'ls' to make it as fast as possible.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight.sh | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)
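The six invocations of the basic test are the product of three modes and two timestamp settings. A minimal sketch of that matrix, assuming a hypothetical `list_basic_test_args` helper that only prints the argument sets instead of recording anything:

```shell
#!/bin/sh
# Hypothetical helper that prints the argument sets the basic test is run
# with: three modes (per-thread, system-wide, default) times two timestamp
# settings, all restricted to userspace with the 'u' modifier.
list_basic_test_args() {
	for mode in "--per-thread" "-a" ""; do
		for ts in 0 1; do
			# ${mode:+ $mode} avoids a trailing space in default mode
			printf '%s\n' "-e cs_etm/timestamp=${ts}/u${mode:+ $mode}"
		done
	done
}

list_basic_test_args
```

The patch keeps the explicit list of calls, which is easier to grep for when a single combination fails.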
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index da2f599393e2..83295a8fe179 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -179,10 +179,9 @@ arm_cs_etm_snapshot_test() {

 arm_cs_etm_basic_test() {
 	echo "Recording trace with '$*'"
-	perf record -o ${perfdata} "$@" -m,8M -- ls > /dev/null 2>&1
+	perf record -o ${perfdata} "$@" -- true > /dev/null 2>&1

-	perf_script_branch_samples ls &&
-	perf_report_instruction_samples ls
+	perf_script_branch_samples true

 	err=$?
 	arm_cs_report "CoreSight basic testing with '$*'" $err
@@ -246,12 +245,12 @@ arm_cs_etm_snapshot_test

 # Test all combinations of per-thread, system-wide and normal mode with
 # and without timestamps
-arm_cs_etm_basic_test -e cs_etm/timestamp=0/ --per-thread
-arm_cs_etm_basic_test -e cs_etm/timestamp=1/ --per-thread
-arm_cs_etm_basic_test -e cs_etm/timestamp=0/ -a
-arm_cs_etm_basic_test -e cs_etm/timestamp=1/ -a
-arm_cs_etm_basic_test -e cs_etm/timestamp=0/
-arm_cs_etm_basic_test -e cs_etm/timestamp=1/
+arm_cs_etm_basic_test -e cs_etm/timestamp=0/u --per-thread
+arm_cs_etm_basic_test -e cs_etm/timestamp=1/u --per-thread
+arm_cs_etm_basic_test -e cs_etm/timestamp=0/u -a
+arm_cs_etm_basic_test -e cs_etm/timestamp=1/u -a
+arm_cs_etm_basic_test -e cs_etm/timestamp=0/u
+arm_cs_etm_basic_test -e cs_etm/timestamp=1/u

 arm_cs_etm_sparse_cpus_test
The Coresight workload binaries and their shell library are now unused and had various issues like not working with out of source builds and being slow to compile. Delete them.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 Documentation/trace/coresight/coresight-perf.rst   |  78 +-----------
 MAINTAINERS                                        |   1 -
 tools/perf/Makefile.perf                           |  14 +--
 tools/perf/tests/shell/coresight/Makefile          |  29 -----
 .../perf/tests/shell/coresight/Makefile.miniconfig |  14 ---
 .../tests/shell/coresight/asm_pure_loop/.gitignore |   1 -
 .../tests/shell/coresight/asm_pure_loop/Makefile   |  34 ------
 .../shell/coresight/asm_pure_loop/asm_pure_loop.S  |  30 -----
 .../tests/shell/coresight/memcpy_thread/.gitignore |   1 -
 .../tests/shell/coresight/memcpy_thread/Makefile   |  33 -----
 .../shell/coresight/memcpy_thread/memcpy_thread.c  |  80 ------------
 .../tests/shell/coresight/thread_loop/.gitignore   |   1 -
 .../tests/shell/coresight/thread_loop/Makefile     |  33 -----
 .../shell/coresight/thread_loop/thread_loop.c      |  85 -------------
 .../shell/coresight/unroll_loop_thread/.gitignore  |   1 -
 .../shell/coresight/unroll_loop_thread/Makefile    |  33 -----
 .../unroll_loop_thread/unroll_loop_thread.c        |  75 ------------
 tools/perf/tests/shell/lib/coresight.sh            | 134 ---------------------
 18 files changed, 5 insertions(+), 672 deletions(-)
diff --git a/Documentation/trace/coresight/coresight-perf.rst b/Documentation/trace/coresight/coresight-perf.rst index 30be89320621..0a77741a431e 100644 --- a/Documentation/trace/coresight/coresight-perf.rst +++ b/Documentation/trace/coresight/coresight-perf.rst @@ -112,78 +112,6 @@ Example for triggering AUX pause and resume with PMU event:: Perf test - Verify kernel and userspace perf CoreSight work -----------------------------------------------------------
-When you run perf test, it will do a lot of self tests. Some of those -tests will cover CoreSight (only if enabled and on ARM64). You -generally would run perf test from the tools/perf directory in the -kernel tree. Some tests will check some internal perf support like: - - Check Arm CoreSight trace data recording and synthesized samples - Check Arm SPE trace data recording and synthesized samples - -Some others will actually use perf record and some test binaries that -are in tests/shell/coresight and will collect traces to ensure a -minimum level of functionality is met. The scripts that launch these -tests are in the same directory. These will all look like: - - CoreSight / ASM Pure Loop - CoreSight / Memcpy 16k 10 Threads - CoreSight / Thread Loop 10 Threads - Check TID - etc. - -These perf record tests will not run if the tool binaries do not exist -in tests/shell/coresight/*/ and will be skipped. If you do not have -CoreSight support in hardware then either do not build perf with -CoreSight support or remove these binaries in order to not have these -tests fail and have them skip instead. - -These tests will log historical results in the current working -directory (e.g. tools/perf) and will be named stats-*.csv like: - - stats-asm_pure_loop-out.csv - stats-memcpy_thread-16k_10.csv - ... - -These statistic files log some aspects of the AUX data sections in -the perf data output counting some numbers of certain encodings (a -good way to know that it's working in a very simple way). One problem -with CoreSight is that given a large enough amount of data needing to -be logged, some of it can be lost due to the processor not waking up -in time to read out all the data from buffers etc.. You will notice -that the amount of data collected can vary a lot per run of perf test. 
-If you wish to see how this changes over time, simply run perf test -multiple times and all these csv files will have more and more data -appended to it that you can later examine, graph and otherwise use to -figure out if things have become worse or better. - -This means sometimes these tests fail as they don't capture all the -data needed. This is about tracking quality and amount of data -produced over time and to see when changes to the Linux kernel improve -quality of traces. - -Be aware that some of these tests take quite a while to run, specifically -in processing the perf data file and dumping contents to then examine what -is inside. - -You can change where these csv logs are stored by setting the -PERF_TEST_CORESIGHT_STATDIR environment variable before running perf -test like:: - - export PERF_TEST_CORESIGHT_STATDIR=/var/tmp - perf test - -They will also store resulting perf output data in the current -directory for later inspection like:: - - perf-asm_pure_loop-out.data - perf-memcpy_thread-16k_10.data - ... - -You can alter where the perf data files are stored by setting the -PERF_TEST_CORESIGHT_DATADIR environment variable such as:: - - PERF_TEST_CORESIGHT_DATADIR=/var/tmp - perf test - -You may wish to set these above environment variables if you wish to -keep the output of tests outside of the current working directory for -longer term storage and examination. +There are a set of Perf tests for CoreSight which can be run with:: + + sudo perf test coresight diff --git a/MAINTAINERS b/MAINTAINERS index b539be153f6a..7efb893edcbb 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2751,7 +2751,6 @@ F: tools/perf/arch/arm/util/cs-etm.h F: tools/perf/arch/arm/util/pmu.c F: tools/perf/tests/shell/*coresight* F: tools/perf/tests/shell/coresight/* -F: tools/perf/tests/shell/lib/*coresight* F: tools/perf/util/cs-etm-decoder/* F: tools/perf/util/cs-etm.*
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf index 4ac2a0cec9ee..e4f8c979f47b 100644 --- a/tools/perf/Makefile.perf +++ b/tools/perf/Makefile.perf @@ -508,16 +508,7 @@ arm64-sysreg-defs-clean: $(Q)$(MAKE) -C $(arm64_gen_sysreg_dir) O=$(arm64_gen_sysreg_outdir) \ prefix= subdir= clean > /dev/null
-TESTS_CORESIGHT_DIR := $(srctree)/tools/perf/tests/shell/coresight - -tests-coresight-targets: FORCE - $(Q)$(MAKE) -C $(TESTS_CORESIGHT_DIR) - -tests-coresight-targets-clean: - $(call QUIET_CLEAN, coresight) - $(Q)$(MAKE) -C $(TESTS_CORESIGHT_DIR) O=$(OUTPUT) clean >/dev/null - -all: shell_compatibility_test $(ALL_PROGRAMS) $(LANG_BINDINGS) $(OTHER_PROGRAMS) tests-coresight-targets +all: shell_compatibility_test $(ALL_PROGRAMS) $(LANG_BINDINGS) $(OTHER_PROGRAMS)
# Create python binding output directory if not already present $(shell [ -d '$(OUTPUT)python' ] || mkdir -p '$(OUTPUT)python') @@ -896,7 +887,6 @@ install-tests: all install-gtk $(INSTALL) tests/shell/base_report/*.txt '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/base_report'; \ $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/coresight' ; \ $(INSTALL) tests/shell/coresight/*.sh '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/coresight' - $(Q)$(MAKE) -C tests/shell/coresight install-tests
install-bin: install-tools install-tests
@@ -939,7 +929,7 @@ endif
clean:: $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBSYMBOL)-clean $(LIBPERF)-clean \ arm64-sysreg-defs-clean fixdep-clean python-clean bpf-skel-clean \ - tests-coresight-targets-clean pmu-events-clean + pmu-events-clean $(call QUIET_CLEAN, core-objs) $(RM) $(LIBPERF_A) $(OUTPUT)perf-archive \ $(OUTPUT)perf-iostat $(LANG_BINDINGS) $(Q)find $(or $(OUTPUT),.) -name '*.o' -delete -o -name '*.a' -delete -o \ diff --git a/tools/perf/tests/shell/coresight/Makefile b/tools/perf/tests/shell/coresight/Makefile deleted file mode 100644 index fa08fd9a5991..000000000000 --- a/tools/perf/tests/shell/coresight/Makefile +++ /dev/null @@ -1,29 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0-only -# Carsten Haitzler carsten.haitzler@arm.com, 2021 -include ../../../../../tools/scripts/Makefile.include -include ../../../../../tools/scripts/Makefile.arch -include ../../../../../tools/scripts/utilities.mak - -SUBDIRS = \ - asm_pure_loop \ - memcpy_thread \ - thread_loop \ - unroll_loop_thread - -all: $(SUBDIRS) -$(SUBDIRS): - @$(MAKE) -C $@ >/dev/null - -INSTALLDIRS = $(SUBDIRS:%=install-%) - -install-tests: $(INSTALLDIRS) -$(INSTALLDIRS): - @$(MAKE) -C $(@:install-%=%) install-tests >/dev/null - -CLEANDIRS = $(SUBDIRS:%=clean-%) - -clean: $(CLEANDIRS) -$(CLEANDIRS): - $(call QUIET_CLEAN, test-$(@:clean-%=%)) $(MAKE) -C $(@:clean-%=%) clean >/dev/null - -.PHONY: all clean $(SUBDIRS) $(CLEANDIRS) $(INSTALLDIRS) diff --git a/tools/perf/tests/shell/coresight/Makefile.miniconfig b/tools/perf/tests/shell/coresight/Makefile.miniconfig deleted file mode 100644 index 5f72a9cb43f3..000000000000 --- a/tools/perf/tests/shell/coresight/Makefile.miniconfig +++ /dev/null @@ -1,14 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0-only -# Carsten Haitzler carsten.haitzler@arm.com, 2021 - -ifndef DESTDIR -prefix ?= $(HOME) -endif - -DESTDIR_SQ = $(subst ',''',$(DESTDIR)) -INSTALL = install -INSTDIR_SUB = tests/shell/coresight - -include ../../../../../scripts/Makefile.include -include 
../../../../../scripts/Makefile.arch -include ../../../../../scripts/utilities.mak diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop/.gitignore b/tools/perf/tests/shell/coresight/asm_pure_loop/.gitignore deleted file mode 100644 index 468673ac32e8..000000000000 --- a/tools/perf/tests/shell/coresight/asm_pure_loop/.gitignore +++ /dev/null @@ -1 +0,0 @@ -asm_pure_loop diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop/Makefile b/tools/perf/tests/shell/coresight/asm_pure_loop/Makefile deleted file mode 100644 index 206849e92bc9..000000000000 --- a/tools/perf/tests/shell/coresight/asm_pure_loop/Makefile +++ /dev/null @@ -1,34 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0 -# Carsten Haitzler carsten.haitzler@arm.com, 2021 - -include ../Makefile.miniconfig - -# Binary to produce -BIN=asm_pure_loop -# Any linking/libraries needed for the binary - empty if none needed -LIB= - -all: $(BIN) - -$(BIN): $(BIN).S -ifdef CORESIGHT -ifeq ($(ARCH),arm64) -# Build line - this is raw asm with no libc to have an always exact binary - $(Q)$(CC) $(BIN).S -nostdlib -static -o $(BIN) $(LIB) -endif -endif - -install-tests: all -ifdef CORESIGHT -ifeq ($(ARCH),arm64) -# Install the test tool in the right place - $(call QUIET_INSTALL, tests) \ - $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \ - $(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)' -endif -endif - -clean: - $(Q)$(RM) -f $(BIN) - -.PHONY: all clean install-tests diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop/asm_pure_loop.S b/tools/perf/tests/shell/coresight/asm_pure_loop/asm_pure_loop.S deleted file mode 100644 index 577760046772..000000000000 --- a/tools/perf/tests/shell/coresight/asm_pure_loop/asm_pure_loop.S +++ /dev/null @@ -1,30 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* Tamas Zsoldos tamas.zsoldos@arm.com, 2021 */ - -.globl _start -_start: - mov x0, 0x0000ffff - mov x1, xzr -loop: - nop - nop - cbnz x1, noskip - nop 
- nop - adrp x2, skip - add x2, x2, :lo12:skip - br x2 - nop - nop -noskip: - nop - nop -skip: - sub x0, x0, 1 - cbnz x0, loop - - mov x0, #0 - mov x8, #93 // __NR_exit syscall - svc #0 - -.section .note.GNU-stack, "", @progbits diff --git a/tools/perf/tests/shell/coresight/memcpy_thread/.gitignore b/tools/perf/tests/shell/coresight/memcpy_thread/.gitignore deleted file mode 100644 index f8217e56091e..000000000000 --- a/tools/perf/tests/shell/coresight/memcpy_thread/.gitignore +++ /dev/null @@ -1 +0,0 @@ -memcpy_thread diff --git a/tools/perf/tests/shell/coresight/memcpy_thread/Makefile b/tools/perf/tests/shell/coresight/memcpy_thread/Makefile deleted file mode 100644 index 2db637eb2c26..000000000000 --- a/tools/perf/tests/shell/coresight/memcpy_thread/Makefile +++ /dev/null @@ -1,33 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0 -# Carsten Haitzler carsten.haitzler@arm.com, 2021 -include ../Makefile.miniconfig - -# Binary to produce -BIN=memcpy_thread -# Any linking/libraries needed for the binary - empty if none needed -LIB=-pthread - -all: $(BIN) - -$(BIN): $(BIN).c -ifdef CORESIGHT -ifeq ($(ARCH),arm64) -# Build line - $(Q)$(CC) $(BIN).c -o $(BIN) $(LIB) -endif -endif - -install-tests: all -ifdef CORESIGHT -ifeq ($(ARCH),arm64) -# Install the test tool in the right place - $(call QUIET_INSTALL, tests) \ - $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \ - $(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)' -endif -endif - -clean: - $(Q)$(RM) -f $(BIN) - -.PHONY: all clean install-tests diff --git a/tools/perf/tests/shell/coresight/memcpy_thread/memcpy_thread.c b/tools/perf/tests/shell/coresight/memcpy_thread/memcpy_thread.c deleted file mode 100644 index 7e879217be30..000000000000 --- a/tools/perf/tests/shell/coresight/memcpy_thread/memcpy_thread.c +++ /dev/null @@ -1,80 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -// Carsten Haitzler carsten.haitzler@arm.com, 2021 -#include <stdio.h> -#include 
<stdlib.h> -#include <unistd.h> -#include <string.h> -#include <pthread.h> - -struct args { - unsigned long loops; - unsigned long size; - pthread_t th; - void *ret; -}; - -static void *thrfn(void *arg) -{ - struct args *a = arg; - unsigned long i, len = a->loops; - unsigned char *src, *dst; - - src = malloc(a->size * 1024); - dst = malloc(a->size * 1024); - if ((!src) || (!dst)) { - printf("ERR: Can't allocate memory\n"); - exit(1); - } - for (i = 0; i < len; i++) - memcpy(dst, src, a->size * 1024); - - return NULL; -} - -static pthread_t new_thr(void *(*fn) (void *arg), void *arg) -{ - pthread_t t; - pthread_attr_t attr; - - pthread_attr_init(&attr); - pthread_create(&t, &attr, fn, arg); - return t; -} - -int main(int argc, char **argv) -{ - unsigned long i, len, size, thr; - struct args args[256]; - long long v; - - if (argc < 4) { - printf("ERR: %s [copysize Kb] [numthreads] [numloops (hundreds)]\n", argv[0]); - exit(1); - } - - v = atoll(argv[1]); - if ((v < 1) || (v > (1024 * 1024))) { - printf("ERR: max memory 1GB (1048576 KB)\n"); - exit(1); - } - size = v; - thr = atol(argv[2]); - if ((thr < 1) || (thr > 256)) { - printf("ERR: threads 1-256\n"); - exit(1); - } - v = atoll(argv[3]); - if ((v < 1) || (v > 40000000000ll)) { - printf("ERR: loops 1-40000000000 (hundreds)\n"); - exit(1); - } - len = v * 100; - for (i = 0; i < thr; i++) { - args[i].loops = len; - args[i].size = size; - args[i].th = new_thr(thrfn, &(args[i])); - } - for (i = 0; i < thr; i++) - pthread_join(args[i].th, &(args[i].ret)); - return 0; -} diff --git a/tools/perf/tests/shell/coresight/thread_loop/.gitignore b/tools/perf/tests/shell/coresight/thread_loop/.gitignore deleted file mode 100644 index 6d4c33eaa9e8..000000000000 --- a/tools/perf/tests/shell/coresight/thread_loop/.gitignore +++ /dev/null @@ -1 +0,0 @@ -thread_loop diff --git a/tools/perf/tests/shell/coresight/thread_loop/Makefile b/tools/perf/tests/shell/coresight/thread_loop/Makefile deleted file mode 100644 index 
ea846c038e7a..000000000000 --- a/tools/perf/tests/shell/coresight/thread_loop/Makefile +++ /dev/null @@ -1,33 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0 -# Carsten Haitzler carsten.haitzler@arm.com, 2021 -include ../Makefile.miniconfig - -# Binary to produce -BIN=thread_loop -# Any linking/libraries needed for the binary - empty if none needed -LIB=-pthread - -all: $(BIN) - -$(BIN): $(BIN).c -ifdef CORESIGHT -ifeq ($(ARCH),arm64) -# Build line - $(Q)$(CC) $(BIN).c -o $(BIN) $(LIB) -endif -endif - -install-tests: all -ifdef CORESIGHT -ifeq ($(ARCH),arm64) -# Install the test tool in the right place - $(call QUIET_INSTALL, tests) \ - $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \ - $(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)' -endif -endif - -clean: - $(Q)$(RM) -f $(BIN) - -.PHONY: all clean install-tests diff --git a/tools/perf/tests/shell/coresight/thread_loop/thread_loop.c b/tools/perf/tests/shell/coresight/thread_loop/thread_loop.c deleted file mode 100644 index 86f3f548b006..000000000000 --- a/tools/perf/tests/shell/coresight/thread_loop/thread_loop.c +++ /dev/null @@ -1,85 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -// Carsten Haitzler carsten.haitzler@arm.com, 2021 - -// define this for gettid() -#define _GNU_SOURCE - -#include <stdio.h> -#include <stdlib.h> -#include <unistd.h> -#include <string.h> -#include <pthread.h> -#include <sys/syscall.h> -#ifndef SYS_gettid -// gettid is 178 on arm64 -# define SYS_gettid 178 -#endif -#define gettid() syscall(SYS_gettid) - -struct args { - unsigned int loops; - pthread_t th; - void *ret; -}; - -static void *thrfn(void *arg) -{ - struct args *a = arg; - int i = 0, len = a->loops; - - if (getenv("SHOW_TID")) { - unsigned long long tid = gettid(); - - printf("%llu\n", tid); - } - asm volatile( - "loop:\n" - "add %w[i], %w[i], #1\n" - "cmp %w[i], %w[len]\n" - "blt loop\n" - : /* out */ - : /* in */ [i] "r" (i), [len] "r" (len) - : /* clobber */ 
- ); - return (void *)(long)i; -} - -static pthread_t new_thr(void *(*fn) (void *arg), void *arg) -{ - pthread_t t; - pthread_attr_t attr; - - pthread_attr_init(&attr); - pthread_create(&t, &attr, fn, arg); - return t; -} - -int main(int argc, char **argv) -{ - unsigned int i, len, thr; - struct args args[256]; - - if (argc < 3) { - printf("ERR: %s [numthreads] [numloops (millions)]\n", argv[0]); - exit(1); - } - - thr = atoi(argv[1]); - if ((thr < 1) || (thr > 256)) { - printf("ERR: threads 1-256\n"); - exit(1); - } - len = atoi(argv[2]); - if ((len < 1) || (len > 4000)) { - printf("ERR: max loops 4000 (millions)\n"); - exit(1); - } - len *= 1000000; - for (i = 0; i < thr; i++) { - args[i].loops = len; - args[i].th = new_thr(thrfn, &(args[i])); - } - for (i = 0; i < thr; i++) - pthread_join(args[i].th, &(args[i].ret)); - return 0; -} diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread/.gitignore b/tools/perf/tests/shell/coresight/unroll_loop_thread/.gitignore deleted file mode 100644 index 2cb4e996dbf3..000000000000 --- a/tools/perf/tests/shell/coresight/unroll_loop_thread/.gitignore +++ /dev/null @@ -1 +0,0 @@ -unroll_loop_thread diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread/Makefile b/tools/perf/tests/shell/coresight/unroll_loop_thread/Makefile deleted file mode 100644 index 6264c4e3abd1..000000000000 --- a/tools/perf/tests/shell/coresight/unroll_loop_thread/Makefile +++ /dev/null @@ -1,33 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0 -# Carsten Haitzler carsten.haitzler@arm.com, 2021 -include ../Makefile.miniconfig - -# Binary to produce -BIN=unroll_loop_thread -# Any linking/libraries needed for the binary - empty if none needed -LIB=-pthread - -all: $(BIN) - -$(BIN): $(BIN).c -ifdef CORESIGHT -ifeq ($(ARCH),arm64) -# Build line - $(Q)$(CC) $(BIN).c -o $(BIN) $(LIB) -endif -endif - -install-tests: all -ifdef CORESIGHT -ifeq ($(ARCH),arm64) -# Install the test tool in the right place - $(call QUIET_INSTALL, tests) \ - $(INSTALL) -d 
-m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \ - $(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)' -endif -endif - -clean: - $(Q)$(RM) -f $(BIN) - -.PHONY: all clean install-tests diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread/unroll_loop_thread.c b/tools/perf/tests/shell/coresight/unroll_loop_thread/unroll_loop_thread.c deleted file mode 100644 index 8f4e1c985ca3..000000000000 --- a/tools/perf/tests/shell/coresight/unroll_loop_thread/unroll_loop_thread.c +++ /dev/null @@ -1,75 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -// Carsten Haitzler carsten.haitzler@arm.com, 2021 -#include <stdio.h> -#include <stdlib.h> -#include <unistd.h> -#include <string.h> -#include <pthread.h> - -struct args { - pthread_t th; - unsigned int in; - void *ret; -}; - -static void *thrfn(void *arg) -{ - struct args *a = arg; - unsigned int i, in = a->in; - - for (i = 0; i < 10000; i++) { - asm volatile ( -// force an unroll of thia add instruction so we can test long runs of code -#define SNIP1 "add %w[in], %w[in], #1\n" -// 10 -#define SNIP2 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 -// 100 -#define SNIP3 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 -// 1000 -#define SNIP4 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 -// 10000 -#define SNIP5 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 -// 100000 - SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 - : /* out */ - : /* in */ [in] "r" (in) - : /* clobber */ - ); - } - - return NULL; -} - -static pthread_t new_thr(void *(*fn) (void *arg), void *arg) -{ - pthread_t t; - pthread_attr_t attr; - - pthread_attr_init(&attr); - pthread_create(&t, &attr, fn, arg); - return t; -} - -int main(int argc, char **argv) -{ - unsigned int i, thr; - struct args args[256]; - - if (argc < 2) { - printf("ERR: %s [numthreads]\n", argv[0]); - exit(1); - } - - thr = atoi(argv[1]); - if ((thr > 256) || (thr < 1)) { - 
printf("ERR: threads 1-256\n"); - exit(1); - } - for (i = 0; i < thr; i++) { - args[i].in = rand(); - args[i].th = new_thr(thrfn, &(args[i])); - } - for (i = 0; i < thr; i++) - pthread_join(args[i].th, &(args[i].ret)); - return 0; -} diff --git a/tools/perf/tests/shell/lib/coresight.sh b/tools/perf/tests/shell/lib/coresight.sh deleted file mode 100644 index 184d62e7e5bd..000000000000 --- a/tools/perf/tests/shell/lib/coresight.sh +++ /dev/null @@ -1,134 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0 -# Carsten Haitzler carsten.haitzler@arm.com, 2021 - -# This is sourced from a driver script so no need for #!/bin... etc. at the -# top - the assumption below is that it runs as part of sourcing after the -# test sets up some basic env vars to say what it is. - -# This currently works with ETMv4 / ETF not any other packet types at thi -# point. This will need changes if that changes. - -# perf record options for the perf tests to use -PERFRECMEM="-m ,16M" -PERFRECOPT="$PERFRECMEM -e cs_etm//u" - -TOOLS=$(dirname $0) -DIR="$TOOLS/$TEST" -BIN="$DIR/$TEST" -# If the test tool/binary does not exist and is executable then skip the test -if ! test -x "$BIN"; then exit 2; fi -# If CoreSight is not available, skip the test -perf list pmu | grep -q cs_etm || exit 2 -DATD="." -# If the data dir env is set then make the data dir use that instead of ./ -if test -n "$PERF_TEST_CORESIGHT_DATADIR"; then - DATD="$PERF_TEST_CORESIGHT_DATADIR"; -fi -# If the stat dir env is set then make the data dir use that instead of ./ -STATD="." 
-if test -n "$PERF_TEST_CORESIGHT_STATDIR"; then - STATD="$PERF_TEST_CORESIGHT_STATDIR"; -fi - -# Called if the test fails - error code 1 -err() { - echo "$1" - exit 1 -} - -# Check that some statistics from our perf -check_val_min() { - STATF="$4" - if test "$2" -lt "$3"; then - echo ", FAILED" >> "$STATF" - err "Sanity check number of $1 is too low ($2 < $3)" - fi -} - -perf_dump_aux_verify() { - # Some basic checking that the AUX chunk contains some sensible data - # to see that we are recording something and at least a minimum - # amount of it. We should almost always see Fn packets in just about - # anything but certainly we will see some trace info and async - # packets - DUMP="$DATD/perf-tmp-aux-dump.txt" - perf report --stdio --dump -i "$1" | \ - grep -o -e I_ATOM_F -e I_ASYNC -e I_TRACE_INFO > "$DUMP" - # Simply count how many of these packets we find to see that we are - # producing a reasonable amount of data - exact checks are not sane - # as this is a lossy process where we may lose some blocks and the - # compiler may produce different code depending on the compiler and - # optimization options, so this is rough just to see if we're - # either missing almost all the data or all of it - ATOM_FX_NUM=$(grep -c I_ATOM_F "$DUMP") - ASYNC_NUM=$(grep -c I_ASYNC "$DUMP") - TRACE_INFO_NUM=$(grep -c I_TRACE_INFO "$DUMP") - rm -f "$DUMP" - - # Arguments provide minimums for a pass - CHECK_FX_MIN="$2" - CHECK_ASYNC_MIN="$3" - CHECK_TRACE_INFO_MIN="$4" - - # Write out statistics, so over time you can track results to see if - # there is a pattern - for example we have less "noisy" results that - # produce more consistent amounts of data each run, to see if over - # time any techinques to minimize data loss are having an effect or - # not - STATF="$STATD/stats-$TEST-$DATV.csv" - if ! 
test -f "$STATF"; then - echo "ATOM Fx Count, Minimum, ASYNC Count, Minimum, TRACE INFO Count, Minimum" > "$STATF" - fi - echo -n "$ATOM_FX_NUM, $CHECK_FX_MIN, $ASYNC_NUM, $CHECK_ASYNC_MIN, $TRACE_INFO_NUM, $CHECK_TRACE_INFO_MIN" >> "$STATF" - - # Actually check to see if we passed or failed. - check_val_min "ATOM_FX" "$ATOM_FX_NUM" "$CHECK_FX_MIN" "$STATF" - check_val_min "ASYNC" "$ASYNC_NUM" "$CHECK_ASYNC_MIN" "$STATF" - check_val_min "TRACE_INFO" "$TRACE_INFO_NUM" "$CHECK_TRACE_INFO_MIN" "$STATF" - echo ", Ok" >> "$STATF" -} - -perf_dump_aux_tid_verify() { - # Specifically crafted test will produce a list of Tread ID's to - # stdout that need to be checked to see that they have had trace - # info collected in AUX blocks in the perf data. This will go - # through all the TID's that are listed as CID=0xabcdef and see - # that all the Thread IDs the test tool reports are in the perf - # data AUX chunks - - # The TID test tools will print a TID per stdout line that are being - # tested - TIDS=$(cat "$2") - # Scan the perf report to find the TIDs that are actually CID in hex - # and build a list of the ones found - FOUND_TIDS=$(perf report --stdio --dump -i "$1" | \ - grep -o "CID=0x[0-9a-z]+" | sed 's/CID=//g' | \ - uniq | sort | uniq) - # No CID=xxx found - maybe your kernel is reporting these as - # VMID=xxx so look there - if test -z "$FOUND_TIDS"; then - FOUND_TIDS=$(perf report --stdio --dump -i "$1" | \ - grep -o "VMID=0x[0-9a-z]+" | sed 's/VMID=//g' | \ - uniq | sort | uniq) - fi - - # Iterate over the list of TIDs that the test says it has and find - # them in the TIDs found in the perf report - MISSING="" - for TID2 in $TIDS; do - FOUND="" - for TIDHEX in $FOUND_TIDS; do - TID=$(printf "%i" $TIDHEX) - if test "$TID" -eq "$TID2"; then - FOUND="y" - break - fi - done - if test -z "$FOUND"; then - MISSING="$MISSING $TID" - fi - done - if test -n "$MISSING"; then - err "Thread IDs $MISSING not found in perf AUX data" - fi -}
Hits in modules return empty disassembly when vmlinux is the input to objdump. Make the disassembly test more reliable by always using kcore, and update the comments to say that this is supported by the script.
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/scripts/python/arm-cs-trace-disasm.py    | 20 ++++++++++----------
 tools/perf/tests/shell/test_arm_coresight_disasm.sh |  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)
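A --kcore recording is a directory that holds a copy of kcore under kcore_dir/, which is the path the test now passes with -k. A minimal sketch of building that argument, assuming a hypothetical `kcore_arg` helper and a fake recording directory in place of a real perf.data:

```shell
#!/bin/sh
# Hypothetical helper: print "-k <path>" for the kcore copy stored inside a
# --kcore recording directory, or nothing if that copy does not exist.
kcore_arg() {
	kcore="$1/kcore_dir/kcore"
	if [ -e "$kcore" ]; then
		printf '%s\n' "-k $kcore"
	fi
}

# Demonstrate with a fake recording directory instead of a real perf.data.
demo=$(mktemp -d)
mkdir -p "$demo/kcore_dir"
: > "$demo/kcore_dir/kcore"
kcore_arg "$demo"
rm -rf "$demo"
```

The test in this patch passes the path unconditionally, because it has just made the recording with --kcore itself.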
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py index ba208c90d631..8f6fa4a007b4 100755 --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py @@ -18,29 +18,29 @@ from perf_trace_context import perf_sample_srccode, perf_config_get
# Below are some example commands for using this script. # Note a --kcore recording is required for accurate decode -# due to the alternatives patching mechanism. However this -# script only supports reading vmlinux for disassembly dump, -# meaning that any patched instructions will appear -# as unpatched, but the instruction ranges themselves will -# be correct. In addition to this, source line info comes -# from Perf, and when using kcore there is no debug info. The -# following lists the supported features in each mode: +# due to the alternatives patching mechanism. In addition to this, +# source line info comes from Perf, and when using kcore there is +# no debug info. The following lists the supported features in each mode: # # +-----------+-----------------+------------------+------------------+ # | Recording | Accurate decode | Source line dump | Disassembly dump | # +-----------+-----------------+------------------+------------------+ # | --kcore | yes | no | yes | -# | normal | no | yes | yes | +# | normal | no | yes (inaccurate) | yes (inaccurate) | # +-----------+-----------------+------------------+------------------+ # # Output disassembly with objdump and auto detect vmlinux -# (when running on same machine.) +# (when running on same machine.): # perf script -s scripts/python/arm-cs-trace-disasm.py -d # # Output disassembly with llvm-objdump: # perf script -s scripts/python/arm-cs-trace-disasm.py \ # -- -d llvm-objdump-11 -k path/to/vmlinux # +# Output accurate disassembly by passing kcore to script: +# perf script -s scripts/python/arm-cs-trace-disasm.py \ +# -- -d -k perf.data/kcore_dir/kcore +# # Output only source line and symbols: # perf script -s scripts/python/arm-cs-trace-disasm.py
@@ -57,7 +57,7 @@ def int_arg(v):
 
 args = argparse.ArgumentParser()
 args.add_argument("-k", "--vmlinux",
-		  help="Set path to vmlinux file. Omit to autodetect if running on same machine")
+		  help="Set path to vmlinux or kcore file. Omit to autodetect if running on same machine")
 args.add_argument("-d", "--objdump", nargs="?", const=default_objdump(),
 		  help="Show disassembly. Can also be used to change the objdump path"),
 args.add_argument("-v", "--verbose", action="store_true", help="Enable debugging log")
diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index 0dfb4fadf531..8b5c60a09012 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -46,7 +46,7 @@ if [ -e /proc/kcore ]; then
 	echo "Testing kernel disassembly"
 	perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
 	perf script -i ${perfdata} -s python:${script_path} -- \
-		-d --stop-sample=30 2> /dev/null > ${file}
+		-d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
 	grep -q -e ${branch_search} ${file}
 	echo "Found kernel branches"
 else
If we reduce the number of samples searched to speed up the test, there is less chance of hitting one of the few branch instructions the test currently looks for (bl, b, b.ne, b.eq and cbz). Extend the regex to cover all branch instructions so that the test passes whichever samples are decoded.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight_disasm.sh | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index 8b5c60a09012..5ee87eb7973e 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -38,8 +38,7 @@ cleanup_files()
 trap cleanup_files EXIT TERM INT
 
 # Ranges start and end on branches, so check for some likely branch instructions
-sep="\s|\s"
-branch_search="\sbl${sep}b${sep}b.ne${sep}b.eq${sep}cbz\s"
+branch_search='[[:space:]](bl|b(.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al))?|br|blr|ret|cbz|cbnz|tbz|tbnz|svc|eret)([[:space:]]|$)'
 
 ## Test kernel ##
 if [ -e /proc/kcore ]; then
@@ -47,7 +46,7 @@ if [ -e /proc/kcore ]; then
 	perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
 	perf script -i ${perfdata} -s python:${script_path} -- \
 		-d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
-	grep -q -e ${branch_search} ${file}
+	grep -q -E ${branch_search} ${file}
 	echo "Found kernel branches"
 else
 	# kcore is required for correct kernel decode due to runtime code patching
@@ -59,7 +58,7 @@ echo "Testing userspace disassembly"
 perf record -o ${perfdata} -e cs_etm//u -- touch $file > /dev/null 2>&1
 perf script -i ${perfdata} -s python:${script_path} -- \
 	-d --stop-sample=30 2> /dev/null > ${file}
-grep -q -e ${branch_search} ${file}
+grep -q -E ${branch_search} ${file}
 echo "Found userspace branches"
 
 glb_err=0
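
The extended pattern can be tried against a few disassembly-like lines without recording any trace. The following is only a minimal sketch: the has_branch helper and the sample lines are hypothetical and made up for illustration (they are not real objdump output); only the pattern itself is copied from the patch above. Unlike the test script, the sketch quotes the variable when passing it to grep.

```shell
#!/bin/sh
# Pattern copied verbatim from the patch above.
branch_search='[[:space:]](bl|b(.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al))?|br|blr|ret|cbz|cbnz|tbz|tbnz|svc|eret)([[:space:]]|$)'

# Hypothetical helper (not part of the test script): prints "match" if the
# given line contains one of the branch mnemonics, otherwise "no match".
has_branch() {
	if printf '%s\n' "$1" | grep -q -E "${branch_search}"; then
		echo "match"
	else
		echo "no match"
	fi
}

# Made-up sample lines in a disassembly-like layout.
has_branch " ffff800080010a4c: d65f03c0 ret"
has_branch " ffff800080010a50: 54000041 b.ne ffff800080010a58"
has_branch " ffff800080010a54: 91000400 add x0, x0, #0x1"
```

The first two lines report a match and the third does not. Note that the `.` in the pattern is unescaped, so it matches any single character between `b` and the condition code; writing `\.` would be stricter, although for a test that only checks for the presence of any branch the difference is unlikely to matter.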
Use an exit snapshot (-Se) with a 64K AUX buffer (-m,64K) to limit the amount of trace to decode here too. Each call to objdump is also quite expensive on kcore, so limit the script to 2 samples instead of 30. The test only needs to detect the case where there is no data at all.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight_disasm.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index 5ee87eb7973e..2a400fd38a36 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -43,9 +43,9 @@ branch_search='[[:space:]](bl|b(.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)
 ## Test kernel ##
 if [ -e /proc/kcore ]; then
 	echo "Testing kernel disassembly"
-	perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
+	perf record -o ${perfdata} -e cs_etm//k --kcore -Se -m,64K -- touch $file > /dev/null 2>&1
 	perf script -i ${perfdata} -s python:${script_path} -- \
-		-d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
+		-d --stop-sample=2 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
 	grep -q -E ${branch_search} ${file}
 	echo "Found kernel branches"
 else
@@ -55,9 +55,9 @@ fi
 
 ## Test user ##
 echo "Testing userspace disassembly"
-perf record -o ${perfdata} -e cs_etm//u -- touch $file > /dev/null 2>&1
+perf record -o ${perfdata} -e cs_etm//u -Se -m,64K -- touch $file > /dev/null 2>&1
 perf script -i ${perfdata} -s python:${script_path} -- \
-	-d --stop-sample=30 2> /dev/null > ${file}
+	-d --stop-sample=2 2> /dev/null > ${file}
 grep -q -E ${branch_search} ${file}
 echo "Found userspace branches"
 
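There is already a subfolder for Coresight tests, so move the remaining tests into it to keep them all in one place. Update the relative path to the disassembly script to account for the extra directory level, and drop the MAINTAINERS pattern that is no longer needed.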

Signed-off-by: James Clark <james.clark@linaro.org>
---
 MAINTAINERS                                                         | 1 -
 tools/perf/tests/shell/{ => coresight}/test_arm_coresight.sh        | 0
 tools/perf/tests/shell/{ => coresight}/test_arm_coresight_disasm.sh | 2 +-
 3 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7efb893edcbb..ff8935b459ea 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2749,7 +2749,6 @@ F:	tools/perf/arch/arm/util/auxtrace.c
 F:	tools/perf/arch/arm/util/cs-etm.c
 F:	tools/perf/arch/arm/util/cs-etm.h
 F:	tools/perf/arch/arm/util/pmu.c
-F:	tools/perf/tests/shell/*coresight*
 F:	tools/perf/tests/shell/coresight/*
 F:	tools/perf/util/cs-etm-decoder/*
 F:	tools/perf/util/cs-etm.*
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/coresight/test_arm_coresight.sh
similarity index 100%
rename from tools/perf/tests/shell/test_arm_coresight.sh
rename to tools/perf/tests/shell/coresight/test_arm_coresight.sh
diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/coresight/test_arm_coresight_disasm.sh
similarity index 96%
rename from tools/perf/tests/shell/test_arm_coresight_disasm.sh
rename to tools/perf/tests/shell/coresight/test_arm_coresight_disasm.sh
index 2a400fd38a36..b196aab709f8 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/coresight/test_arm_coresight_disasm.sh
@@ -24,7 +24,7 @@ perfdata_dir=$(mktemp -d /tmp/__perf_test.perf.data.XXXXX)
 perfdata=${perfdata_dir}/perf.data
 file=$(mktemp /tmp/temporary_file.XXXXX)
 # Relative path works whether it's installed or running from repo
-script_path=$(dirname "$0")/../../scripts/python/arm-cs-trace-disasm.py
+script_path=$(dirname "$0")/../../../scripts/python/arm-cs-trace-disasm.py
 
 cleanup_files()
 {