From: Wojciech Zmuda <wzmuda(a)n7space.com>
Perf allows changing where the buildid cache directory is created.
Mention it in the howto document.
Signed-off-by: Wojciech Zmuda <wzmuda(a)n7space.com>
---
HOWTO.md | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/HOWTO.md b/HOWTO.md
index b16294a..a8b5ce9 100644
--- a/HOWTO.md
+++ b/HOWTO.md
@@ -633,4 +633,5 @@ Best regards,
[2]: http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz
-[3]: Get in touch with us if you know a way to change this.
+[3]: It can be changed with: perf-config:
+ perf config --system buildid.dir=/my/own/buildid/dir
--
2.11.0
Hi All,
I follow this guide to do test on i.MX8MP,
https://elinux.org/images/b/b3/Hardware_Assisted_Tracing_on_ARM.pdf
I could see dump with perf report --dump (attached),
but when I use perf report --stdio, I met
Error:
The perf.data data has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#
The perf I use to dump data is not compiled with CORESIGHT
support, but the host perf has been compiled with CORESIGHT support.
Do you know what might cause this issue?
Thanks,
Peng.
Hello,
I'm exploring possibilities of tracing two concurrent programs pinned to two CPU cores, sharing the same sink. I tried to spawn two concurrent perf sessions but my results are not satisfying, so I wonder if such possibility exists at all.
# taskset -c 1 ./progA
# taskset -c 2 ./progB
# perf record -e cs_etm/timestamp,@tmc_etr0/u --filter "filter symbolA @./progA" --per-thread --pid $progA_pid
# perf record -e cs_etm/timestamp,@tmc_etr0/u --filter "filter symbolA @./progA" --per-thread --pid $progB_pid
If progA and progB mostly sleep - I get trace data for both, which is fine. However If at least one of the programs gets more CPU-intensive (loops with arithmetic computations inside, no explicit sleeping nor waiting or IO), I get trace only for the more intensive one. If both are intensive, it seems random which one gets traced.
This observations suggests ETR buffer overflow. However, if CPU-intensive versions of progA and progB are scheduled on the same CPU - this method seem to work. I would expect ETR buffer being insufficient in this scenario as well.
To enlarge the ETR buffer I experimented with the -m flag, but I'm unable to use more than -m,16M. Independent RSZ polling shows that this value gets programmed, but anything above 16 MB makes TMC-ETR driver complain that cma_alloc failed to get that amount of memory. Anyway, enlarging buffer to 16MB doesn't seem to affect my issue. With bigger buffer my observations are identical.
Those observations make me suspect that another technical obstacle might exists, beside possible buffer overflow.
I also tried with CPU-wide mode and it seem to work:
taskset -c 1 ./progA
taskset -c 2 ./progB
perf record -e cs_etm/timestamp,@tmc_etr0/u -C 2,3
but this approach is quite limited as filters don't work in CPU-wide mode and perf itself is also traced (which is weird, as I tried setting CPU affinity of perf-record with taskset as well - didn't help).
To wrap-up:
1. Is it possible to trace two programs with two perf-record sessions at the same time, sharing a sink?
2. Is it possible to enlarge TMC-ETR buffer above 16MB? I guess SG mode might be an option here, but as I can't really modify my kernel and DT right now. Perhaps there's a possibility to make the kernel allocator work past the 16MB boundary?
Thank you and best regards,
Wojciech
PS Sorry I don't proceed with the Coresight@Zynq MPSoC support I started some time ago. My access to the board is limited recently and it's hard to proceed with kernel development remotely. I hope to get back to it soon.
This patch series is to address issues for synthesizing instruction
samples, especially when the instruction sample period is small enough,
the current logic cannot synthesize multiple instruction samples within
one instruction range packet.
Patch 0001 is to swap packets for instruction samples, so this allow
option '--itrace=iNNN' can work well.
Patch 0002 avoids to reset the last branches for every instruction
sample; if reset the last branches for every time generating sample, the
later samples in the same range packet cannot use the last branches
anymore.
Patch 0003 is the fixing for handling different instruction periods,
especially for small sample period.
Patch 0004 is an optimization for copying last branches; it only copies
last branches once if the instruction samples share the same last
branches.
Patch 0005 is a minor fix for unsigned variable comparison to zero.
This patch set has been rebased on the latest perf/core branch; and
verified on Juno board with below commands:
# perf script --itrace=i2
# perf script --itrace=i2il16
# perf inject --itrace=i2il16 -i perf.data -o perf.data.new
# perf inject --itrace=i100il16 -i perf.data -o perf.data.new
Changes from v4:
* Added Mike's review tag for patch 03;
* Added Mathieu's review tags for all patches.
Changes from v3:
* Refactored patch 0001 with new function cs_etm__packet_swap() (Mike);
* Refined instruction sample generation flow with single while loop,
which completely uses Mike's suggestions (Mike);
* Added Mike's review tags for patch 01/02/04/05.
Changes from v2:
* Added patch 0001 which is to fix swapping packets for instruction
samples;
* Refined minor commit logs and comments;
* Rebased on the latest perf/core branch.
Changes from v1:
* Rebased patch set on perf/core branch with latest commit 9fec3cd5fa4a
("perf map: Check if the map still has some refcounts on exit").
Leo Yan (5):
perf cs-etm: Swap packets for instruction samples
perf cs-etm: Continuously record last branch
perf cs-etm: Correct synthesizing instruction samples
perf cs-etm: Optimize copying last branches
perf cs-etm: Fix unsigned variable comparison to zero
tools/perf/util/cs-etm.c | 157 +++++++++++++++++++++++++++------------
1 file changed, 111 insertions(+), 46 deletions(-)
--
2.17.1
From: Leo Yan <leo.yan(a)linaro.org>
If an instruction range packet can generate multiple instruction
samples, these samples share the same last branches; it's not necessary
to copy the same last branches repeatedly for these samples within the
same packet.
This patch moves out the last branches copying from function
cs_etm__synth_instruction_sample(), and execute it prior to generating
instruction samples.
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Reviewed-by: Mike Leach <mike.leach(a)linaro.org>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Robert Walker <robert.walker(a)arm.com>
Cc: Suzuki Poulouse <suzuki.poulose(a)arm.com>
Cc: coresight ml <coresight(a)lists.linaro.org>
Cc: linux-arm-kernel(a)lists.infradead.org
Link: http://lore.kernel.org/lkml/20200219021811.20067-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
---
tools/perf/util/cs-etm.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 1ddcc67e13dd..87d9943177bc 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1151,10 +1151,8 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
- if (etm->synth_opts.last_branch) {
- cs_etm__copy_last_branch_rb(etmq, tidq);
+ if (etm->synth_opts.last_branch)
sample.branch_stack = tidq->last_branch;
- }
if (etm->synth_opts.inject) {
ret = cs_etm__inject_event(event, &sample,
@@ -1431,6 +1429,10 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
u64 offset = etm->instructions_sample_period - instrs_prev;
u64 addr;
+ /* Prepare last branches for instruction sample */
+ if (etm->synth_opts.last_branch)
+ cs_etm__copy_last_branch_rb(etmq, tidq);
+
while (tidq->period_instructions >=
etm->instructions_sample_period) {
/*
@@ -1508,6 +1510,11 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
if (etmq->etm->synth_opts.last_branch &&
tidq->prev_packet->sample_type == CS_ETM_RANGE) {
+ u64 addr;
+
+ /* Prepare last branches for instruction sample */
+ cs_etm__copy_last_branch_rb(etmq, tidq);
+
/*
* Generate a last branch event for the branches left in the
* circular buffer at the end of the trace.
@@ -1515,7 +1522,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
* Use the address of the end of the last reported execution
* range
*/
- u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
+ addr = cs_etm__last_executed_instr(tidq->prev_packet);
err = cs_etm__synth_instruction_sample(
etmq, tidq, addr,
@@ -1560,11 +1567,16 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq,
*/
if (etmq->etm->synth_opts.last_branch &&
tidq->prev_packet->sample_type == CS_ETM_RANGE) {
+ u64 addr;
+
+ /* Prepare last branches for instruction sample */
+ cs_etm__copy_last_branch_rb(etmq, tidq);
+
/*
* Use the address of the end of the last reported execution
* range.
*/
- u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
+ addr = cs_etm__last_executed_instr(tidq->prev_packet);
err = cs_etm__synth_instruction_sample(
etmq, tidq, addr,
--
2.21.1
From: Leo Yan <leo.yan(a)linaro.org>
When 'etm->instructions_sample_period' is less than
'tidq->period_instructions', the function cs_etm__sample() cannot handle
this case properly with its logic.
Let's see below flow as an example:
- If we set itrace option '--itrace=i4', then function cs_etm__sample()
has variables with initialized values:
tidq->period_instructions = 0
etm->instructions_sample_period = 4
- When the first packet is coming:
packet->instr_count = 10; the number of instructions executed in this
packet is 10, thus update period_instructions as below:
tidq->period_instructions = 0 + 10 = 10
instrs_over = 10 - 4 = 6
offset = 10 - 6 - 1 = 3
tidq->period_instructions = instrs_over = 6
- When the second packet is coming:
packet->instr_count = 10; in the second pass, assume 10 instructions
in the trace sample again:
tidq->period_instructions = 6 + 10 = 16
instrs_over = 16 - 4 = 12
offset = 10 - 12 - 1 = -3 -> the negative value
tidq->period_instructions = instrs_over = 12
So after handle these two packets, there have below issues:
The first issue is that cs_etm__instr_addr() returns the address within
the current trace sample of the instruction related to offset, so the
offset is supposed to be always unsigned value. But in fact, function
cs_etm__sample() might calculate a negative offset value (in handling
the second packet, the offset is -3) and pass to cs_etm__instr_addr()
with u64 type with a big positive integer.
The second issue is it only synthesizes 2 samples for sample period = 4.
In theory, every packet has 10 instructions so the two packets have
total 20 instructions, 20 instructions should generate 5 samples
(4 x 5 = 20). This is because cs_etm__sample() only calls once
cs_etm__synth_instruction_sample() to generate instruction sample per
range packet.
This patch fixes the logic in function cs_etm__sample(); the basic
idea for handling coming packet is:
- To synthesize the first instruction sample, it combines the left
instructions from the previous packet and the head of the new
packet; then generate continuous samples with sample period;
- At the tail of the new packet, if it has the rest instructions,
these instructions will be left for the sequential sample.
Suggested-by: Mike Leach <mike.leach(a)linaro.org>
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Reviewed-by: Mike Leach <mike.leach(a)linaro.org>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Robert Walker <robert.walker(a)arm.com>
Cc: Suzuki Poulouse <suzuki.poulose(a)arm.com>
Cc: coresight ml <coresight(a)lists.linaro.org>
Cc: linux-arm-kernel(a)lists.infradead.org
Link: http://lore.kernel.org/lkml/20200219021811.20067-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
---
tools/perf/util/cs-etm.c | 87 ++++++++++++++++++++++++++++++++--------
1 file changed, 70 insertions(+), 17 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 2c4156c5ed09..1ddcc67e13dd 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1358,9 +1358,12 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
struct cs_etm_auxtrace *etm = etmq->etm;
int ret;
u8 trace_chan_id = tidq->trace_chan_id;
- u64 instrs_executed = tidq->packet->instr_count;
+ u64 instrs_prev;
- tidq->period_instructions += instrs_executed;
+ /* Get instructions remainder from previous packet */
+ instrs_prev = tidq->period_instructions;
+
+ tidq->period_instructions += tidq->packet->instr_count;
/*
* Record a branch when the last instruction in
@@ -1378,26 +1381,76 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
* TODO: allow period to be defined in cycles and clock time
*/
- /* Get number of instructions executed after the sample point */
- u64 instrs_over = tidq->period_instructions -
- etm->instructions_sample_period;
+ /*
+ * Below diagram demonstrates the instruction samples
+ * generation flows:
+ *
+ * Instrs Instrs Instrs Instrs
+ * Sample(n) Sample(n+1) Sample(n+2) Sample(n+3)
+ * | | | |
+ * V V V V
+ * --------------------------------------------------
+ * ^ ^
+ * | |
+ * Period Period
+ * instructions(Pi) instructions(Pi')
+ *
+ * | |
+ * \---------------- -----------------/
+ * V
+ * tidq->packet->instr_count
+ *
+ * Instrs Sample(n...) are the synthesised samples occurring
+ * every etm->instructions_sample_period instructions - as
+ * defined on the perf command line. Sample(n) is being the
+ * last sample before the current etm packet, n+1 to n+3
+ * samples are generated from the current etm packet.
+ *
+ * tidq->packet->instr_count represents the number of
+ * instructions in the current etm packet.
+ *
+ * Period instructions (Pi) contains the the number of
+ * instructions executed after the sample point(n) from the
+ * previous etm packet. This will always be less than
+ * etm->instructions_sample_period.
+ *
+ * When generate new samples, it combines with two parts
+ * instructions, one is the tail of the old packet and another
+ * is the head of the new coming packet, to generate
+ * sample(n+1); sample(n+2) and sample(n+3) consume the
+ * instructions with sample period. After sample(n+3), the rest
+ * instructions will be used by later packet and it is assigned
+ * to tidq->period_instructions for next round calculation.
+ */
/*
- * Calculate the address of the sampled instruction (-1 as
- * sample is reported as though instruction has just been
- * executed, but PC has not advanced to next instruction)
+ * Get the initial offset into the current packet instructions;
+ * entry conditions ensure that instrs_prev is less than
+ * etm->instructions_sample_period.
*/
- u64 offset = (instrs_executed - instrs_over - 1);
- u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
- tidq->packet, offset);
+ u64 offset = etm->instructions_sample_period - instrs_prev;
+ u64 addr;
- ret = cs_etm__synth_instruction_sample(
- etmq, tidq, addr, etm->instructions_sample_period);
- if (ret)
- return ret;
+ while (tidq->period_instructions >=
+ etm->instructions_sample_period) {
+ /*
+ * Calculate the address of the sampled instruction (-1
+ * as sample is reported as though instruction has just
+ * been executed, but PC has not advanced to next
+ * instruction)
+ */
+ addr = cs_etm__instr_addr(etmq, trace_chan_id,
+ tidq->packet, offset - 1);
+ ret = cs_etm__synth_instruction_sample(
+ etmq, tidq, addr,
+ etm->instructions_sample_period);
+ if (ret)
+ return ret;
- /* Carry remaining instructions into next sample period */
- tidq->period_instructions = instrs_over;
+ offset += etm->instructions_sample_period;
+ tidq->period_instructions -=
+ etm->instructions_sample_period;
+ }
}
if (etm->sample_branches) {
--
2.21.1
From: Leo Yan <leo.yan(a)linaro.org>
Every time synthesize instruction sample, the last branch recording will
be reset. This is fine if the instruction period is big enough, for
example if use the option '--itrace=i100000', the last branch array is
reset for every sample with 100000 instructions per period; before
generate the next instruction sample, there has the sufficient packets
coming to fill the last branch array.
On the other hand, if set a very small period, the packets will be
significantly reduced between two continuous instruction samples, thus
the last branch array is almost empty for new instruction sample by
frequently resetting.
To allow the last branches to work properly for any instruction periods,
this patch avoids to reset the last branch for every instruction sample
and only reset it when flush the trace data. The last branches will be
reset only for two cases, one is for trace starting, another case is for
discontinuous trace; other cases can keep recording last branches for
continuous instruction samples.
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Reviewed-by: Mike Leach <mike.leach(a)linaro.org>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Robert Walker <robert.walker(a)arm.com>
Cc: Suzuki Poulouse <suzuki.poulose(a)arm.com>
Cc: coresight ml <coresight(a)lists.linaro.org>
Cc: linux-arm-kernel(a)lists.infradead.org
Link: http://lore.kernel.org/lkml/20200219021811.20067-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
---
tools/perf/util/cs-etm.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 294b09cfb034..2c4156c5ed09 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1170,9 +1170,6 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
"CS ETM Trace: failed to deliver instruction event, error %d\n",
ret);
- if (etm->synth_opts.last_branch)
- cs_etm__reset_last_branch_rb(tidq);
-
return ret;
}
@@ -1487,6 +1484,10 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
swap_packet:
cs_etm__packet_swap(etm, tidq);
+ /* Reset last branches after flush the trace */
+ if (etm->synth_opts.last_branch)
+ cs_etm__reset_last_branch_rb(tidq);
+
return err;
}
--
2.21.1
Hi,
Do you have Cmake based build or bitbake recipe for OpenCSD project?
My project is CMAKE based.
I want to integrate OpenCSD with it.
Regards,
Subhasish