On Wed, Feb 05, 2020 at 04:09:01PM +0000, Mike Leach wrote:
Hi Leo,
There are a couple of typos in the comments below, but I also believe that the sample loop could be considerably simplified
On Mon, 3 Feb 2020 at 01:52, Leo Yan leo.yan@linaro.org wrote:
When 'etm->instructions_sample_period' is less than 'tidq->period_instructions', the function cs_etm__sample() cannot handle this case properly with its logic.
Let's see below flow as an example:
If we set itrace option '--itrace=i4', then function cs_etm__sample() has variables with initialized values:
tidq->period_instructions = 0 etm->instructions_sample_period = 4
When the first packet is coming:
packet->instr_count = 10; the number of instructions executed in this packet is 10, thus update period_instructions as below:
tidq->period_instructions = 0 + 10 = 10 instrs_over = 10 - 4 = 6 offset = 10 - 6 - 1 = 3 tidq->period_instructions = instrs_over = 6
When the second packet is coming:
packet->instr_count = 10; in the second pass, assume 10 instructions in the trace sample again:
tidq->period_instructions = 6 + 10 = 16 instrs_over = 16 - 4 = 12 offset = 10 - 12 - 1 = -3 -> the negative value tidq->period_instructions = instrs_over = 12
So after handle these two packets, there have below issues:
The first issue is that cs_etm__instr_addr() returns the address within the current trace sample of the instruction related to offset, so the offset is supposed to be always unsigned value. But in fact, function cs_etm__sample() might calculate a negative offset value (in handling the second packet, the offset is -3) and pass to cs_etm__instr_addr() with u64 type with a big positive integer.
The second issue is it only synthesizes 2 samples for sample period = 4. In theory, every packet has 10 instructions so the two packets have total 20 instructions, 20 instructions should generate 5 samples (4 x 5 = 20). This is because cs_etm__sample() only calls once cs_etm__synth_instruction_sample() to generate instruction sample per range packet.
This patch fixes the logic in function cs_etm__sample(); the basic idea is to divide into three parts for handling coming packet:
- The first part is for synthesizing the first instruction sample, it combines the instructions from the tail of previous packet and the instructions from the head of the new packet;
- The second part is to simply generate samples with sample period aligned;
- The third part is the tail of new packet, the rest instructions will be left for the sequential sample handling.
Suggested-by: Mike Leach mike.leach@linaro.org Signed-off-by: Leo Yan leo.yan@linaro.org
tools/perf/util/cs-etm.c | 105 ++++++++++++++++++++++++++++++++++----- 1 file changed, 92 insertions(+), 13 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 3e28462609e7..c5a05f728eac 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1360,23 +1360,102 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, * TODO: allow period to be defined in cycles and clock time */
/* Get number of instructions executed after the sample point */
u64 instrs_over = tidq->period_instructions -
etm->instructions_sample_period;
/*
* Below diagram demonstrates the instruction samples
* generation flows:
*
* Instrs Instrs Instrs Instrs
* Sample(n) Sample(n+1) Sample(n+2) Sample(n+3)
* | | | |
* V V V V
* --------------------------------------------------
* ^ ^
* | |
* Period Period
* instructions(Pi) instructions(Pi')
*
* | |
* \---------------- -----------------/
* V
* instrs_executed
*
* Period instructions (Pi) contains the the number of
* instructions executed after the sample point(n). When a new
* instruction packet is coming and generate for the next sample
* (n+1), it combines with two parts instructions, one is the
* tail of the old packet and another is the head of the new
* coming packet. So 'head' variable is used to cauclate the
typo : s/cauclate/calculate
Used checkpatch.pl but didn't see any complaints for this.
Thanks for pointing out and will fix it.
* instruction numbers in the new packet for sample(n+1).
*
* Sample(n+2) and sample(n+3) consume the instructions with
* sample period, so directly generate samples based on the
* sampe period.
*
typo: s/sampe/sample
Will fix.
* After sample(n+3), the rest instructions will be used by
* later packet; so use 'instrs_over' to track the rest
* instruction number and it is assigned to
* 'tidq->period_instructions' for next round calculation.
*/
u64 head, offset = 0;
u64 addr; /*
* Calculate the address of the sampled instruction (-1 as
* sample is reported as though instruction has just been
* executed, but PC has not advanced to next instruction)
* 'instrs_over' is the number of instructions executed after
* sample points, initialise it to 'instrs_executed' and will
* decrease it for consumed instructions in every synthesized
* instruction sample. */
u64 offset = (instrs_executed - instrs_over - 1);
u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
tidq->packet, offset);
u64 instrs_over = instrs_executed;
ret = cs_etm__synth_instruction_sample(
etmq, tidq, addr, etm->instructions_sample_period);
if (ret)
return ret;
/*
* 'head' is the instructions number of the head in the new
* packet, it combines with the tail of previous packet to
* generate a sample. So 'head' uses the sample period to
* decrease the instruction number introduced by the previous
* packet.
*/
head = etm->instructions_sample_period -
(tidq->period_instructions - instrs_executed);
if (head) {
offset = head;
/*
* Calculate the address of the sampled instruction (-1
* as sample is reported as though instruction has just
* been executed, but PC has not advanced to next
* instruction)
*/
addr = cs_etm__instr_addr(etmq, trace_chan_id,
tidq->packet, offset - 1);
ret = cs_etm__synth_instruction_sample(
etmq, tidq, addr,
etm->instructions_sample_period);
if (ret)
return ret;
instrs_over -= head;
}
while (instrs_over >= etm->instructions_sample_period) {
offset += etm->instructions_sample_period;
/*
* Calculate the address of the sampled instruction (-1
* as sample is reported as though instruction has just
* been executed, but PC has not advanced to next
* instruction)
*/
addr = cs_etm__instr_addr(etmq, trace_chan_id,
tidq->packet, offset - 1);
ret = cs_etm__synth_instruction_sample(
etmq, tidq, addr,
etm->instructions_sample_period);
if (ret)
return ret;
instrs_over -= etm->instructions_sample_period;
} /* Carry remaining instructions into next sample period */ tidq->period_instructions = instrs_over;
-- 2.17.1
I believe the following change would work and make for easier reading...
.... at the start of the function remove instrs_executed and replace .... /* get instructions remainder from previous packet */ u64 instrs_prev = tidq->period_instructions;
/* set available instructions to previous packet remainder + the current packet count */ tidq->period_instructions += tidq->packet->instr_count;
.... within the if(etm->sample_instructions && ...) statement I would be more explicit what the elements of the diagram are ....
/*
- Below diagram demonstrates the instruction samples
- generation flows:
- Instrs Instrs Instrs Instrs
- Sample(n) Sample(n+1) Sample(n+2) Sample(n+3)
- | | | |
- V V V V
^ ^
| |
Period Period
- instructions(Pi) instructions(Pi')
| |
\---------------- -----------------/
V
tidq->packet->instr_count;
- Instrs Sample(n...) are the synthesised samples occuring every
etm->instructions_sample_period
- instructions - as defined on the perf command line. Sample(n) being
the last sample before the
- current etm packet, n+1 to n+3 samples generated from the current etm packet.
- tidq->packet->instr_count represents the number of instructions in
the current etm packet.
- Period instructions (Pi) contains the the number of instructions
executed after the sample point(n)
- from the previous etm packet. This will always be less than
etm->instructions_sample_period.
.... continue with explanation here ....
.... then we can simplify the loop code removing some of the temporary variables ....
/* get the initial offset into the current packet instructions (entry conditions ensure that instrs_prev < etm->instructions_sample_period) */ u64 offset = etm->instructions_sample_period - instrs_prev; u64 addr;
/* Prepare last branches for instruction sample */ if (etm->synth_opts.last_branch) cs_etm__copy_last_branch_rb(etmq, tidq);
while (tidq->period_instructions >= etm->instructions_sample_period) {
/* * Calculate the address of the sampled instruction (-1 * as sample is reported as though instruction has just * been executed, but PC has not advanced to next * instruction) */ addr = cs_etm__instr_addr(etmq, trace_chan_id, tidq->packet, offset - 1); ret = cs_etm__synth_instruction_sample( etmq, tidq, addr, etm->instructions_sample_period); if (ret) return ret; offset += etm->instructions_sample_period; tidq->period_instructions -= etm->instructions_sample_period;
}
..... I believe the above should work, but cannot claim to have tried it out. What do you think?
Agree. To be honest, I considered to use your suggested way, but I worried about the boundary conditions for 'offset', so went back to use explict method with two code segments (head and sequential samples).
After review the suggested code, I don't find any issue. Will refine code as this way and give testing for it.
Very appreciate the suggestions :)
Leo