On Tue, Jun 02, 2020 at 10:14:02AM +0000, Al Grant wrote:
[...]
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 62d2f9b9ce1b..71a056e29675 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1332,7 +1332,13 @@ static int cs_etm__synth_events(struct
cs_etm_auxtrace *etm,
} if (etm->synth_opts.last_branch)
{ attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
/* We don't use the hardware index, but the sample generation
code uses the new format branch_stack with this field,
so the event attributes must indicate that it's present. */
attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
}
For this patch itself, it looks good to me. But the change seems a workaround rather than directly fixing issues.
Essentially, this issue is caused by the data structure definition is mess for branch record, thus, I prefer to use below change (I don't test it, so just want to use the change to demonstrate the basic idea):
Sorry for rush sending email, give explaination for below change;
whether the sample type PERF_SAMPLE_BRANCH_HW_INDEX has been set or not, we can always pass hw_idx from kernel to user space
No, the flag must match whether hw_idx is passed. If you're always passing it (even if it's -1) you must set the flag.
The point of this flag is to indicate the ABI break in sample records passed from the kernel to userspace. perf.data captured from older kernels will have just the 'nr' field followed by the branches, perf.data captured from newer kernels will have 'nr' followed by 'hw_idx' followed by the branches. The reader uses the flag to decide whether it needs to expect that extra word. The flag must match the sample, otherwise the branch stack and everything that follows it is corrupt.
Indeed, if the flag PERF_SAMPLE_BRANCH_HW_INDEX is ABI relevant, then cannot use the change what I proposed.
, if without supporting hw_idx then will pass '-1' for the field.
Sure, this can be done any time - the kernel can do it, perf inject can do it. But it must set PERF_SAMPLE_BRANCH_HW_INDEX to say the field is there. And perf's sample parser needs to be able to continue to handle records from older kernels, where the field isn't present and the flag isn't set.
Yes, so another possible fixing is to add a new variant structure:
struct branch_stack_no_idx { u64 nr; struct branch_entryentries[0]; }
When don't set flag PERF_SAMPLE_BRANCH_HW_INDEX, we should to use the structure 'branch_stack_no_idx' to generate branch samples.
I don't like the way perf_sample__branch_entries() is being used, or the way "struct branch_stack" has different formats even as an internal type - but I'm not proposing to refactor that, because it might break all sorts of things (e.g. support for Intel PT and LBR). My patch is the simplest possible fix and makes synthesized samples from ETM work like those from PT.
Just a concern if set flag PERF_SAMPLE_BRANCH_HW_INDEX for cs-etm and intel-pt, later this will cause confusion that actually the synthenize samples don't use hw_idx.
But totally agree with your thinking for this. It's fine for me to upstream with your current approach firstly. We can divide into two parts: fixing and refactoring.
Thanks, Leo