(Hi - posted for review. There was a recent ABI breaking change (see commit reference below), after which perf inject on ETM creates corrupt files. The simple fix ends up with it using the new ABI. I have an alternative patch set that emulates the old ABI - for the benefit of tools like autofdo that can't yet consume the new ABI - but it's much messier.)
From: Al Grant al.grant@arm.com
Commit 42bbabed09ce6208026648a71a45b4394c74585a ("perf tools: Add hw_idx in struct branch_stack") changed the format of branch stacks in perf samples. When samples use this new format, a flag must be set in the corresponding event. Synthesized branch stacks generated from CoreSight ETM trace were using the new format, but not setting the event attribute, leading to consumers seeing corrupt data. This patch fixes the issue by setting the event attribute to indicate use of the new format.
Fixes: 42bbabed09ce ("perf tools: Add hw_idx in struct branch_stack") Signed-off-by: Al Grant al.grant@arm.com Acked-by: Andrea Brunato andrea.brunato@arm.com
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 62d2f9b9ce1b..71a056e29675 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1332,7 +1332,13 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm, }
if (etm->synth_opts.last_branch) + { attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + /* We don't use the hardware index, but the sample generation + code uses the new format branch_stack with this field, + so the event attributes must indicate that it's present. */ + attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; + }
if (etm->synth_opts.instructions) { attr.config = PERF_COUNT_HW_INSTRUCTIONS; IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi Al,
On Mon, Jun 01, 2020 at 04:00:02PM +0000, Al Grant wrote:
(Hi - posted for review. There was a recent ABI breaking change (see commit reference below), after which perf inject on ETM creates corrupt files. The simple fix ends up with it using the new ABI. I have an alternative patch set that emulates the old ABI - for the benefit of tools like autofdo that can't yet consume the new ABI - but it's much messier.)
From: Al Grant al.grant@arm.com
Commit 42bbabed09ce6208026648a71a45b4394c74585a ("perf tools: Add hw_idx in struct branch_stack") changed the format of branch stacks in perf samples. When samples use this new format, a flag must be set in the corresponding event. Synthesized branch stacks generated from CoreSight ETM trace were using the new format, but not setting the event attribute, leading to consumers seeing corrupt data. This patch fixes the issue by setting the event attribute to indicate use of the new format.
Fixes: 42bbabed09ce ("perf tools: Add hw_idx in struct branch_stack") Signed-off-by: Al Grant al.grant@arm.com Acked-by: Andrea Brunato andrea.brunato@arm.com
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 62d2f9b9ce1b..71a056e29675 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1332,7 +1332,13 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm, }
if (etm->synth_opts.last_branch)
{ attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
/* We don't use the hardware index, but the sample generation
code uses the new format branch_stack with this field,
so the event attributes must indicate that it's present. */
attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
}
For this patch itself, it looks good to me. But the change seems a workaround rather than directly fixing issues.
Essentially, this issue is caused by the data structure definition is mess for branch record, thus, I prefer to use below change (I don't test it, so just want to use the change to demonstrate the basic idea):
Thanks, Leo
---8<---
diff --git a/kernel/events/core.c b/kernel/events/core.c index e296c5c59c6f..f00a49293787 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -6831,6 +6831,8 @@ void perf_output_sample(struct perf_output_handle *handle, perf_output_put(handle, data->br_stack->nr); if (perf_sample_save_hw_index(event)) perf_output_put(handle, data->br_stack->hw_idx); + else + perf_output_put(handle, -1); perf_output_copy(handle, data->br_stack->entries, size); } else { /* diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h index 17b2ccc61094..111bfb9d6635 100644 --- a/tools/perf/util/branch.h +++ b/tools/perf/util/branch.h @@ -50,23 +50,11 @@ struct branch_stack { };
/* - * The hw_idx is only available when PERF_SAMPLE_BRANCH_HW_INDEX is applied. - * Otherwise, the output format of a sample with branch stack is - * struct branch_stack { - * u64 nr; - * struct branch_entry entries[0]; - * } - * Check whether the hw_idx is available, - * and return the corresponding pointer of entries[0]. + * Return the corresponding pointer of entries[0]. */ static inline struct branch_entry *perf_sample__branch_entries(struct perf_sample *sample) { - u64 *entry = (u64 *)sample->branch_stack; - - entry++; - if (sample->no_hw_idx) - return (struct branch_entry *)entry; - return (struct branch_entry *)(++entry); + return sample->branch_stack->entries; }
struct branch_type_stat { diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 96e5171dce41..88a919ee8f7c 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -2159,8 +2159,10 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event, return -EFAULT;
sz = data->branch_stack->nr * sizeof(struct branch_entry); + /* +hw_idx */ + sz += sizeof(u64); if (evsel__has_branch_hw_idx(evsel)) - sz += sizeof(u64); + data->no_hw_idx = false; else data->no_hw_idx = true; OVERFLOW_CHECK(array, sz, max_size);
On Tue, Jun 02, 2020 at 04:58:20PM +0800, Leo Yan wrote:
Hi Al,
On Mon, Jun 01, 2020 at 04:00:02PM +0000, Al Grant wrote:
(Hi - posted for review. There was a recent ABI breaking change (see commit reference below), after which perf inject on ETM creates corrupt files. The simple fix ends up with it using the new ABI. I have an alternative patch set that emulates the old ABI - for the benefit of tools like autofdo that can't yet consume the new ABI - but it's much messier.)
From: Al Grant al.grant@arm.com
Commit 42bbabed09ce6208026648a71a45b4394c74585a ("perf tools: Add hw_idx in struct branch_stack") changed the format of branch stacks in perf samples. When samples use this new format, a flag must be set in the corresponding event. Synthesized branch stacks generated from CoreSight ETM trace were using the new format, but not setting the event attribute, leading to consumers seeing corrupt data. This patch fixes the issue by setting the event attribute to indicate use of the new format.
Fixes: 42bbabed09ce ("perf tools: Add hw_idx in struct branch_stack") Signed-off-by: Al Grant al.grant@arm.com Acked-by: Andrea Brunato andrea.brunato@arm.com
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 62d2f9b9ce1b..71a056e29675 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1332,7 +1332,13 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm, }
if (etm->synth_opts.last_branch)
{ attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
/* We don't use the hardware index, but the sample generation
code uses the new format branch_stack with this field,
so the event attributes must indicate that it's present. */
attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
}
For this patch itself, it looks good to me. But the change seems a workaround rather than directly fixing issues.
Essentially, this issue is caused by the data structure definition is mess for branch record, thus, I prefer to use below change (I don't test it, so just want to use the change to demonstrate the basic idea):
Sorry for rush sending email, give explaination for below change;
whether the sample type PERF_SAMPLE_BRANCH_HW_INDEX has been set or not, we can always pass hw_idx from kernel to user space, if without supporting hw_idx then will pass '-1' for the field. This can help us to unify the structure 'branch_stack', which will always contain the item branch_stack::hw_idx, thus when the kernel or perf tool (like cs-etm or intel-pt) to generate branch record sample with the same definition for 'struct branch_stack'. The benefit is for more readable code and this can resolve issues both for cs-etm and intel-pt.
I think here have one concern is that this change will introduce a penalty for passing redundant hw_idx from kernel to user space if PERF_SAMPLE_BRANCH_HW_INDEX is not used.
Thanks, Leo
---8<---
diff --git a/kernel/events/core.c b/kernel/events/core.c index e296c5c59c6f..f00a49293787 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -6831,6 +6831,8 @@ void perf_output_sample(struct perf_output_handle *handle, perf_output_put(handle, data->br_stack->nr); if (perf_sample_save_hw_index(event)) perf_output_put(handle, data->br_stack->hw_idx);
else
} else { /*perf_output_put(handle, -1); perf_output_copy(handle, data->br_stack->entries, size);
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h index 17b2ccc61094..111bfb9d6635 100644 --- a/tools/perf/util/branch.h +++ b/tools/perf/util/branch.h @@ -50,23 +50,11 @@ struct branch_stack { }; /*
- The hw_idx is only available when PERF_SAMPLE_BRANCH_HW_INDEX is applied.
- Otherwise, the output format of a sample with branch stack is
- struct branch_stack {
- u64 nr;
- struct branch_entry entries[0];
- }
- Check whether the hw_idx is available,
- and return the corresponding pointer of entries[0].
*/
- Return the corresponding pointer of entries[0].
static inline struct branch_entry *perf_sample__branch_entries(struct perf_sample *sample) {
- u64 *entry = (u64 *)sample->branch_stack;
- entry++;
- if (sample->no_hw_idx)
return (struct branch_entry *)entry;
- return (struct branch_entry *)(++entry);
- return sample->branch_stack->entries;
} struct branch_type_stat { diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 96e5171dce41..88a919ee8f7c 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -2159,8 +2159,10 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event, return -EFAULT; sz = data->branch_stack->nr * sizeof(struct branch_entry);
/* +hw_idx */
if (evsel__has_branch_hw_idx(evsel))sz += sizeof(u64);
sz += sizeof(u64);
else data->no_hw_idx = true; OVERFLOW_CHECK(array, sz, max_size);data->no_hw_idx = false;
-----Original Message----- From: Leo Yan leo.yan@linaro.org Sent: 02 June 2020 10:40 To: Al Grant Al.Grant@arm.com Cc: Coresight ML coresight@lists.linaro.org; Andrea Brunato Andrea.Brunato@arm.com Subject: Re: [PATCH] Fix corrupt data after perf inject from
On Tue, Jun 02, 2020 at 04:58:20PM +0800, Leo Yan wrote:
Hi Al,
On Mon, Jun 01, 2020 at 04:00:02PM +0000, Al Grant wrote:
(Hi - posted for review. There was a recent ABI breaking change (see commit reference below), after which perf inject on ETM creates corrupt files. The simple fix ends up with it using the new ABI. I have an alternative patch set that emulates the old ABI - for the benefit of tools like autofdo that can't yet consume the new ABI - but it's much messier.)
From: Al Grant al.grant@arm.com
Commit 42bbabed09ce6208026648a71a45b4394c74585a ("perf tools: Add hw_idx in struct branch_stack") changed the format of branch stacks in perf samples. When samples use this new format, a flag must be set in the corresponding event. Synthesized branch stacks generated from CoreSight ETM trace were using the new format, but not setting the event attribute, leading to consumers seeing corrupt data. This patch fixes the issue by setting the event attribute to indicate use
of the new format.
Fixes: 42bbabed09ce ("perf tools: Add hw_idx in struct branch_stack") Signed-off-by: Al Grant al.grant@arm.com Acked-by: Andrea Brunato andrea.brunato@arm.com
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 62d2f9b9ce1b..71a056e29675 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1332,7 +1332,13 @@ static int cs_etm__synth_events(struct
cs_etm_auxtrace *etm,
} if (etm->synth_opts.last_branch)
{ attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
/* We don't use the hardware index, but the sample generation
code uses the new format branch_stack with this field,
so the event attributes must indicate that it's present. */
attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
}
For this patch itself, it looks good to me. But the change seems a workaround rather than directly fixing issues.
Essentially, this issue is caused by the data structure definition is mess for branch record, thus, I prefer to use below change (I don't test it, so just want to use the change to demonstrate the basic idea):
Sorry for rush sending email, give explaination for below change;
whether the sample type PERF_SAMPLE_BRANCH_HW_INDEX has been set or not, we can always pass hw_idx from kernel to user space
No, the flag must match whether hw_idx is passed. If you're always passing it (even if it's -1) you must set the flag.
The point of this flag is to indicate the ABI break in sample records passed from the kernel to userspace. perf.data captured from older kernels will have just the 'nr' field followed by the branches, perf.data captured from newer kernels will have 'nr' followed by 'hw_idx' followed by the branches. The reader uses the flag to decide whether it needs to expect that extra word. The flag must match the sample, otherwise the branch stack and everything that follows it is corrupt.
, if without supporting hw_idx then will pass '-1' for the field.
Sure, this can be done any time - the kernel can do it, perf inject can do it. But it must set PERF_SAMPLE_BRANCH_HW_INDEX to say the field is there. And perf's sample parser needs to be able to continue to handle records from older kernels, where the field isn't present and the flag isn't set.
I don't like the way perf_sample__branch_entries() is being used, or the way "struct branch_stack" has different formats even as an internal type - but I'm not proposing to refactor that, because it might break all sorts of things (e.g. support for Intel PT and LBR). My patch is the simplest possible fix and makes synthesized samples from ETM work like those from PT.
Al
This can help us to unify the structure 'branch_stack', which will always contain the item branch_stack::hw_idx, thus when the kernel or perf tool (like cs-etm or intel-pt) to generate branch record sample with the same definition for 'struct branch_stack'. The benefit is for more readable code and this can resolve issues both for cs-etm and intel-pt.
I think here have one concern is that this change will introduce a penalty for passing redundant hw_idx from kernel to user space if PERF_SAMPLE_BRANCH_HW_INDEX is not used.
Thanks, Leo
---8<---
diff --git a/kernel/events/core.c b/kernel/events/core.c index e296c5c59c6f..f00a49293787 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -6831,6 +6831,8 @@ void perf_output_sample(struct
perf_output_handle *handle,
perf_output_put(handle, data->br_stack->nr); if (perf_sample_save_hw_index(event)) perf_output_put(handle, data->br_stack- hw_idx); +else +perf_output_put(handle, -1); perf_output_copy(handle, data->br_stack->entries,
size);
} else { /* diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h index 17b2ccc61094..111bfb9d6635 100644 --- a/tools/perf/util/branch.h +++ b/tools/perf/util/branch.h @@ -50,23 +50,11 @@ struct branch_stack { };
/*
- The hw_idx is only available when PERF_SAMPLE_BRANCH_HW_INDEX is
applied.
- Otherwise, the output format of a sample with branch stack is
- struct branch_stack {
- *u64nr;
- *struct branch_entryentries[0];
- }
- Check whether the hw_idx is available,
- and return the corresponding pointer of entries[0].
*/
- Return the corresponding pointer of entries[0].
static inline struct branch_entry *perf_sample__branch_entries(struct perf_sample *sample) { -u64 *entry = (u64 *)sample->branch_stack;
-entry++; -if (sample->no_hw_idx) -return (struct branch_entry *)entry; -return (struct branch_entry *)(++entry); +return sample->branch_stack->entries; }
struct branch_type_stat { diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 96e5171dce41..88a919ee8f7c 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -2159,8 +2159,10 @@ int evsel__parse_sample(struct evsel *evsel, union
perf_event *event,
return -EFAULT;
sz = data->branch_stack->nr * sizeof(struct branch_entry); +/* +hw_idx */ +sz += sizeof(u64); if (evsel__has_branch_hw_idx(evsel)) -sz += sizeof(u64); +data->no_hw_idx = false; else data->no_hw_idx = true; OVERFLOW_CHECK(array, sz, max_size);
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Tue, Jun 02, 2020 at 10:14:02AM +0000, Al Grant wrote:
[...]
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 62d2f9b9ce1b..71a056e29675 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1332,7 +1332,13 @@ static int cs_etm__synth_events(struct
cs_etm_auxtrace *etm,
} if (etm->synth_opts.last_branch)
{ attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
/* We don't use the hardware index, but the sample generation
code uses the new format branch_stack with this field,
so the event attributes must indicate that it's present. */
attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
}
For this patch itself, it looks good to me. But the change seems a workaround rather than directly fixing issues.
Essentially, this issue is caused by the data structure definition is mess for branch record, thus, I prefer to use below change (I don't test it, so just want to use the change to demonstrate the basic idea):
Sorry for rush sending email, give explaination for below change;
whether the sample type PERF_SAMPLE_BRANCH_HW_INDEX has been set or not, we can always pass hw_idx from kernel to user space
No, the flag must match whether hw_idx is passed. If you're always passing it (even if it's -1) you must set the flag.
The point of this flag is to indicate the ABI break in sample records passed from the kernel to userspace. perf.data captured from older kernels will have just the 'nr' field followed by the branches, perf.data captured from newer kernels will have 'nr' followed by 'hw_idx' followed by the branches. The reader uses the flag to decide whether it needs to expect that extra word. The flag must match the sample, otherwise the branch stack and everything that follows it is corrupt.
Indeed, if the flag PERF_SAMPLE_BRANCH_HW_INDEX is ABI relevant, then cannot use the change what I proposed.
, if without supporting hw_idx then will pass '-1' for the field.
Sure, this can be done any time - the kernel can do it, perf inject can do it. But it must set PERF_SAMPLE_BRANCH_HW_INDEX to say the field is there. And perf's sample parser needs to be able to continue to handle records from older kernels, where the field isn't present and the flag isn't set.
Yes, so another possible fixing is to add a new variant structure:
struct branch_stack_no_idx { u64 nr; struct branch_entryentries[0]; }
When don't set flag PERF_SAMPLE_BRANCH_HW_INDEX, we should to use the structure 'branch_stack_no_idx' to generate branch samples.
I don't like the way perf_sample__branch_entries() is being used, or the way "struct branch_stack" has different formats even as an internal type - but I'm not proposing to refactor that, because it might break all sorts of things (e.g. support for Intel PT and LBR). My patch is the simplest possible fix and makes synthesized samples from ETM work like those from PT.
Just a concern if set flag PERF_SAMPLE_BRANCH_HW_INDEX for cs-etm and intel-pt, later this will cause confusion that actually the synthenize samples don't use hw_idx.
But totally agree with your thinking for this. It's fine for me to upstream with your current approach firstly. We can divide into two parts: fixing and refactoring.
Thanks, Leo
Hi Al,
On Mon, 1 Jun 2020 at 10:00, Al Grant Al.Grant@arm.com wrote:
(Hi - posted for review. There was a recent ABI breaking change (see commit reference below), after which perf inject on ETM creates corrupt files. The simple fix ends up with it using the new ABI. I have an alternative patch set that emulates the old ABI - for the benefit of tools like autofdo that can't yet consume the new ABI - but it's much messier.)
From: Al Grant al.grant@arm.com
Commit 42bbabed09ce6208026648a71a45b4394c74585a ("perf tools: Add hw_idx in struct branch_stack") changed the format of branch stacks in perf samples. When samples use this new format, a flag must be set in the corresponding event. Synthesized branch stacks generated from CoreSight ETM trace were using the new format, but not setting the event attribute, leading to consumers seeing corrupt data. This patch fixes the issue by setting the event attribute to indicate use of the new format.
Fixes: 42bbabed09ce ("perf tools: Add hw_idx in struct branch_stack") Signed-off-by: Al Grant al.grant@arm.com Acked-by: Andrea Brunato andrea.brunato@arm.com
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 62d2f9b9ce1b..71a056e29675 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1332,7 +1332,13 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm, }
if (etm->synth_opts.last_branch)
{ attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
/* We don't use the hardware index, but the sample generation
code uses the new format branch_stack with this field,
so the event attributes must indicate that it's present. */
attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
I'm fine with this set and would ask you to send it to the public mailing list when 5.8-rc1 has been released. Out of interest I have the following 2 questions:
Would setting sample.no_hw_idx to 'true' would have the same effect?
Have you tried with PT? I see that Kan's patch applied the same fix to both cs-etm.c and intel-pt.c and as such I would expect to find the same problem there as well.
Thanks, Mathieu
} if (etm->synth_opts.instructions) { attr.config = PERF_COUNT_HW_INSTRUCTIONS;
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. _______________________________________________ CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
On Mon, 1 Jun 2020 at 10:00, Al Grant Al.Grant@arm.com wrote:
(Hi - posted for review. There was a recent ABI breaking change (see commit reference below), after which perf inject on ETM creates corrupt files. The simple fix ends up with it using the new ABI. I have an alternative patch set that emulates the old ABI - for the benefit of tools like autofdo that can't yet consume the new ABI - but it's much messier.)
From: Al Grant al.grant@arm.com
Commit 42bbabed09ce6208026648a71a45b4394c74585a ("perf tools: Add hw_idx in struct branch_stack") changed the format of branch stacks in perf samples. When samples use this new format, a flag must be set in the corresponding event. Synthesized branch stacks generated from CoreSight ETM trace were using the new format, but not setting the event attribute, leading to consumers seeing corrupt data. This patch fixes the issue by setting the event attribute to indicate use of the new
format.
Fixes: 42bbabed09ce ("perf tools: Add hw_idx in struct branch_stack") Signed-off-by: Al Grant al.grant@arm.com Acked-by: Andrea Brunato andrea.brunato@arm.com
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 62d2f9b9ce1b..71a056e29675 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1332,7 +1332,13 @@ static int cs_etm__synth_events(struct
cs_etm_auxtrace *etm,
} if (etm->synth_opts.last_branch)
{ attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
/* We don't use the hardware index, but the sample generation
code uses the new format branch_stack with this field,
so the event attributes must indicate that it's present. */
attr.branch_sample_type |=
- PERF_SAMPLE_BRANCH_HW_INDEX;
I'm fine with this set and would ask you to send it to the public mailing list when 5.8-rc1 has been released. Out of interest I have the following 2 questions:
Would setting sample.no_hw_idx to 'true' would have the same effect?
Unfortunately not. The purpose of no_hw_idx seems to be to modify the layout of the branch_stack structure in the expanded perf_sample structure, and in the ETM code, that does have the hw_idx field. If we set no_hw_idx hen we have to change the layout of sample->branch_stack, i.e. remove hw_idx, or use perf_sample__branch_entries throughout.
And then synthetic-events.c produces new-format data regardless of no_hw_idx:
if (type & PERF_SAMPLE_BRANCH_STACK) { sz = sample->branch_stack->nr * sizeof(struct branch_entry); /* nr, hw_idx */ sz += 2 * sizeof(u64); memcpy(array, sample->branch_stack, sz); array = (void *)array + sz; }
We would have to change it to something like
if (type & PERF_SAMPLE_BRANCH_STACK) { *array++ = sample->branch_stack->nr; if (!sample->no_hw_idx) *array++ = sample->branch_stack->hw_idx; sz = sample->branch_stack->nr * sizeof(struct branch_entry); memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz); array = (void *)array + sz; }
It's possible, but the patch ends up being ten times bigger (I tried) and still doesn't fix the technical debt. perf_sample__branch_entries is just wrong. There is no need for non-standard layouts in the unpacked perf_sample or branch_stack. What matters is (a) parsing serialized data according to flags in the event attributes and (b) writing serialized (synthetic) data that matches the flags in the event attributes. Both of those suggest having access to the event attributes when we're writing data, instead of passing in a random subset of values like 'type'.
Have you tried with PT? I see that Kan's patch applied the same fix to both cs- etm.c and intel-pt.c and as such I would expect to find the same problem there as well.
Yes, now you mention it, I see the same problem with PT, fix is same:
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index 23c8289c2472..334e59238f65 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -2896,8 +2896,10 @@ static int intel_pt_synth_events(struct intel_pt *pt,
if (pt->synth_opts.callchain) attr.sample_type |= PERF_SAMPLE_CALLCHAIN; - if (pt->synth_opts.last_branch) + if (pt->synth_opts.last_branch) { attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; + }
if (pt->synth_opts.instructions) { attr.config = PERF_COUNT_HW_INSTRUCTIONS;
Al
Thanks, Mathieu
} if (etm->synth_opts.instructions) { attr.config = PERF_COUNT_HW_INSTRUCTIONS; IMPORTANT
NOTICE: The contents of this email and any attachments are confidential and
may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Wed, 3 Jun 2020 at 03:09, Al Grant Al.Grant@arm.com wrote:
On Mon, 1 Jun 2020 at 10:00, Al Grant Al.Grant@arm.com wrote:
(Hi - posted for review. There was a recent ABI breaking change (see commit reference below), after which perf inject on ETM creates corrupt files. The simple fix ends up with it using the new ABI. I have an alternative patch set that emulates the old ABI - for the benefit of tools like autofdo that can't yet consume the new ABI - but it's much messier.)
From: Al Grant al.grant@arm.com
Commit 42bbabed09ce6208026648a71a45b4394c74585a ("perf tools: Add hw_idx in struct branch_stack") changed the format of branch stacks in perf samples. When samples use this new format, a flag must be set in the corresponding event. Synthesized branch stacks generated from CoreSight ETM trace were using the new format, but not setting the event attribute, leading to consumers seeing corrupt data. This patch fixes the issue by setting the event attribute to indicate use of the new
format.
Fixes: 42bbabed09ce ("perf tools: Add hw_idx in struct branch_stack") Signed-off-by: Al Grant al.grant@arm.com Acked-by: Andrea Brunato andrea.brunato@arm.com
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 62d2f9b9ce1b..71a056e29675 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1332,7 +1332,13 @@ static int cs_etm__synth_events(struct
cs_etm_auxtrace *etm,
} if (etm->synth_opts.last_branch)
{ attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
/* We don't use the hardware index, but the sample generation
code uses the new format branch_stack with this field,
so the event attributes must indicate that it's present. */
attr.branch_sample_type |=
- PERF_SAMPLE_BRANCH_HW_INDEX;
I'm fine with this set and would ask you to send it to the public mailing list when 5.8-rc1 has been released. Out of interest I have the following 2 questions:
Would setting sample.no_hw_idx to 'true' would have the same effect?
Unfortunately not. The purpose of no_hw_idx seems to be to modify the layout of the branch_stack structure in the expanded perf_sample structure, and in the ETM code, that does have the hw_idx field. If we set no_hw_idx hen we have to change the layout of sample->branch_stack, i.e. remove hw_idx, or use perf_sample__branch_entries throughout.
And then synthetic-events.c produces new-format data regardless of no_hw_idx:
if (type & PERF_SAMPLE_BRANCH_STACK) { sz = sample->branch_stack->nr * sizeof(struct branch_entry); /* nr, hw_idx */ sz += 2 * sizeof(u64); memcpy(array, sample->branch_stack, sz); array = (void *)array + sz; }
We would have to change it to something like
if (type & PERF_SAMPLE_BRANCH_STACK) { *array++ = sample->branch_stack->nr; if (!sample->no_hw_idx) *array++ = sample->branch_stack->hw_idx; sz = sample->branch_stack->nr * sizeof(struct branch_entry); memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz); array = (void *)array + sz; }
It's possible, but the patch ends up being ten times bigger (I tried) and still doesn't fix the technical debt.
Right, then you're in the core code and it's a different conversation.
perf_sample__branch_entries is just wrong. There is no need for non-standard layouts in the unpacked perf_sample or branch_stack. What matters is (a) parsing serialized data according to flags in the event attributes and (b) writing serialized (synthetic) data that matches the flags in the event attributes. Both of those suggest having access to the event attributes when we're writing data, instead of passing in a random subset of values like 'type'.
Have you tried with PT? I see that Kan's patch applied the same fix to both cs- etm.c and intel-pt.c and as such I would expect to find the same problem there as well.
Yes, now you mention it, I see the same problem with PT, fix is same:
Perfect, so it's not related to something we do.
Thanks, Mathieu
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index 23c8289c2472..334e59238f65 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -2896,8 +2896,10 @@ static int intel_pt_synth_events(struct intel_pt *pt,
if (pt->synth_opts.callchain) attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
if (pt->synth_opts.last_branch)
if (pt->synth_opts.last_branch) { attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
} if (pt->synth_opts.instructions) { attr.config = PERF_COUNT_HW_INSTRUCTIONS;
Al
Thanks, Mathieu
} if (etm->synth_opts.instructions) { attr.config = PERF_COUNT_HW_INSTRUCTIONS; IMPORTANT
NOTICE: The contents of this email and any attachments are confidential and
may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.