Hi,
While running perf on some function-heavy code I noticed the ETM return stack isn't enabled (TRCCONFIGR bit 12). I know that it can easily be enabled with retstack=1, but have we considered making this the default as we move to simplification of perf options? It could lead to more efficient profiles and shouldn't be any more difficult to decode than ETM normally is.
Al
Good afternoon,
On Wed, 8 Jul 2020 at 10:04, Al Grant Al.Grant@arm.com wrote:
Hi,
While running perf on some function-heavy code I noticed the ETM return stack
isn’t enabled (TRCCONFIGR bit 12). I know that it can easily be enabled with retstack=1,
but have we considered making this the default as we move to simplification of
perf options? It could lead to more efficient profiles and shouldn’t be any more
difficult to decode than ETM normally is.
Can you give me more details on this simplification of the perf options you are referring to?
I am debating whether the result of a perf trace session is a user visible change and I haven't made up my mind yet. I am personally fine with enabling the return stack option by default but if somebody yells then we might have to revert the patch. It would be nice to hear back from users on this list - is anyone strongly opinionated on this?
Thanks, Mathieu
Al
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
Mathieu Poirier wrote:
On Wed, 8 Jul 2020 at 10:04, Al Grant Al.Grant@arm.com wrote:
While running perf on some function-heavy code I noticed the ETM return stack
isn’t enabled (TRCCONFIGR bit 12). I know that it can easily be enabled with retstack=1,
but have we considered making this the default as we move to simplification of
perf options? It could lead to more efficient profiles and shouldn’t be any more
difficult to decode than ETM normally is.
Can you give me more details on this simplification of the perf options you are referring to?
I was thinking of the implicit sinks proposal, which will mean that most of the time we can say simply
perf record -e cs_etm// ...
The easier it is to use, the more useful it is for the default settings to be the most useful ones.
I am debating whether the result of a perf trace session is a user visible change and I haven't made up my mind yet. I am personally fine with enabling the return stack option by default but if somebody yells then we might have to revert the patch. It would be nice to hear back from users on this list - is anyone strongly opinionated on this?
OpenCSD can cope, so it should all be transparent. The only impact I can see is on people doing their own trace decode with a simplistic decoder that can't handle the return stack, and who were relying on it being disabled by default. There may be some code where there's difficulty accessing the original image and the explicit return addresses are useful - but in that situation you might also struggle to follow the E/N atoms and need to enable branch-broadcast (which causes even more bloat in the trace).
The ETM spec allows for considerable variation between implementations - anything that's making assumptions about the way ETM trace looks, based on what it's looked like in the past, is likely to break anyway.
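To make the decoder requirement concrete, here is a minimal Python sketch of the bookkeeping a decoder has to mirror once the return stack is on. It is a simplified model, not OpenCSD code: the event tuples, the function name and the treatment of a mismatching return are assumptions made for illustration, and the depth of three entries is assumed rather than taken from any particular implementation.

```python
from collections import deque

# Assumption: a small fixed-depth stack; the oldest entry is lost on overflow.
RETURN_STACK_DEPTH = 3


def count_explicit_addresses(events, retstack_enabled):
    """Count return targets that the trace has to carry as explicit addresses.

    events is an iterable of ("call", return_addr) or ("ret", target_addr).
    With the return stack enabled, a return whose target matches the top of
    the stack needs no address in the trace; the decoder pops its own copy.
    """
    stack = deque(maxlen=RETURN_STACK_DEPTH)
    explicit = 0
    for kind, addr in events:
        if kind == "call":
            if retstack_enabled:
                stack.append(addr)      # push the link address
        elif kind == "ret":
            if retstack_enabled and stack and stack[-1] == addr:
                stack.pop()             # target is implied, nothing emitted
            else:
                explicit += 1           # target must appear in the trace
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return explicit


# Two nested calls followed by their matching returns.
nested = [("call", 0x1004), ("call", 0x2008), ("ret", 0x2008), ("ret", 0x1004)]
print(count_explicit_addresses(nested, retstack_enabled=False))
print(count_explicit_addresses(nested, retstack_enabled=True))
```

A decoder that does not keep this stack sees returns with no target address and loses track of the program flow, which is the failure mode for a simplistic decoder described above.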
Al
Thanks, Mathieu
Al
On Fri, 10 Jul 2020 at 07:18, Al Grant Al.Grant@arm.com wrote:
Mathieu Poirier wrote:
On Wed, 8 Jul 2020 at 10:04, Al Grant Al.Grant@arm.com wrote:
While running perf on some function-heavy code I noticed the ETM return stack
isn’t enabled (TRCCONFIGR bit 12). I know that it can easily be enabled with retstack=1,
but have we considered making this the default as we move to simplification of
perf options? It could lead to more efficient profiles and shouldn’t be any more
difficult to decode than ETM normally is.
Can you give me more details on this simplification of the perf options you are referring to?
I was thinking of the implicit sinks proposal, which will mean that most of the time we can say simply
perf record -e cs_etm// ...
I understand now.
The easier it is to use, the more useful it is for the default settings to be the most useful ones.
I am debating whether the result of a perf trace session is a user visible change and I haven't made up my mind yet. I am personally fine with enabling the return stack option by default but if somebody yells then we might have to revert the patch. It would be nice to hear back from users on this list - is anyone strongly opinionated on this?
OpenCSD can cope, so it should all be transparent. The only impact I can see is on people doing their own trace decode with a simplistic decoder that can't handle the return stack, and who were relying on it being disabled by default.
Right, that is exactly what I have in mind.
There may be some code where there's difficulty accessing the original image and the explicit return addresses are useful - but in that situation you might also struggle to follow the E/N atoms and need to enable branch-broadcast (which causes even more bloat in the trace).
The ETM spec allows for considerable variation between implementations - anything that's making assumptions about the way ETM trace looks, based on what it's looked like in the past, is likely to break anyway.
Al
Thanks, Mathieu
Al
Hi,
The basis for resource management in ETMv4 is to start with a completely blank sheet and only add stuff at user request. I have previously removed unneeded resource usage in preparation for this.
The question here is what happens now if someone creates a configuration without return-stack - do we still set this behind their back or not? Or do we only switch it off if something incompatible such as branch broadcast is operational? That is the problem with setting options without the user being aware of it - as a general rule don't do it!
We do have users who do their own analysis of the trace stream - there was a question earlier this month from a user spotting addresses in the trace output on an FPGA, before any decode is done. There have also been users looking directly at perf.data. This change will be visible to them, as well as any user who compares a prior 'perf report --dump' to one with this change.
If we do make this change, then we need to change the 'retstack' option to a 'noretstack' option to allow users to recover the previous operation.
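As a rough illustration of one possible policy, the Python sketch below enables the return stack by default, lets a 'noretstack' option restore the previous behaviour, and leaves it off when branch broadcast is requested. Everything in it is hypothetical: the flag names, the function and its parameters are invented for the example and are symbolic only; they are not the driver's code or real register bit positions.

```python
from enum import Flag


class TraceOpt(Flag):
    """Symbolic trace options (not hardware bit positions)."""
    NONE = 0
    RETURN_STACK = 1
    BRANCH_BROADCAST = 2


def resolve_trace_options(noretstack=False, branch_broadcast=False,
                          hw_has_retstack=True):
    """Return the options one candidate default policy would select.

    Hypothetical policy: the return stack is on unless the user opts out,
    the hardware lacks it, or branch broadcast is requested.
    """
    opts = TraceOpt.NONE
    if branch_broadcast:
        opts |= TraceOpt.BRANCH_BROADCAST
    if hw_has_retstack and not noretstack and not branch_broadcast:
        opts |= TraceOpt.RETURN_STACK
    return opts


print(resolve_trace_options())                       # default session
print(resolve_trace_options(noretstack=True))        # user opts out
print(resolve_trace_options(branch_broadcast=True))  # broadcast requested
```

The point of the sketch is that every case in which the option ends up off is spelled out, so a user can predict what a given command line will produce.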
Regards
Mike
On Mon, 13 Jul 2020 at 18:46, Mathieu Poirier mathieu.poirier@linaro.org wrote:
On Fri, 10 Jul 2020 at 07:18, Al Grant Al.Grant@arm.com wrote:
Mathieu Poirier wrote:
On Wed, 8 Jul 2020 at 10:04, Al Grant Al.Grant@arm.com wrote:
While running perf on some function-heavy code I noticed the ETM return stack
isn’t enabled (TRCCONFIGR bit 12). I know that it can easily be enabled with retstack=1,
but have we considered making this the default as we move to simplification of
perf options? It could lead to more efficient profiles and shouldn’t be any more
difficult to decode than ETM normally is.
Can you give me more details on this simplification of the perf options you are referring to?
I was thinking of the implicit sinks proposal, which will mean that most of the time we can say simply
perf record -e cs_etm// ...
I understand now.
The easier it is to use, the more useful it is for the default settings to be the most useful ones.
I am debating whether the result of a perf trace session is a user visible change and I haven't made up my mind yet. I am personally fine with enabling the return stack option by default but if somebody yells then we might have to revert the patch. It would be nice to hear back from users on this list - is anyone strongly opinionated on this?
OpenCSD can cope, so it should all be transparent. The only impact I can see is on people doing their own trace decode with a simplistic decoder that can't handle the return stack, and who were relying on it being disabled by default.
Right, that is exactly what I have in mind.
There may be some code where there's difficulty accessing the original image and the explicit return addresses are useful - but in that situation you might also struggle to follow the E/N atoms and need to enable branch-broadcast (which causes even more bloat in the trace).
The ETM spec allows for considerable variation between implementations - anything that's making assumptions about the way ETM trace looks, based on what it's looked like in the past, is likely to break anyway.
Al
Thanks, Mathieu
Al