On 27 April 2018 at 12:24, Robert Walker Robert.Walker@arm.com wrote:
Hi,
Strobing the ETM to reduce the amount of trace data when collecting profiles for AutoFDO seems to be working and providing useful optimizations. We’re currently working with some proof of concept patches (attached for reference) that add parameters to sysfs to configure the strobe period – before running perf record, the user must write to these parameters for each ETM. This isn’t suitable for production use as it has to be done for each ETM and the values persist after the trace session. To get this into upstream, we need to have this done by the perf record tool.
I understand there is work planned to enable more complex ETM configurations (such as strobing) from perf, possibly using a file to load register values from. Is this still the case, and if so, when is it likely to be done?
Hi Robert,
I am currently working on supporting CPU-wide trace scenarios where I can start seeing the end of the tunnel. After that my plan was to add support for ETMv3.x/PTM trace decoding followed by support for N:N source/sink topology. Part of the latter is to introduce a way to enable more complex ETM configuration using a configuration file. In fact I already stumbled on how I want to do that and have a (very) small prototype that works.
So that is what I had in mind... But it doesn't mean I can't be talked into changing my priorities. In fact I will gladly do so if we, as a group, decide it is more important to introduce support for complex configuration before ETMv3.x/PTM decoding. I personally don't have a preference, it is simply a matter of deciding what we want to do.
I have CC'ed the coresight mailing list in order to reach a broader audience. Please speak up if you really have an issue (along with the rationale) with supporting ETM complex configurations before ETMv3.x/PTM decoding.
Best regards, Mathieu
Thanks
Rob
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi Rob,
On 27 April 2018 at 20:08, Mathieu Poirier mathieu.poirier@linaro.org wrote:
On 27 April 2018 at 12:24, Robert Walker Robert.Walker@arm.com wrote:
Hi,
Strobing the ETM to reduce the amount of trace data when collecting profiles for AutoFDO seems to be working and providing useful optimizations. We’re currently working with some proof of concept patches (attached for reference) that add parameters to sysfs to configure the strobe period – before running perf record, the user must write to these parameters for each ETM. This isn’t suitable for production use as it has to be done for each ETM and the values persist after the trace session. To get this into upstream, we need to have this done by the perf record tool.
Given this patch is not intended for production use, we can ignore the hard-coded resource allocation and lack of resource management - i.e. no checks to see if what you want to use is not in use by another function. For example, we may want to use a counter to increase timestamp frequency (which is an approach being considered for the per-CPU trace being looked at by Mathieu at present.)
Therefore for specialisations like these, I don't think continually adding new functions to the driver is a scalable approach.
It does look like some sort of "file based" approach could work - at least in the initial set up - programming up the list of registers.
It is not immediately clear to me why this has to be done by the perf tool - writing a script to access all the ETMs does not seem out of the question - assuming we can get perf to respect the ETM settings when collecting the data.
For this we could simply have a "use settings" flag that sysfs can set and perf can respect. That way what you write will always be used for the session, rather than being over-written as happens at present.
This also would provide a quicker upstreamable solution than implementing a full blown config file mechanism.
(and I believe - possibly from a conversation that occurred at Connect one time - that Intel approaches more complex trace configuration by using sysfs - perhaps Mathieu can confirm this?)
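As a rough illustration of the scripted approach suggested above, the sketch below writes the same values to every ETM directory under a sysfs root. The attribute name `strobe_period` and the `etm` name filter are assumptions made for illustration; the names exposed by the proof-of-concept patches may differ.

```python
import os


def configure_etms(sysfs_root, settings, name_filter="etm"):
    """Write each (attribute, value) pair in `settings` to every ETM directory.

    sysfs_root  -- directory holding one sub-directory per CoreSight device
                   (on a target this would be the CoreSight sysfs device dir)
    settings    -- dict of attribute name -> integer value; the attribute
                   names are hypothetical and depend on the driver patches
    Returns the sorted list of device names that were configured.
    """
    configured = []
    for dev in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, dev)
        # Skip anything that is not an ETM device directory.
        if name_filter not in dev or not os.path.isdir(dev_dir):
            continue
        for attr, value in settings.items():
            with open(os.path.join(dev_dir, attr), "w") as f:
                f.write(f"{value:#x}\n")
        configured.append(dev)
    return configured
```

A script like this would still leave the values in place after the trace session, which is the persistence problem Rob describes, so it would need a matching reset step.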
I understand there is work planned to enable more complex ETM configurations (such as strobing) from perf, possibly using a file to load register values from. Is this still the case, and if so, when is it likely to be done?
Hi Robert,
I am currently working on supporting CPU-wide trace scenarios where I can start seeing the end of the tunnel. After that my plan was to add support for ETMv3.x/PTM trace decoding followed by support for N:N source/sink topology. Part of the latter is to introduce a way to enable more complex ETM configuration using a configuration file. In fact I already stumbled on how I want to do that and have a (very) small prototype that works.
So that is what I had in mind... But it doesn't mean I can't be talked into changing my priorities. In fact I will gladly do so if we, as a group, decide it is more important to introduce support for complex configuration before ETMv3.x/PTM decoding. I personally don't have a preference, it is simply a matter of deciding what we want to do.
I have CC'ed the coresight mailing list in order to reach a broader audience. Please speak up if you really have an issue (along with the rationale) with supporting ETM complex configurations before ETMv3.x/PTM decoding.
Best regards, Mathieu
The last time I looked into the question of a "programming file" approach I came up with a few issues that need to be considered - which pretty much come down to resources and error handling:
- resource management: even with a file name passed to the driver through the perf command line, there is still the question of additional command line options that use resources.
- resource usage priority: with multiple perf command line options - or if using config files from sysfs where the user may have multiple "recipe" files - supplemented with direct access to resources via sysfs, clear prioritisation rules need to be in play so that outcomes are understandable.
- named register access: use names based on the TRM - not a file full of offsets and values.
- resource type requests: named register use can be extended to allow the file to contain a non-specific register name where multiple instances of a resource exist, to allow the request of the next available resource - e.g. counter_n rather than counter_0.
- error handling: clear definitions as to what happens if part of the file is incorrect or there are insufficient resources - is the application of a configuration file atomic?
I think this approach needs to be correct from the start - otherwise we risk creating something that is difficult to use and maintain, and will result in a lot of time spent answering support questions.
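To make the points above concrete, here is a minimal sketch of how named requests, "next available" resource requests and all-or-nothing error handling could fit together. It is not a proposal for the actual file syntax: the function name, the `_n` suffix handling and register names such as `trace_config` are all hypothetical.

```python
class ConfigError(Exception):
    """Raised when a configuration cannot be applied as a whole."""


def plan_config(requests, known_registers, in_use, instances):
    """Resolve (name, value) requests into concrete register writes.

    requests        -- list of (register name, value) pairs from a config file
    known_registers -- set of valid concrete register names
    in_use          -- names already claimed by another function
    instances       -- dict: resource base name -> number of instances,
                       e.g. {"counter": 2} for counter_0 and counter_1

    A name ending in "_n" (e.g. "counter_n") asks for the next free instance.
    Nothing is written here: either a complete plan is returned or
    ConfigError is raised, so the caller can apply the plan atomically.
    """
    taken = set(in_use)
    plan = []
    for name, value in requests:
        if name.endswith("_n"):
            base = name[:-2]
            if base not in instances:
                raise ConfigError(f"unknown resource type {base!r}")
            free = [f"{base}_{i}" for i in range(instances[base])
                    if f"{base}_{i}" not in taken]
            if not free:
                raise ConfigError(f"no free {base} for request {name!r}")
            name = free[0]
        elif name not in known_registers:
            raise ConfigError(f"unknown register name {name!r}")
        elif name in taken:
            raise ConfigError(f"{name} is already in use")
        taken.add(name)
        plan.append((name, value))
    return plan
```

Validating the whole file before touching any register is one way to answer the atomicity question: a bad line or an exhausted resource rejects the file without leaving the ETM half configured.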
Regards
Mike
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
On 30 April 2018 at 09:05, Mike Leach mike.leach@linaro.org wrote:
Hi Rob,
On 27 April 2018 at 20:08, Mathieu Poirier mathieu.poirier@linaro.org wrote:
On 27 April 2018 at 12:24, Robert Walker Robert.Walker@arm.com wrote:
Hi,
Strobing the ETM to reduce the amount of trace data when collecting profiles for AutoFDO seems to be working and providing useful optimizations. We’re currently working with some proof of concept patches (attached for reference) that add parameters to sysfs to configure the strobe period – before running perf record, the user must write to these parameters for each ETM. This isn’t suitable for production use as it has to be done for each ETM and the values persist after the trace session. To get this into upstream, we need to have this done by the perf record tool.
Given this patch is not intended for production use, we can ignore the hard-coded resource allocation and lack of resource management - i.e. no checks to see if what you want to use is not in use by another function. For example, we may want to use a counter to increase timestamp frequency (which is an approach being considered for the per-CPU trace being looked at by Mathieu at present.)
Resource management is indeed a problem I'm currently facing in the development of the CPU-wide trace support. Since in-kernel driver configuration happens last, I'm being careful not to overwrite potential sysfs configuration. I'm well aware that this approach is brittle and things will break at some point, hence trying to move to a configuration file option. When that happens we can introduce the policy that if a config file is fed to the drivers, all the configuration is taken from there and any other option is nullified. But we can talk about that later when I actually get to implement the feature.
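The policy described above could be sketched roughly as follows. This is only one reading of it: the function and parameter names are hypothetical, and treating driver defaults as also being ignored when a file is present is an assumption.

```python
def effective_config(defaults, sysfs_values, file_values=None):
    """Return the register settings a trace session would use.

    If a configuration file was supplied, it is the only source: sysfs
    values and driver defaults are ignored entirely (an assumption about
    what "nullified" means). Otherwise sysfs values override the defaults.
    """
    if file_values is not None:
        return dict(file_values)
    merged = dict(defaults)
    merged.update(sysfs_values)
    return merged
```

Having a single winner when a file is present avoids having to define merge rules between the file, sysfs and perf options, at the cost of requiring the file to be complete.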
Therefore for specialisations like these, I don't think continually adding new functions to the driver is a scalable approach.
It does look like some sort of "file based" approach could work - at least in the initial set up - programming up the list of registers.
It is not immediately clear to me why this has to be done by the perf tool - writing a script to access all the ETMs does not seem out of the question - assuming we can get perf to respect the ETM settings when collecting the data.
For this we could simply have a "use settings" flag that sysfs can set and perf can respect. That way what you write will always be used for the session, rather than being over-written as happens at present.
This also would provide a quicker upstreamable solution than implementing a full blown config file mechanism.
(and I believe - possibly from a conversation that occurred at Connect one time - that Intel approaches more complex trace configuration by using sysfs - perhaps Mathieu can confirm this?)
Correct, that's what I understand but it doesn't come from Alex Shishkin himself. I need to touch base with Alex but want to finish support for CPU-wide trace scenarios first - otherwise I'll get sucked in and will never finish anything.
I understand there is work planned to enable more complex ETM configurations (such as strobing) from perf, possibly using a file to load register values from. Is this still the case, and if so, when is it likely to be done?
Hi Robert,
I am currently working on supporting CPU-wide trace scenarios where I can start seeing the end of the tunnel. After that my plan was to add support for ETMv3.x/PTM trace decoding followed by support for N:N source/sink topology. Part of the latter is to introduce a way to enable more complex ETM configuration using a configuration file. In fact I already stumbled on how I want to do that and have a (very) small prototype that works.
So that is what I had in mind... But it doesn't mean I can't be talked into changing my priorities. In fact I will gladly do so if we, as a group, decide it is more important to introduce support for complex configuration before ETMv3.x/PTM decoding. I personally don't have a preference, it is simply a matter of deciding what we want to do.
I have CC'ed the coresight mailing list in order to reach a broader audience. Please speak up if you really have an issue (along with the rationale) with supporting ETM complex configurations before ETMv3.x/PTM decoding.
Best regards, Mathieu
The last time I looked into the question of a "programming file" approach I came up with a few issues that need to be considered - which pretty much come down to resources and error handling:
- resource management: even with a file name passed to the driver through the perf command line, there is still the question of additional command line options that use resources.
- resource usage priority: with multiple perf command line options - or if using config files from sysfs where the user may have multiple "recipe" files - supplemented with direct access to resources via sysfs, clear prioritisation rules need to be in play so that outcomes are understandable.
- named register access: use names based on the TRM - not a file full of offsets and values.
- resource type requests: named register use can be extended to allow the file to contain a non-specific register name where multiple instances of a resource exist, to allow the request of the next available resource - e.g. counter_n rather than counter_0.
- error handling: clear definitions as to what happens if part of the file is incorrect or there are insufficient resources - is the application of a configuration file atomic?
I think this approach needs to be correct from the start - otherwise we risk creating something that is difficult to use and maintain, and will result in a lot of time spent answering support questions.
I completely agree with you here - we need to get this right from the beginning or we'll never see the end of it. I started playing with the concept of a configuration file when thinking about how to specify the source/sink relationship in a hypothetical N:N topology and quickly started wondering about prioritisation of configuration methods. And I didn't get far enough (in my head) to consider what the naming convention should be... One thing I am sure about though is that when I get to that feature there will be plenty of consultation on how to proceed, potentially a great candidate for discussion in Vancouver.
Regards
Mike
-- Mike Leach Principal Engineer, ARM Ltd. Blackburn Design Centre. UK
-----Original Message----- From: Mathieu Poirier mathieu.poirier@linaro.org Sent: 27 April 2018 20:08 To: Robert Walker Robert.Walker@arm.com Cc: Mike Leach Mike.Leach@arm.com; Al Grant Al.Grant@arm.com; Travis Walton Travis.Walton@arm.com; coresight@lists.linaro.org Subject: Re: Upstream support for ETM strobing
On 27 April 2018 at 12:24, Robert Walker Robert.Walker@arm.com wrote:
Hi,
Strobing the ETM to reduce the amount of trace data when collecting profiles for AutoFDO seems to be working and providing useful optimizations. We’re currently working with some proof of concept patches (attached for reference) that add parameters to sysfs to configure the strobe period – before running perf record, the user must write to these parameters for each ETM. This isn’t suitable for production use as it has to be done for each ETM and the values persist after the trace session. To get this into upstream, we need to have this done by the perf record tool.
I understand there is work planned to enable more complex ETM configurations (such as strobing) from perf, possibly using a file to load register values from. Is this still the case, and if so, when is it likely to be done?
Hi Robert,
I am currently working on supporting CPU-wide trace scenarios where I can start seeing the end of the tunnel. After that my plan was to add support for ETMv3.x/PTM trace decoding followed by support for N:N source/sink topology. Part of the latter is to introduce a way to enable more complex ETM configuration using a configuration file. In fact I already stumbled on how I want to do that and have a (very) small prototype that works.
So that is what I had in mind... But it doesn't mean I can't be talked into changing my priorities. In fact I will gladly do so if we, as a group, decide it is more important to introduce support for complex configuration before ETMv3.x/PTM decoding. I personally don't have a preference, it is simply a matter of deciding what we want to do.
I have CC'ed the coresight mailing list in order to reach a broader audience. Please speak up if you really have an issue (along with the rationale) with supporting ETM complex configurations before ETMv3.x/PTM decoding.
Best regards, Mathieu
Hi Mathieu,
Looking at the follow-up emails, it does seem there's a bit of thought needed to get this working well. I agree it's important to get this right and we shouldn't rush in a change for one particular use case.
We would like this to be as easy to use as possible to make AutoFDO simple for most users - maybe with some kind of wrapper script that will generate the config files and do the necessary sysfs accesses to configure it. This would be similar to AutoFDO on x86 which uses a wrapper script to select the correct PMU event number for the last branch records.
Most of Arm's focus is now on Arm v8-A platforms with ETMv4 - we're not seeing much activity on Arm v7-A platforms with ETMv3 / PTM. We have partners actively interested in AutoFDO.
Regards
Rob
On 2 May 2018 at 10:06, Robert Walker Robert.Walker@arm.com wrote:
-----Original Message----- From: Mathieu Poirier mathieu.poirier@linaro.org Sent: 27 April 2018 20:08 To: Robert Walker Robert.Walker@arm.com Cc: Mike Leach Mike.Leach@arm.com; Al Grant Al.Grant@arm.com; Travis Walton Travis.Walton@arm.com; coresight@lists.linaro.org Subject: Re: Upstream support for ETM strobing
On 27 April 2018 at 12:24, Robert Walker Robert.Walker@arm.com wrote:
Hi,
Strobing the ETM to reduce the amount of trace data when collecting profiles for AutoFDO seems to be working and providing useful optimizations. We’re currently working with some proof of concept patches (attached for reference) that add parameters to sysfs to configure the strobe period – before running perf record, the user must write to these parameters for each ETM. This isn’t suitable for production use as it has to be done for each ETM and the values persist after the trace session. To get this into upstream, we need to have this done by the perf record tool.
I understand there is work planned to enable more complex ETM configurations (such as strobing) from perf, possibly using a file to load register values from. Is this still the case, and if so, when is it likely to be done?
Hi Robert,
I am currently working on supporting CPU-wide trace scenarios where I can start seeing the end of the tunnel. After that my plan was to add support for ETMv3.x/PTM trace decoding followed by support for N:N source/sink topology. Part of the latter is to introduce a way to enable more complex ETM configuration using a configuration file. In fact I already stumbled on how I want to do that and have a (very) small prototype that works.
So that is what I had in mind... But it doesn't mean I can't be talked into changing my priorities. In fact I will gladly do so if we, as a group, decide it is more important to introduce support for complex configuration before ETMv3.x/PTM decoding. I personally don't have a preference, it is simply a matter of deciding what we want to do.
I have CC'ed the coresight mailing list in order to reach a broader audience. Please speak up if you really have an issue (along with the rationale) with supporting ETM complex configurations before ETMv3.x/PTM decoding.
Best regards, Mathieu
Hi Mathieu,
Hello Robert, and thanks for the follow-up.
Looking at the follow-up emails, it does seem there's a bit of thought needed to get this working well. I agree it's important to get this right and we shouldn't rush in a change for one particular use case.
Indeed. My plan was to publish a very small prototype on this list. That way people can look at the syntax, think about existing (and upcoming) features and how best to describe the configuration tags in the file. We need this to be a concerted effort so that we aren't left with things we don't like.
We would like this to be as easy to use as possible to make AutoFDO simple for most users - maybe with some kind of wrapper script that will generate the config files and do the necessary sysfs accesses to configure it. This would be similar to AutoFDO on x86 which uses a wrapper script to select the correct PMU event number for the last branch records.
To be honest, my hope is that once the complex configuration feature is available we can forget about sysfs access when working from the perf interface. I really hate the dancing around I have to do in the configuration process to avoid clobbering what's been done in sysfs. It's just a matter of time before things break.
Most of Arm's focus is now on Arm v8-A platforms with ETMv4 - we're not seeing much activity on Arm v7-A platforms with ETMv3 / PTM. We have partners actively interested in AutoFDO.
With the current work on CPU-wide scenarios and plans for the complex configuration feature I have pretty much accepted that we will do ARMv8 all the way and then look at ARMv7. Which brings us to answering your question from the original post: "When is it likely to be done?"
Don't go to town with this but in about a month sounds realistic for me. It could be sooner if things go my way with CPU-wide scenarios but I won't make promises. I'll get back to you in a couple of weeks when things get clearer.
Mathieu
Regards
Rob