Hi,
The STM has 16 channels per 4KB page (or 256 channels per 64KB page, the largest page granule in ARMv8). The STM supports up to 65536 channels, so its channel space spans up to 4096 separate pages (AArch32 with 4KB pages) or 256 separate pages (AArch64 with 64KB pages), using 16MB of address space in either case. This allows the STM channel space to be partitioned between independent software agents. However, implementers often want to allocate a smaller amount of physical address space to the STM.
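The page arithmetic in the paragraph above can be checked with a short Python sketch. It assumes each channel occupies 256 bytes of address space (which is what 16 channels per 4KB page implies); the names are illustrative only.

```python
# Minimal sketch of the STM channel-space arithmetic.
# Assumption: each channel occupies 256 bytes (16 channels per 4KB page).
BYTES_PER_CHANNEL = 4096 // 16  # 256 bytes per channel
MAX_CHANNELS = 65536            # maximum number of STM channels


def channels_per_page(page_size: int) -> int:
    """Number of channels that fit in one page of the given size."""
    return page_size // BYTES_PER_CHANNEL


def pages_for_full_space(page_size: int) -> int:
    """Number of pages needed to cover the full channel space."""
    return MAX_CHANNELS // channels_per_page(page_size)


total_bytes = MAX_CHANNELS * BYTES_PER_CHANNEL  # same for any page size

for page_size in (4 * 1024, 64 * 1024):
    print(f"{page_size // 1024}KB pages: "
          f"{channels_per_page(page_size)} channels/page, "
          f"{pages_for_full_space(page_size)} pages")
print(f"total channel space: {total_bytes // (1024 * 1024)}MB")
```

Running it prints 16 channels per page and 4096 pages for 4KB pages, 256 channels per page and 256 pages for 64KB pages, and a 16MB total either way.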
Mapping just one page of STM is fine when there is a single agent (e.g. low-level firmware), but it creates a problem when there are multiple software agents. Is there enough documentation yet on how the Linux STM driver will map pages of STM channel space for us to set expectations on how much of it should be physically mapped?
In particular, will the driver support mapping parts of STM channel space into userspace processes, allowing userspace to generate STM messages by storing directly into channel space without going via the kernel?
Al
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On 2 February 2016 at 02:31, Al Grant Al.Grant@arm.com wrote:
Sorry, I don't understand the question clearly enough to give you an accurate answer. Right now the CS-STM driver reads the number of available channels from STMDEVID::NUMSP to map the right amount of space. That value can be tuned in the device tree by specifying the amount of I/O space allocated to the device; the boot code takes the minimum of the two values. As such, the amount of mapped space can be fine-tuned in the DT.
Get back to me if you still have questions on that topic.
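As a rough illustration of that sizing rule, the sketch below takes the smaller of the channel count reported by the hardware and the number of channels that fit in the region declared in the device tree. The function names are hypothetical, the NUMSP value is assumed to be already decoded into a plain channel count, and 256 bytes per channel is assumed; this is not the driver's actual code.

```python
# Hypothetical model of how the mapped channel space is chosen:
# the minimum of what the hardware reports and what the DT allocates.
BYTES_PER_CHANNEL = 256  # assumption: 256 bytes of address space per channel


def mapped_channels(numsp: int, dt_region_size: int) -> int:
    """Return the number of channels that end up mapped.

    numsp          -- channel count reported by STMDEVID::NUMSP
                      (assumed already decoded to a plain count)
    dt_region_size -- size in bytes of the channel region given in the DT
    """
    dt_channels = dt_region_size // BYTES_PER_CHANNEL
    return min(numsp, dt_channels)


def mapped_bytes(numsp: int, dt_region_size: int) -> int:
    """Size in bytes of the channel space that ends up mapped."""
    return mapped_channels(numsp, dt_region_size) * BYTES_PER_CHANNEL


# A full 65536-channel STM, but the DT only allocates 1MB of I/O space:
print(mapped_channels(65536, 1024 * 1024))      # 4096
# A smaller STM whose DT region is larger than the hardware needs:
print(mapped_channels(1024, 16 * 1024 * 1024))  # 1024
```

In this model, shrinking the DT region is the knob that limits how much channel space is mapped, even when the hardware implements more channels.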
The generic STM API has an optional mmio_addr() callback used specifically for mmap() operations from user space, allowing writes to channels to bypass the kernel. The CS-STM driver that has just been released for public review doesn't implement this interface (simply because we haven't done it yet).
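To make the role of an mmio_addr()-style callback more concrete, here is an illustrative Python model, not the kernel interface: given a range of channels, it returns the physical address and length of the corresponding slice of channel space, which is what the kernel needs in order to map those channels into a process. The function name and base address are hypothetical, and 256 bytes per channel is assumed.

```python
# Illustrative model (not the Linux kernel API) of what an
# mmio_addr()-style callback computes for a range of channels.
from typing import Tuple

BYTES_PER_CHANNEL = 256         # assumption: 256 bytes per channel
STM_CHANNEL_BASE = 0x2000_0000  # hypothetical physical base of channel space
NUM_CHANNELS = 65536            # channels available in this model


def channel_range_phys(channel: int, nr_chans: int) -> Tuple[int, int]:
    """Return (physical address, length in bytes) for nr_chans channels
    starting at channel."""
    if channel < 0 or nr_chans <= 0 or channel + nr_chans > NUM_CHANNELS:
        raise ValueError("channel range outside the channel space")
    phys = STM_CHANNEL_BASE + channel * BYTES_PER_CHANNEL
    return phys, nr_chans * BYTES_PER_CHANNEL


phys, length = channel_range_phys(32, 16)
print(hex(phys), length)  # 0x20002000 4096
```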
Thanks, Mathieu
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
No, I understand that, thanks.
That's more what I was after, although I don't grasp exactly what you mean by mmio_addr(). What I am envisaging is that a userspace process could request a page of STM channels. Then the kernel might return a file descriptor and the process mmap()s that into its address space, so that the process can write STM messages directly. Is that roughly what will happen when it's done?
So the implication would be that you need at least one page's worth of channels per process, plus one page for the kernel. There is kernel-enforced channel separation between pages, but channel usage within a page relies on the users of that page coordinating their channel use.
Given that, we would want to recommend an STM channel space (the size of the physically memory-mapped area) large enough to provide separate pages for the kernel and for the maximum number of userspace processes using STM pages concurrently.
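Following that reasoning, the recommended channel space can be estimated as one page per concurrent userspace agent plus one page for the kernel. The sketch below is only a back-of-the-envelope calculation under those assumptions (256 bytes per channel, no page shared between agents); the function name is hypothetical. Sizing for the largest page granule a kernel might use (64KB on ARMv8) keeps the estimate conservative whichever granule is configured.

```python
# Back-of-the-envelope sizing: one page per concurrent userspace agent,
# plus one page for the kernel. Assumes 256 bytes per channel.
BYTES_PER_CHANNEL = 256
MAX_SPACE = 65536 * BYTES_PER_CHANNEL  # full 16MB STM channel space


def recommended_channel_space(max_user_agents: int, page_size: int) -> int:
    """Bytes of channel space needed so that the kernel and each
    concurrent userspace agent can each own at least one page."""
    pages = max_user_agents + 1  # +1 page reserved for the kernel
    size = pages * page_size
    if size > MAX_SPACE:
        raise ValueError("more pages than the STM channel space can provide")
    return size


for page_size in (4 * 1024, 64 * 1024):
    size = recommended_channel_space(7, page_size)
    print(f"{page_size // 1024}KB pages: {size // 1024}KB, "
          f"{size // BYTES_PER_CHANNEL} channels")
```

For 7 concurrent userspace agents this gives 32KB (128 channels) with 4KB pages and 512KB (2048 channels) with 64KB pages.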
It doesn't have to work like that - everything could go through the kernel. But direct access from userspace is far more efficient. Are you already looking at timescales for doing that in CoreSight STM?
Al
On 2 February 2016 at 08:44, Al Grant Al.Grant@arm.com wrote:
Roughly yes.
The generic STM API doesn't deal with pages; it only knows about channels. Channel allocation to the various clients goes through configfs and the STM policies. If users need to work with pages, they will have to know about the architecture in order to specify the right alignment.
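Because the API works in channels while the MMU works in pages, a client that wants a mapping of its own has to round its channel range out to page boundaries. A minimal sketch of that rounding, assuming 256 bytes per channel (the function name is hypothetical):

```python
# Round a channel range out to whole pages so it can be mapped on its own.
# Assumption: 256 bytes of channel space per channel.
BYTES_PER_CHANNEL = 256


def page_aligned_range(first_channel: int, nr_chans: int, page_size: int):
    """Return (first channel, channel count) covering the whole pages
    that contain the requested range."""
    chans_per_page = page_size // BYTES_PER_CHANNEL
    start = (first_channel // chans_per_page) * chans_per_page
    end = first_channel + nr_chans
    end = -(-end // chans_per_page) * chans_per_page  # round up
    return start, end - start


# Channels 20..35 straddle a 4KB page boundary, so two pages are needed:
print(page_aligned_range(20, 16, 4096))  # (16, 32)
# The same 16 channels starting at 32 fit exactly in one 4KB page:
print(page_aligned_range(32, 16, 4096))  # (32, 16)
```

A policy that hands out channel ranges starting on a multiple of the channels-per-page count avoids the straddling case.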
Keeping the kernel out of the way is always a good thing. Implementing the mmio_addr() callback isn't very difficult, just a little tricky. We don't have a time frame for this activity, though Chunyan may be interested in taking a look when she's done integrating with Ftrace.
I would expect userspace to call mmap() with a null pointer and have the kernel decide where to map it into userspace, rather than userspace specifying alignment. It would be similar to using the mmap() interface to perf sample events, wouldn't it?
If they ask for 16 channels and the minimum page size is 16KB, then they will get back a page that gives them access to 64 channels. I don't see any other way to do it, assuming the STM is contiguous in the physical memory map. There might or might not be anything that tells them they have those extra 48 channels (although they could infer it from the page size), but the main point is that the kernel can't use that part of the physical address space (i.e. those channels) for anything else until the first process has released them. So it needs at least one page per userspace agent simultaneously using STM.
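The consequence described above (each mapping consumes whole pages, so the spare channels cannot be reused until they are released) can be modelled with a toy page-granular allocator. This is a hypothetical illustration, not how the STM framework allocates channels; it assumes 256 bytes per channel and a contiguous channel space.

```python
# Toy page-granular allocator for STM channel space (hypothetical).
# Assumption: 256 bytes per channel, channel space contiguous.
BYTES_PER_CHANNEL = 256


class StmPageAllocator:
    def __init__(self, total_channels: int, page_size: int):
        self.chans_per_page = page_size // BYTES_PER_CHANNEL
        self.num_pages = total_channels // self.chans_per_page
        self.owner = {}  # page index -> agent owning that page

    def allocate(self, agent: str, nr_chans: int):
        """Give agent whole contiguous pages covering nr_chans channels.

        Returns (first channel, channels actually granted)."""
        needed = -(-nr_chans // self.chans_per_page)  # round up to pages
        for start in range(self.num_pages - needed + 1):
            run = range(start, start + needed)
            if all(page not in self.owner for page in run):
                for page in run:
                    self.owner[page] = agent
                return (start * self.chans_per_page,
                        needed * self.chans_per_page)
        raise RuntimeError("no free pages of STM channel space")

    def release(self, agent: str) -> None:
        """Return every page owned by agent to the free pool."""
        self.owner = {p: a for p, a in self.owner.items() if a != agent}


# 256 channels with 16KB pages = 4 pages of 64 channels each.
stm = StmPageAllocator(total_channels=256, page_size=16 * 1024)
print(stm.allocate("kernel", 16))  # (0, 64): asked for 16, got 64
print(stm.allocate("proc-a", 16))  # (64, 64)
print(stm.allocate("proc-b", 16))  # (128, 64)
print(stm.allocate("proc-c", 16))  # (192, 64)
# A fifth agent would fail here until someone releases a page.
stm.release("proc-a")
print(stm.allocate("proc-d", 16))  # (64, 64)
```

With four pages, the kernel plus three processes exhaust the space even though only 64 of the 256 channels were actually requested.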
Al
On 2 February 2016 at 09:49, Al Grant Al.Grant@arm.com wrote:
That's exactly how it works.
That is correct.
That's also correct when accessing the STM channels through the mmap() interface.
Hi Al, Mathieu and All,
On Wed, Feb 3, 2016 at 12:13 AM, Mathieu Poirier mathieu.poirier@linaro.org wrote:
There are still some points on this topic that I haven't understood very clearly, though what you are talking about seems very interesting, and I will spend time looking at it.
I will get back to this email once I have a clear understanding of this question and of mmap() operations from user space.
Thanks, Chunyan
Hi Al,
I have added an mmap() interface for CoreSight STM; the operations from user space would look like what the attachment does. Since I simply added a CoreSight STM hook to the generic STM framework, mapping the channel space into memory from user space is a little complex.
Get back to me when you want to use this feature; for the time being, I can push the code somewhere you can get it.
Thanks, Chunyan
On Wed, Feb 3, 2016 at 9:07 PM, Chunyan Zhang zhang.chunyan@linaro.org wrote: