Hi,
I'd like to get your thoughts on an appropriate low-level userspace API for STM. The API I'm after would assume the ability to map a range of channels into userspace as detailed in Documentation/trace/stm.txt:
"Some STM devices may allow direct mapping of the channel mmio region to userspace for zero-copy writing..."
What that doesn't describe is what you do with the region once you've got it. That suggests a userspace API that can abstract over the way channels are represented in the memory map, i.e. how bits of the address are used to influence the data packets. Given a channel number (or a relative channel number) and some data, it would generate a store to the right address, or an lvalue at the right address. So you could write something like this:
STM_TRACE_DATA(p, TIMESTAMP, uint32, 123);
or perhaps
*STM_TRACE_CHANNEL(p, TIMESTAMP, uint32) = 123;
and this would become inline code that would be a single store that would cause a D32 packet. The ARM and Intel implementations would differ in how they calculated the address and perhaps on whether some features were available. 64-bit writes would be unavailable on some ARM systems.
Userspace libraries that implemented higher-level messaging formats could then sit on top of this lower-level API.
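As a rough sketch of what such macros might expand to on an ARM CoreSight STM: the code below assumes each channel occupies a 256-byte window of the mapped region and that the offset inside the window selects the packet flags. The stride, the flag names other than TIMESTAMP, and the stm_channel_base() helper are made up for illustration and are not taken from any existing driver or library; an Intel implementation would presumably compute the address differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: each channel is a 256-byte window in the mapped region. */
#define STM_CHANNEL_STRIDE 0x100u

/* ASSUMPTION: the offset inside the window selects the packet flags. */
enum stm_packet_flags {
    MARKED_TIMESTAMP = 0x00,  /* marked + timestamped data */
    MARKED           = 0x08,  /* marked data */
    TIMESTAMP        = 0x10,  /* timestamped data */
    PLAIN            = 0x18   /* plain data */
};

/* Hypothetical helper: base address of a channel relative to the mapping. */
static inline volatile void *stm_channel_base(volatile void *map,
                                              unsigned rel_channel)
{
    assert(map != NULL);
    return (volatile unsigned char *)map
           + (size_t)rel_channel * STM_CHANNEL_STRIDE;
}

/* Lvalue form: pointer of the requested width at the right address. */
#define STM_TRACE_CHANNEL(p, flags, type) \
    ((volatile type##_t *)((volatile unsigned char *)(p) + (flags)))

/* Store form: one volatile store of the requested width. */
#define STM_TRACE_DATA(p, flags, type, value) \
    ((void)(*STM_TRACE_CHANNEL(p, flags, type) = (type##_t)(value)))
```

Here p would be the value returned by stm_channel_base() for the channel of interest. Because everything is a macro or an inline function, the compiler can reduce each call to an address calculation plus one store.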
I don't think I've seen this in any of the patches that have come round but is anyone working on anything like this?
Al
On 30 March 2016 at 11:08, Al Grant Al.Grant@arm.com wrote:
I don't think I've seen this in any of the patches that have come round but is anyone working on anything like this?
Good afternoon,
Nobody is working on a user space API that would do something like that, nor do I personally have plans for the team to do so. Aside from a few exceptions (like the openCSD project), Linaro's activities are mainly focused on the kernel.
That being said I never thought of this team as an autocracy - if someone wants to work on this I'll be happy to help draft the card.
Regards, Mathieu
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
On 30/03/2016, 22:58, "CoreSight on behalf of Mathieu Poirier" <coresight-bounces@lists.linaro.org on behalf of mathieu.poirier@linaro.org> wrote:
Nobody is working on a user space API that would do something like that, nor do I personally have plans for the team to do so. Aside from a few exceptions (like the openCSD project), Linaro's activities are mainly focused on the kernel.
Meaning what? Tracing from the kernel only? Or CoreSight tracing as an OS/kernel service to applications through a system call interface? Or?
-- Ola
On 31 March 2016 at 03:43, Ola Liljedahl Ola.Liljedahl@arm.com wrote:
Meaning what? Tracing from the kernel only? Or CoreSight tracing as an OS/kernel service to applications through a system call interface? Or?
Meaning that we focus on providing solutions that are confined to kernel space. User space is a completely different realm that we try to avoid due to the large spectrum it covers. We got involved in the openCSD project because without an open source decoding solution, the kernel driver stack is useless.
We simply don't have the resources and the use cases to get involved in user space. Our goal is to bring things forward to a point where people can start using them to build products or solve problems. The drivers can expose interfaces to user space but we won't get involved in how those interfaces get used.
Mathieu
Hi Al,
On Thu, Mar 31, 2016 at 1:08 AM, Al Grant Al.Grant@arm.com wrote:
What that doesn't describe is what you do with the region once you've got it.
This is similar to the mmap() interface I have implemented for STM. I replied to one of your previous emails on March 1st, explaining how to use the mmap() interface from user space. The subject of that email was "STM memory space". If you cannot find that email in your mailbox, please let me know and I can forward it to you.
Regards, Chunyan
This is similar to the mmap() interface I have implemented for STM. I replied to one of your previous emails on March 1st, explaining how to use the mmap() interface from user space. The subject of that email was "STM memory space". If you cannot find that email in your mailbox, please let me know and I can forward it to you.
Thanks, I got the one where you sent sample code for allocating pages into userspace:
map = (char *)mmap(0, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset);
if (map == MAP_FAILED) {
        printf("Failed to map %s\n", strerror(errno));
        goto out;
}
/* %lx matches the (unsigned long) cast of the mapped address */
printf("Success to map channel(%u~%u) to 0x%lx\n", policy->channel, (policy->width - policy->channel), (unsigned long)map);

write(fd, trace_data, size);

/* unmap the area & error checking */
if (munmap(map, length) == -1)
        perror("user: Error un-mmapping the file");
So you have mapped channels into userspace and you print out the address in userspace (map) but you don't store anything into it. What I am suggesting here is an API for generating STP packets by doing those stores, and doing this in a way that's both portable and low-overhead.
Apologies if you have already suggested this somewhere else.
Al
On Thu, Mar 31, 2016 at 6:36 PM, Al Grant Al.Grant@arm.com wrote:
So you have mapped channels into userspace and you print out the address in userspace (map) but you don't store anything into it.
Ah, right, using this address (map) directly does not work properly for now, and I haven't found the root cause. I have to focus on getting the STM driver upstreamed first, and then I will look at what caused the problem. I will get back to you once the STM mmap() interface works well.
Thanks, Chunyan
Ah, right, using this address (map) directly does not work properly for now, and I haven't found the root cause. I have to focus on getting the STM driver upstreamed first, and then I will look at what caused the problem. I will get back to you once the STM mmap() interface works well.
Ok I look forward to it. I think a userspace write API could be quite useful for low-overhead instrumentation. On my Cortex-A57 server I can write a 32-bit STM timestamped data item, guaranteed delivery, from userspace every 30ns. It's not as fast as writing an untimestamped item to a ring buffer in local cache (~1ns) but it's less disruptive on cache, and a lot faster than calling clock_gettime() and writing a timestamped item to a ring buffer (~250ns). And I'm fairly sure that I could make the same API work on Intel STH too although I haven't got one to try out.
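For comparison, the software-timestamped alternative mentioned above might look roughly like the sketch below: every item costs a clock_gettime() call plus a store into a ring buffer. The struct and function names are illustrative only, and the code does not itself demonstrate the ~250ns figure, which will vary by system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define RING_ITEMS 1024u   /* power of two so the index can be masked */

struct ts_item {
    uint64_t ns;     /* CLOCK_MONOTONIC time in nanoseconds */
    uint32_t data;   /* payload */
};

struct ts_ring {
    struct ts_item items[RING_ITEMS];
    uint32_t head;   /* total number of items written so far */
};

/* Take a timestamp in software, then store the item in the ring. */
static inline void ring_write_timestamped(struct ts_ring *r, uint32_t data)
{
    struct timespec ts;
    struct ts_item *it;

    assert(r != NULL);
    clock_gettime(CLOCK_MONOTONIC, &ts);
    it = &r->items[r->head++ & (RING_ITEMS - 1u)];
    it->ns = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    it->data = data;
}
```

With STM the timestamp is attached by the hardware, so the instrumented code only performs the store.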
Al
On 01/04/2016, 11:21, "CoreSight on behalf of Al Grant" <coresight-bounces@lists.linaro.org on behalf of Al.Grant@arm.com> wrote:
Ok I look forward to it. I think a userspace write API could be quite useful for low-overhead instrumentation. On my Cortex-A57 server I can write a 32-bit STM timestamped data item, guaranteed delivery, from userspace every 30ns. It's not as fast as writing an untimestamped item to a ring buffer in local cache (~1ns) but it's less disruptive on cache, and a lot faster than calling clock_gettime() and writing a timestamped item to a ring buffer (~250ns). And I'm fairly sure that I could make the same API work on Intel STH too although I haven't got one to try out.
I approve of this.
-- Ola
Hi Al and Ola,
I have fixed the mmap problem for STM. The kernel code is here [1], and the user space example program is on the Linaro pastebin [2]. I have tested the program on my device. If you have any questions, please let me know.
Thanks, Chunyan
[1] https://git.linaro.org/people/zhang.chunyan/linux.git/shortlog/refs/heads/st...
[2] https://pastebin.linaro.org/view/21995dfa
On Fri, Apr 1, 2016 at 5:41 PM, Ola Liljedahl Ola.Liljedahl@arm.com wrote:
I approve of this.
-- Ola
Chunyan and Al,
I suggest that you guys work together to add one of Al's user space APIs to the demonstration program. That program, along with the user space API specification, can be submitted upstream with the mmap patch. That way people have something to start with and the API, although not official, has the opportunity to be seen and used by the community.
Mathieu
On 7 April 2016 at 07:22, Chunyan Zhang zhang.chunyan@linaro.org wrote:
Hi Al and Ola,
I have fixed the mmap problem for STM. The kernel code is here [1], and the user space example program is on the Linaro pastebin [2]. I have tested the program on my device. If you have any questions, please let me know.
I have fixed the mmap problem for STM. The kernel code is here [1], and the user space example program is on the Linaro pastebin [2]. I have tested the program on my device. If you have any questions, please let me know.
Hi Chunyan,
I think this isn't a good way of writing to STM external stimulus ports:
/* write data to map space */
memcpy(map, (char*)trace_data, sizeof(unsigned int) * TEST_DATA_SIZE);
Even though the external stimulus area is mapped into userspace it still needs to be treated as a memory-mapped peripheral. Writes should be done with the correct size to the correct address. The address is determined by what kind of STP packet you want to be output. Writes should be to addresses that are 0 modulo 8.
If you are seeing correct STP output at all I think you're lucky - it's a combination of the way memcpy() happens to be implemented - or perhaps the way the compiler is expanding memcpy when the size is known to be 16 bytes - and the way STM is treating writes to addresses that aren't 0 modulo 8. If this generates D32 at all I'd expect this to generate two D32MTS packets followed by two D32M packets. With a longer string you'd likely get other combinations of flags until you eventually got into non-data packets or missing packets like TRIG. Different compilers and libraries will generate different packet streams because they implement memcpy() differently.
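To make the memcpy() problem concrete, here is a small sketch under an assumed ARM STM layout in which bits [6:3] of the offset within a channel select the packet kind (0x00 marked and timestamped data, 0x08 marked data, 0x10 timestamped data, 0x18 plain data, with higher slots being flags, triggers or reserved). The enum and function names are invented for illustration, and the offsets should be checked against the STM documentation for the device in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packet kinds selected by the store address (ASSUMED layout, see above). */
enum stm_kind { STM_DMTS, STM_DM, STM_DTS, STM_D, STM_OTHER };

/* Which kind of packet would a store at this offset in a channel produce? */
static inline enum stm_kind stm_kind_at(size_t offset)
{
    switch ((offset & 0x7f) >> 3) {   /* 8-byte slot index within the channel */
    case 0:  return STM_DMTS;
    case 1:  return STM_DM;
    case 2:  return STM_DTS;
    case 3:  return STM_D;
    default: return STM_OTHER;        /* flag, trigger or reserved slots */
    }
}

/*
 * Peripheral-style alternative to memcpy(): every word goes to the same
 * 8-byte-aligned slot, one 32-bit store per word, so the packet kind is
 * chosen by the caller rather than by how memcpy() walks the addresses.
 */
static inline void stm_emit_words(volatile uint32_t *slot,
                                  const uint32_t *words, size_t n)
{
    assert(((uintptr_t)slot & 7u) == 0);
    for (size_t i = 0; i < n; i++)
        *slot = words[i];
}
```

If a 16-byte memcpy() happened to be done as four 32-bit stores at offsets 0, 4, 8 and 12, stm_kind_at() classifies them as marked+timestamped, marked+timestamped, marked, marked, which matches the packet sequence predicted above.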
A low-level (packet-oriented) API would handle these issues and allow a higher-level (message-oriented) API to emit messages as a sequence of packets.
I'm wondering what's the best way for me to put my proposal up for review. Does Linaro have a Confluence system?
Al
Hi Al,
On Mon, Apr 11, 2016 at 3:31 PM, Al Grant Al.Grant@arm.com wrote:
Even though the external stimulus area is mapped into userspace it still needs to be treated as a memory-mapped peripheral. Writes should be done with the correct size to the correct address. The address is determined by what kind of STP packet you want to be output. Writes should be to addresses that are 0 modulo 8.
Ok, I understand this. Indeed, I didn't get any decoded traces when using mmap() to write data to STM as above.
A low-level (packet-oriented) API would handle these issues and allow a higher-level (message-oriented) API to emit messages as a sequence of packets.
IIUC, writing to the STM device directly can fulfill this request, like below:
/* write the data via STM device file */
write(fd, (char*)trace_data, sizeof(unsigned int) * TEST_DATA_SIZE);
But I think it may not be fast enough for your requirements. Apart from this, the only other approach I can think of is to implement a user space wrapper program based on the mmap() system call support I have finished in kernel space.
Thanks, Chunyan
A low-level (packet-oriented) API would handle these issues and allow a higher-level (message-oriented) API to emit messages as a sequence of packets.
IIUC, writing to the STM device directly can fulfill this request, like below:
/* write the data via STM device file */
write(fd, (char*)trace_data, sizeof(unsigned int) * TEST_DATA_SIZE);
But I think it may not be fast enough for your requirements.
Yes, you're right that if you have a large amount of data you could write it out using write() to the STM device driver, but the overhead would depend on the size of the message.
The cost of going via a syscall is 1 syscall overhead (~1000ns?) plus ~30ns per STM packet.
The cost of going via a low-level userspace API is ~30ns per packet.
So you would have to be writing hundreds of packets in one call for the syscall overhead to become insignificant. Even some of the general-purpose higher-level message protocols are designed to only use about 10 packets per message - that would be ~300ns from userspace, ~1300ns with write(). For me the whole attraction of STM is that you can write messages 'little and often' and they get individual timestamps and channel identifiers. You could do quite a lot with just one packet per message.
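The arithmetic behind those figures can be written out as a small sketch. The constants are the rough estimates quoted above (~1000ns per syscall, ~30ns per packet), not new measurements, and the function names are just for illustration.

```c
#include <assert.h>

/* Rough per-operation costs quoted in this thread (estimates only). */
enum { SYSCALL_NS = 1000, PACKET_NS = 30 };

/* Cost of sending `packets` packets through one write() call. */
static inline unsigned cost_write_syscall_ns(unsigned packets)
{
    return SYSCALL_NS + PACKET_NS * packets;
}

/* Cost of sending `packets` packets by direct stores from userspace. */
static inline unsigned cost_userspace_ns(unsigned packets)
{
    return PACKET_NS * packets;
}

/*
 * Smallest packet count per call at which the fixed syscall cost is at
 * most `pct` percent of the total cost of the write() path.
 */
static inline unsigned packets_for_overhead(unsigned pct)
{
    unsigned n = 0;

    assert(pct > 0 && pct <= 100);
    while (SYSCALL_NS * 100u > pct * (SYSCALL_NS + PACKET_NS * n))
        n++;
    return n;
}
```

With these numbers a 10-packet message costs 300ns from userspace against 1300ns through write(), and packets_for_overhead(10) comes out at 300, i.e. hundreds of packets per call before the syscall drops to a tenth of the total.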
Apart from this, the only other approach I can think of is to implement a user space wrapper program based on the mmap() system call support I have finished in kernel space.
Ok I'll look forward to it.
We'll be putting the CoreSight Access Library up on ARM's public github area soon, and that might be a way to share ideas for an STM userspace API.
Al
Hi Al,
On Mon, Apr 11, 2016 at 8:30 PM, Al Grant Al.Grant@arm.com wrote:
Apart from this, the only other approach I can think of is to implement a user space wrapper program based on the mmap() system call support I have finished in kernel space.
Ok I'll look forward to it.
I have finished a first version of the API along with an example program, and I've tested it on my device. You can get it with the command in [1]; kernel support for the CS-STM mmap interface is in the stm-mmap-test-v4.6 branch of [2].
Please get back to me if you have any questions.
Thanks, Chunyan
[1] git clone -b mmap-wrapper https://github.com/lyrazhang/user-space-tests.git
[2] https://git.linaro.org/people/zhang.chunyan/linux.git