Hi Al,
On Mon, Apr 11, 2016 at 3:31 PM, Al Grant Al.Grant@arm.com wrote:
I have fixed the mmap problem for STM. The kernel code is here [1] and the user space example program is on the Linaro pastebin [2]; I have tested the program on my device. If you have any questions, please let me know.
Hi Chunyan,
I think this isn't a good way of writing to STM external stimulus ports:
/* write data to map space */
memcpy(map, (char*)trace_data, sizeof(unsigned int) * TEST_DATA_SIZE);
Even though the external stimulus area is mapped into userspace it still needs to be treated as a memory-mapped peripheral. Writes should be done with the correct size to the correct address. The address is determined by what kind of STP packet you want to be output. Writes should be to addresses that are 0 modulo 8.
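As an illustration of "the correct size to the correct address", here is a minimal sketch of a single-packet write helper. The 256-byte channel stride and the port offsets are assumptions taken from my reading of the Arm STM programmers' model and should be checked against the documentation for the actual device; `stm_port` and `stm_write32` are hypothetical names, not part of any existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: each channel owns 256 bytes of stimulus-port space and the
 * packet type is selected by an 8-byte-aligned offset inside that window. */
#define STM_BYTES_PER_CHANNEL 256u

enum stm_port_offset {
    STM_G_DMTS = 0x00, /* guaranteed, marked, timestamped data */
    STM_G_DM   = 0x08, /* guaranteed, marked data */
    STM_G_DTS  = 0x10, /* guaranteed, timestamped data */
    STM_G_D    = 0x18  /* guaranteed data */
};

/* Address of one stimulus port inside the mmap()ed region (hypothetical helper). */
static inline volatile void *stm_port(void *map, unsigned channel,
                                      enum stm_port_offset off)
{
    assert(((unsigned)off & 7u) == 0); /* ports sit at addresses 0 modulo 8 */
    return (volatile uint8_t *)map
           + (size_t)channel * STM_BYTES_PER_CHANNEL + (unsigned)off;
}

/* Emit one 32-bit data packet with exactly one 32-bit store. */
static inline void stm_write32(void *map, unsigned channel,
                               enum stm_port_offset off, uint32_t value)
{
    *(volatile uint32_t *)stm_port(map, channel, off) = value;
}
```

With `map` being the pointer returned by mmap(), a call such as `stm_write32(map, 0, STM_G_DTS, trace_data[0]);` would replace the memcpy() for the first word. The volatile access keeps the compiler from merging, splitting or reordering the store.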
Ok, I understand this. I certainly didn't get the decoded traces while using mmap() to write data to STM like above.
If you are seeing correct STP output at all I think you're lucky - it's a combination of the way memcpy() happens to be implemented - or perhaps the way the compiler is expanding memcpy when the size is known to be 16 bytes - and the way STM is treating writes to addresses that aren't 0 modulo 8. If this generates D32 at all I'd expect this to generate two D32MTS packets followed by two D32M packets. With a longer string you'd likely get other combinations of flags until you eventually got into non-data packets or missing packets like TRIG. Different compilers and libraries will generate different packet streams because they implement memcpy() differently.
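To make the expected packet sequence concrete, the sketch below names the packet that each 32-bit store of such a copy would select. It rests on two assumptions that are not confirmed in this thread: that memcpy() is expanded into ascending 32-bit stores starting at offset 0 of a channel, and that the STM ignores address bit 2. The offset table is my reading of the Arm STM programmers' model, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Packet selected by a 32-bit store at byte offset `off` within a channel.
 * ASSUMPTION: the STM ignores address bit 2, so offsets 0 and 4 hit the
 * same port. The guaranteed/invariant-timing bit (bit 7) is folded away. */
static const char *stm_packet_for_store32(size_t off)
{
    switch (off & 0x78u) {
    case 0x00: return "D32MTS";
    case 0x08: return "D32M";
    case 0x10: return "D32TS";
    case 0x18: return "D32";
    case 0x60: return "FLAG_TS";
    case 0x68: return "FLAG";
    case 0x70: return "TRIG_TS";
    case 0x78: return "TRIG";
    default:   return "reserved";
    }
}

/* Describe a copy of `nbytes` performed as ascending 32-bit stores. */
static void stm_describe_copy(size_t nbytes, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (size_t off = 0; off + 4 <= nbytes; off += 4) {
        if (off != 0)
            strncat(out, " ", outlen - strlen(out) - 1);
        strncat(out, stm_packet_for_store32(off), outlen - strlen(out) - 1);
    }
}
```

Under these assumptions a 16-byte copy maps to the sequence D32MTS, D32MTS, D32M, D32M. Longer copies run through the remaining data ports, then into reserved offsets, and finally into the FLAG and TRIG ports.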
A low-level (packet-oriented) API would handle these issues and allow a higher-level (message-oriented) API to emit messages as a sequence of packets.
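A minimal sketch of that layering is shown here: packet-level primitives that each perform one store of the exact size, and a message-level call built on top of them. The framing (timestamp on the first packet only, FLAG as terminator) is an assumption chosen for illustration, not something agreed in this thread. The port offsets and the 256-byte channel stride are likewise assumptions from my reading of the Arm STM programmers' model, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STM_BYTES_PER_CHANNEL 256u /* ASSUMPTION: per-channel stride */
#define STM_G_DTS  0x10u           /* guaranteed, timestamped data */
#define STM_G_D    0x18u           /* guaranteed data */
#define STM_G_FLAG 0x68u           /* guaranteed FLAG packet */

static volatile uint8_t *stm_port(void *map, unsigned ch, unsigned off)
{
    assert((off & 7u) == 0); /* ports sit at addresses 0 modulo 8 */
    return (volatile uint8_t *)map + (size_t)ch * STM_BYTES_PER_CHANNEL + off;
}

/* Packet-level primitives: one store of the exact size per packet. */
static void stm_d32(void *map, unsigned ch, unsigned off, uint32_t v)
{
    *(volatile uint32_t *)stm_port(map, ch, off) = v;
}

static void stm_d8(void *map, unsigned ch, unsigned off, uint8_t v)
{
    *stm_port(map, ch, off) = v;
}

static void stm_flag(void *map, unsigned ch)
{
    *stm_port(map, ch, STM_G_FLAG) = 0; /* any write emits the packet */
}

/* Message-level call: D32TS, D32..., D8..., FLAG (assumed framing). */
static void stm_message(void *map, unsigned ch, const void *data, size_t len)
{
    const uint8_t *p = data;
    unsigned off = STM_G_DTS; /* timestamp only the first packet */

    while (len >= 4) {
        uint32_t w;
        memcpy(&w, p, sizeof w); /* unaligned-safe read of the source buffer */
        stm_d32(map, ch, off, w);
        off = STM_G_D;
        p += 4;
        len -= 4;
    }
    while (len > 0) { /* trailing bytes go out as 8-bit packets */
        stm_d8(map, ch, off, *p++);
        off = STM_G_D;
        len--;
    }
    stm_flag(map, ch);
}
```

Here memcpy() is used only to read the source buffer, never to write to the stimulus ports, so the packet stream does not depend on how the compiler or C library implements it.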
IIUC, writing to the STM device directly can fulfill this request, like below:
/* write the data via STM device file */
write(fd, (char*)trace_data, sizeof(unsigned int) * TEST_DATA_SIZE);
But I think it may not be fast enough for your requirements. Apart from this, the only other approach I can think of is to implement a user space wrapper on top of the mmap() support I have finished in kernel space.
Thanks, Chunyan
I'm wondering what's the best way for me to put my proposal up for review. Does Linaro have a Confluence system?
Al
Thanks, Chunyan
[1] https://git.linaro.org/people/zhang.chunyan/linux.git/shortlog/refs/heads/st... mmap-test-v4.6
[2] https://pastebin.linaro.org/view/21995dfa
On Fri, Apr 1, 2016 at 5:41 PM, Ola Liljedahl Ola.Liljedahl@arm.com wrote:
On 01/04/2016, 11:21, "CoreSight on behalf of Al Grant" <coresight-bounces@lists.linaro.org on behalf of Al.Grant@arm.com> wrote:
Ah, right, writing to this address (map) directly does not work correctly yet, and I haven't found the root cause. For now I have to focus on getting the STM driver upstreamed first, and then I will look into what caused the problem. I will get back to you once the STM mmap() interface works well.
Ok I look forward to it. I think a userspace write API could be quite useful for low-overhead instrumentation. On my Cortex-A57 server I can write a 32-bit STM timestamped data item, guaranteed delivery, from userspace every 30ns. It's not as fast as writing an untimestamped item to a ring buffer in local cache (~1ns) but it's less disruptive on cache, and a lot faster than calling clock_gettime() and writing a timestamped item to a ring buffer (~250ns). And I'm fairly sure that I could make the same API work on Intel STH too although I haven't got one to try out.
I approve of this.
-- Ola
Al
_______________________________________________
CoreSight mailing list
CoreSight@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/coresight