Hi,
STM has 16 channels per 4KB page (or 256 channels per 64KB page, the largest page granule in ARMv8). The STM supports up to 65536 channels, so up to 4096 separate pages (AArch32 with 4KB pages) or 256 separate pages (AArch64 with 64KB pages), using 16MB of address space. This allows the STM channel space to be partitioned between independent software agents. However, implementers often want to allocate a smaller amount of physical address space to STM.
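As a rough check of the arithmetic above, here is a minimal C sketch. The 256-byte channel size is simply what 16 channels per 4KB page implies; the macro and function names are made up for illustration and do not come from any driver.

```c
#include <assert.h>
#include <stdint.h>

/* Implied by "16 channels per 4KB page": 4096 / 16 = 256 bytes per channel. */
#define STM_CHANNEL_BYTES 256u
/* Upper bound on the number of channels, as quoted above. */
#define STM_MAX_CHANNELS  65536u

/* Number of channels that fit in one MMU page of the given size. */
static uint32_t stm_channels_per_page(uint32_t page_bytes)
{
    assert(page_bytes >= STM_CHANNEL_BYTES);
    assert(page_bytes % STM_CHANNEL_BYTES == 0);
    return page_bytes / STM_CHANNEL_BYTES;
}

/* Number of separately mappable pages needed to cover every channel. */
static uint32_t stm_pages_for_all_channels(uint32_t page_bytes)
{
    return STM_MAX_CHANNELS / stm_channels_per_page(page_bytes);
}

/* Address space consumed when every channel is mapped. */
static uint32_t stm_full_channel_space_bytes(void)
{
    return STM_MAX_CHANNELS * STM_CHANNEL_BYTES;
}
```

Calling stm_pages_for_all_channels() with a smaller channel count in place of STM_MAX_CHANNELS would give the number of pages an implementer actually has to map for a reduced physical window.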
Mapping just one page of STM is all right when there is just one agent (e.g. low-level firmware) but if there are multiple software agents it creates a problem. Is there now enough documentation on how the Linux STM driver will map pages of STM channel space, for us to use this in setting expectations on how much should be physically mapped in?
In particular, will the driver support mapping parts of STM channel space into userspace processes, allowing userspace to generate STM messages by storing directly into channel space without going via the kernel?
Al
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Mike and Tor,
I've added my slides in the presentation and fixed a few things around
it. Please review/modify as you see fit. You may want to take a look
at slide 27 and add what you have in mind for the coming weeks.
Another slide can be added if we're running out of room.
Serge and Chunyan, I intend to use you guys as a barometer. Please
have a thorough review of the presentation - if you still have
questions on the project after reading the presentation, we need to
somehow/somewhere add that information in there.
Thanks,
Mathieu
On my side I fixed the stability problem I was seeing and pushed a new
branch, "perf-opencsd-4.5-rc5", to my repository on g.l.o [1]. The
new header is in [2]. I moved the snapshot information from the CPU
specific portion to the global header. That way we avoid duplication.
But doing so breaks [3] where CS_ETM_SNAPSHOT now has the wrong index
value. Tor, please help me fix that.
Last but not least I generated a new bundle[4] for testing.
Have a look and tell me what you think.
In the meantime I'll continue working on adding the sink to the PMU
options on the command line.
Thanks,
Mathieu
[1]. http://git.linaro.org/git-ro/people/mathieu.poirier/coresight.git
[2]. tools/perf/util/cs-etm.h
[3]. tools/perf/util/cs-etm.c
[4]. http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.feb24.tgz
On 24 February 2016 at 11:36, Jeremiassen, Tor <tor(a)ti.com> wrote:
> I pushed my latest changes to github.
>
> Main thing is that you now need to define an env variable CSTRACE_PATH with
> the full path to the ref_trace_decoder directory under OpenCSD before you
> build – otherwise the trace decoder will be disabled.
>
> e.g., setenv CSTRACE_PATH /home/tor/linaro/git/OpenCSD/ref_trace_decoder
>
> Tor
>
> ---
> Tor Jeremiassen, Ph.D.
>
> Senior Member Technical Staff
> SDO Foundational Tools
> Texas Instruments Ph: 832 939 2356
> 13905 University Lane Fax: 832 939 2015
> Sugar Land, TX 77479 Email: tor(a)ti.com
>
>
Hi Mathieu,
In case you don't remember, I work in the ARM architecture team with responsibility for debug and CoreSight. We have met before at Linaro Connect and discussed the Linaro support for CoreSight.
I have to say it's all looking very encouraging, with progress being made on CoreSight and trace support. So that's all good!
Now, I don't normally read the Linux mailing lists, but Al Grant pointed me at this patch to add CoreSight STM support [1].
[1] http://www.spinics.net/lists/arm-kernel/msg479457.html
There are some points that confuse me about this patch. Since I don't follow the mailing lists, Will Deacon suggested I mail you directly. In particular, its behaviour is different for STMv1 vs. STM-500, as well as being different to the Intel driver [2].
[2] https://github.com/torvalds/linux/blob/master/drivers/hwtracing/intel_th/st…
E.g. consider a call for the following:
uint64 data[128]; // A 64b aligned pointer
stm.packet(16, &((char *)data)[1]); // Send 16 bytes, unaligned
The Intel 32bit driver (sth_stm_driver) will:
* Send a D32 packet consisting of data[4..1] (assuming they don't fault misaligned addresses). Because size > 8, it rounds size down to 4.
* Ignore the other data.
* Only ever generates a single packet.
* For other sizes, rounds size down to power of 2 and returns number of bytes written.
The Intel 64bit driver (sth_stm_driver) will:
* Do nothing because size > 8.
* Only ever generates zero or one packets.
* For size <= 8, rounds size down to power of 2 and returns number of bytes written.
The CoreSight 32bit driver (stm_send) will:
* Send a D1=data[1], D2=data[3..2], D4=data[7..4], D4=data[11..8], D4=data[15..12], D1=data[16] stream.
* This is very inefficient use of bandwidth.
The CoreSight 64bit driver (stm_send_64bit) will:
* Send a D8=ZeroExtend(data[7..1]), D8=data[15..8], D8=ZeroExtend(data[16]) stream.
* This function only ever sends D8 packets.
* There is no way for the decoder to work out what the original data was.
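For reference, the D1/D2/D4/D4/D4/D1 sequence described for the 32-bit CoreSight path is what you get from splitting an unaligned buffer into naturally aligned power-of-two packets. The sketch below is only an illustration of that splitting rule; the helper names are hypothetical and this is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: size of the next naturally aligned power-of-two
 * packet for a buffer at byte address `addr` with `remaining` bytes left,
 * capped at `max_width` (4 for a 32-bit data path, 8 for a 64-bit one).
 */
static size_t next_packet_size(uintptr_t addr, size_t remaining, size_t max_width)
{
    size_t size = max_width;

    assert(max_width == 1 || max_width == 2 || max_width == 4 || max_width == 8);
    if (remaining == 0)
        return 0;
    /* Halve until the packet fits and is aligned to its own size. */
    while (size > remaining || (addr & (size - 1)) != 0)
        size >>= 1;
    return size;
}

/*
 * Record the packet sizes for the whole buffer in `sizes` and return the
 * number of packets produced (never more than `cap`).
 */
static size_t split_into_packets(uintptr_t addr, size_t len, size_t max_width,
                                 size_t *sizes, size_t cap)
{
    size_t n = 0;

    while (len > 0 && n < cap) {
        size_t size = next_packet_size(addr, len, max_width);

        sizes[n++] = size;
        addr += size;
        len -= size;
    }
    return n;
}
```

For a 16-byte buffer starting one byte past an 8-byte boundary and a 4-byte cap, this yields sizes 1, 2, 4, 4, 4, 1. Every payload byte is preserved, at the cost of six packets.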
I think this function is only called from within the generic driver [3], though. It looks like stm_write() calls it a chunk at a time, with a chunk being at most 4/8 bytes, depending on the capabilities of the STM, and is expecting it to send only a single packet. This means that the code in stm_send/stm_send_64bit to deal with odd sized packets and misaligned addresses looks redundant/wrong.
[3] https://github.com/torvalds/linux/blob/master/drivers/hwtracing/stm/core.c
It looks like there might be support for other sources to link to the driver, but I could only find the stm_console when I looked.
Of course, this is based on my limited understanding of how this is used, and the current generic STM and Intel drivers. It might be that these changes have been agreed and Intel plan to change their driver to match (as I said, I don't generally follow the mailing lists). However, the different 64b and 32b behaviors on the ARM version are weird, and the unaligned pointer handling looks wrong too.
(The different behaviors on the Intel version aren't my problem. :-)
It might also be that I am reverse engineering the behaviour incorrectly.
The other point (which Will has raised on the mailing lists in the past [4]) is this code:
#ifndef CONFIG_64BIT
static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
{
        asm volatile("strd %1, %0"
                     : "+Qo" (*(volatile u64 __force *)addr)
                     : "r" (val));
}
#undef writeq_relaxed
#define writeq_relaxed(v, c) __raw_writeq((__force u64) cpu_to_le64(v), c)
#endif
This isn't guaranteed to work on the ARM 32 bit architectures. The STM might receive a 64-bit write, or might receive a pair of 32-bit writes to the two addressed words *in either order*. The upshot is that this is not a valid way of writing to the STM. (The data reordering is a killer.)
The driver appears to use this if there is an STM-500 in an AArch32 system. This is because the code interrogates the STM to decide whether it supports 64-bit accesses. It should either (a) not do so, and refuse 64-bit data if AArch32, or (b) use some property of the system to decide. I would still frown on (b) because the architecture makes it clear that this is UNPREDICTABLE, meaning you're not supposed to rely on it and the device isn't allowed to advertise its behaviour.
[4] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/297379.h…
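To make option (a) concrete, here is a minimal sketch of the decision as a pure function. The names are hypothetical, and the pointer-width test is only a stand-in for a build property such as the CONFIG_64BIT check in the code quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical policy helper for option (a): the widest single write that
 * software will issue. `hw_reports_64bit` stands in for whatever the STM
 * advertises; `cpu_is_64bit` is a property of the build, not of the device.
 */
static size_t stm_max_write_bytes(bool hw_reports_64bit, bool cpu_is_64bit)
{
    /* Use 8-byte writes only when the CPU can issue them as one access. */
    if (hw_reports_64bit && cpu_is_64bit)
        return 8;
    return 4;
}

/* Assumption: pointer width is used here as a proxy for a 64-bit build. */
static bool build_is_64bit(void)
{
    return UINTPTR_MAX > UINT32_MAX;
}
```

With this rule, an STM-500 in an AArch32 system is limited to 4-byte writes regardless of what the device reports, so the strd-based fallback is never needed.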
I hope this is useful.
With kind regards,
Mike.
--
Michael Williams Principal Engineer ARM Limited
www.arm.com The Architecture For The Digital World
Hi All,
What is the best practice to redirect the Ftrace output over STM?
Can we use the existing module "stm_console" and redirect the Ftrace output
as a kernel message by :
# cat trace_pipe > /dev/kmsg
One other question, please:
How does the "stm_core" module know the STM base address?
Best regards,
Jonatan
Good afternoon/evening Mike (or anyone else in a position to answer),
I wish we could have that conversation on IRC as I am sure my question
will be inaccurate. I'm also well aware the weekend has started in
the UK so it could also wait until Monday. But I'll try to be as
precise as possible....
When decoding STM traces, how is the library aware of the masterIDs
present on the system? I suppose there is an external way of passing
that information to the decoder... The metadata contained in the
perf.data file is irrelevant when dealing with STM traces.
Some clarification would be appreciated.
Many thanks,
Mathieu
Hi,
What's the plan for CoreSight-related discussion at Connect? It would be a good
opportunity to raise awareness of the CoreSight framework among silicon vendors.
Al