Hi Michael,
Let me try to explain some points based on my current understanding.
On Fri, Feb 5, 2016 at 6:25 AM, Mathieu Poirier mathieu.poirier@linaro.org wrote:
Hello Michael,
I didn't have time to get to your email today - you're first on tomorrow's priority list.
Thanks, Mathieu
On 4 February 2016 at 08:24, Michael Williams Michael.Williams@arm.com wrote:
Hi Mathieu,
In case you don't remember, I work in the ARM architecture team with responsibility for debug and CoreSight. We have met before at Linaro Connect and discussed the Linaro support for CoreSight.
I have to say it's all looking very encouraging, with progress being made on CoreSight and trace support. So that's all good!
Now, I don't normally read the Linux mailing lists, but Al Grant pointed me at this patch to add CoreSight STM support [1].
[1] http://www.spinics.net/lists/arm-kernel/msg479457.html
There're some points that confuse me about this patch. Since I don't follow the mailing lists, Will Deacon suggested I mail you directly. In particular, its behaviour is different for STMv1 vs. STM-500, as well as being different to the Intel driver [2].
[2] https://github.com/torvalds/linux/blob/master/drivers/hwtracing/intel_th/sth...
E.g. consider a call for the following:
uint64 data[128]; // A 64b aligned pointer stm.packet(16, &((char *)data)[1]); // Send 16 bytes, unaligned
The Intel 32bit driver (sth_stm_driver) will:
- Send a D32 packet consisting of data[4..1] (assuming they don't fault misaligned addresses). Because size > 8, it rounds size down to 4.
- Ignore the other data.
There's a loop [5] to send packet, so will not ignore other data. For this case, the process should be that send 4*D32 packets to STM buffer.
[5] http://lxr.free-electrons.com/source/drivers/hwtracing/stm/core.c#L391
- Only ever generates a single packet.
- For other sizes, rounds size down to power of 2 and returns number of bytes written.
For other sizes on 32bit driver, if the size > 4, send D32 at a time until the size of left packets which haven't been send to buffer is less than 4, and then if the size of left packets is larger than 2, will send D16, and then if still one byte of packets left, then send D4.
64bit driver has a similar process, one exception is the largest size of packets that are sent at a time is 8 instead of 4.
The Intel 64bit driver (sth_stm_driver) will:
- Do nothing because size > 8.
Like what I explained above, it will send 8 bytes at a time until the size of left packets less then 8, and then D32 D16 D8 if needed.
- Only ever generates a zero or one packets.
- For size <= 8, rounds size down to power of 2 and returns number of bytes written.
The CoreSight 32bit driver (stm_send) will:
- Send a D1=data[1], D2=data[3..2], D4=data[7..4], D4=data[11..8], D4=data[15..12], D1=data[16] stream.
- This is very inefficient use of bandwidth.
The CoreSight 64bit driver (stm_send_64bit) will:
- Send a D8=ZeroExtend(data[7..1]), D8=data[15..8] D8=ZeroExtend(data[16])
- This function only ever sends D8 packets.
- There is no way for the decoder to work out what the original data was.
How about if we process packets on 64bit like we did on 32bit? Can the decoder work with the original data generated by the way we are using on CoreSight 32bit system.
I think this function is only called from within the generic driver [3], though. It looks like stm_write() calls it a chunk at a time, with a chunk being at most 4/8 bytes, depending on the capabilities of the STM, and is expecting it to send only a single packet. This means that the code in stm_send/stm_send_64bit to deal with odd sized packets and misaligned addresses looks redundant/wrong.
Yes, this is indeed something that have to be optimized/revised. I will spend time to look into it. Thanks for pointing this.
[3] https://github.com/torvalds/linux/blob/master/drivers/hwtracing/stm/core.c
It looks like there might be support for other sources to link to the driver, but I could only find the stm_console when I looked.
Yes, I'm working on a integration of Linux Kernel Ftrace subsystem with CoreSight STM, there's another STM source to serve this feature. If you're interested in this, please let me know, I will cc you when sending out the patches for public review.
Hope these are helpful.
Thanks, Chunyan
Of course, this is based on my limited understanding of how this is used, and the current generic STM and Intel drivers. It might be that these changes have been agreed and Intel plan to change their driver to match (as I said, I don't generally follow the mailing lists). However, the different 64b and 32b behaviors on the ARM version are weird, and the unaligned pointer handling looks wrong too.
(The different behaviors on the Intel version isn't my problem. :-)
It might also be that I am reverse engineering the behaviour incorrectly.
The other point (which Will has raised on the mailing lists in the past [4]) is this code:
#ifndef CONFIG_64BIT static inline void __raw_writeq(u64 val, volatile void __iomem *addr) { asm volatile("strd %1, %0" : "+Qo" (*(volatile u64 __force *)addr) : "r" (val)); } #undef writeq_relaxed #define writeq_relaxed(v, c)__raw_writeq((__force u64) cpu_to_le64(v), c) #endif
This isn't guaranteed to work on the ARM 32 bit architectures. The STM might receive a 64-bit write, or might receive a pair of 32-bit writes to the two addressed words *in either order*. The upshot is that this is not a valid way of writing to the STM. (The data reordering is a killer.)
The driver appears to use this if there is an STM-500 in an AArch32 system. This is because the code interrogates the STM to decide whether it supports 64-bit accesses. It should either (a) not do so, and refuse 64-bit data if AArch32, or (b) use some property of the system to decide. I would still frown on (b) because the architecture makes it clear that this is UNPREDICTABLE, meaning you're not supposed to rely on it and the device isn't allowed to advertise its behaviour.
[4] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/297379.ht...
I hope this is useful.
With kind regards,
Mike.
Michael Williams Principal Engineer ARM Limited www.arm.com The Architecture For The Digital World IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight