Good morning Michael,
On 4 February 2016 at 08:24, Michael Williams Michael.Williams@arm.com wrote:
Hi Mathieu,
In case you don't remember, I work in the ARM architecture team with responsibility for debug and CoreSight. We have met before at Linaro Connect and discussed the Linaro support for CoreSight.
Yes, I remember - it was in San Francisco last year.
I have to say it's all looking very encouraging, with progress being made on CoreSight and trace support. So that's all good!
Good to hear, we are working hard.
Now, I don't normally read the Linux mailing lists, but Al Grant pointed me at this patch to add CoreSight STM support [1].
[1] http://www.spinics.net/lists/arm-kernel/msg479457.html
There're some points that confuse me about this patch. Since I don't follow the mailing lists, Will Deacon suggested I mail you directly. In particular, its behaviour is different for STMv1 vs. STM-500, as well as being different to the Intel driver [2].
First and foremost it is important to understand that this driver is _not_ tailored for STM500. I don't have HW with an STM500 and as such can't test the integration. Any code that seem to imply the contrary is coincidental.
[2] https://github.com/torvalds/linux/blob/master/drivers/hwtracing/intel_th/sth...
E.g. consider a call for the following:
uint64 data[128]; // A 64b aligned pointer stm.packet(16, &((char *)data)[1]); // Send 16 bytes, unaligned
The Intel 32bit driver (sth_stm_driver) will:
- Send a D32 packet consisting of data[4..1] (assuming they don't fault misaligned addresses). Because size > 8, it rounds size down to 4.
- Ignore the other data.
- Only ever generates a single packet.
- For other sizes, rounds size down to power of 2 and returns number of bytes written.
The Intel 64bit driver (sth_stm_driver) will:
- Do nothing because size > 8.
- Only ever generates a zero or one packets.
- For size <= 8, rounds size down to power of 2 and returns number of bytes written.
The CoreSight 32bit driver (stm_send) will:
- Send a D1=data[1], D2=data[3..2], D4=data[7..4], D4=data[11..8], D4=data[15..12], D1=data[16] stream.
- This is very inefficient use of bandwidth.
The CoreSight 64bit driver (stm_send_64bit) will:
- Send a D8=ZeroExtend(data[7..1]), D8=data[15..8] D8=ZeroExtend(data[16])
- This function only ever sends D8 packets.
- There is no way for the decoder to work out what the original data was.
I think this function is only called from within the generic driver [3], though. It looks like stm_write() calls it a chunk at a time, with a chunk being at most 4/8 bytes, depending on the capabilities of the STM, and is expecting it to send only a single packet. This means that the code in stm_send/stm_send_64bit to deal with odd sized packets and misaligned addresses looks redundant/wrong.
[3] https://github.com/torvalds/linux/blob/master/drivers/hwtracing/stm/core.c
It looks like there might be support for other sources to link to the driver, but I could only find the stm_console when I looked.
Of course, this is based on my limited understanding of how this is used, and the current generic STM and Intel drivers. It might be that these changes have been agreed and Intel plan to change their driver to match (as I said, I don't generally follow the mailing lists). However, the different 64b and 32b behaviors on the ARM version are weird, and the unaligned pointer handling looks wrong too.
This is the kind of feedback we need. It would be great if you could work with us to make things better. Comments and request for modifications have to be done on the public mailing list. Otherwise people will ask (and rightly so) why things were modified from one version to another. Chunyan will CC you on her next submission - please reply to all.
(The different behaviors on the Intel version isn't my problem. :-)
It might also be that I am reverse engineering the behaviour incorrectly.
The other point (which Will has raised on the mailing lists in the past [4]) is this code:
The documentation is very _unclear_ on this topic. Catalin, Will and I couldn't make sense of it and we decided to leave things as is.
#ifndef CONFIG_64BIT static inline void __raw_writeq(u64 val, volatile void __iomem *addr) { asm volatile("strd %1, %0" : "+Qo" (*(volatile u64 __force *)addr) : "r" (val)); } #undef writeq_relaxed #define writeq_relaxed(v, c)__raw_writeq((__force u64) cpu_to_le64(v), c) #endif
This isn't guaranteed to work on the ARM 32 bit architectures. The STM might receive a 64-bit write, or might receive a pair of 32-bit writes to the two addressed words *in either order*. The upshot is that this is not a valid way of writing to the STM. (The data reordering is a killer.)
Perfect, so it's officially out on 32 architectures. Chunyan, please remove (for 32 bit) in your next release.
The driver appears to use this if there is an STM-500 in an AArch32 system. This is because the code interrogates the STM to decide whether it supports 64-bit accesses. It should either (a) not do so, and refuse 64-bit data if AArch32, or (b) use some property of the system to decide. I would still frown on (b) because the architecture makes it clear that this is UNPREDICTABLE, meaning you're not supposed to rely on it and the device isn't allowed to advertise its behaviour.
[4] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/297379.ht...
I hope this is useful.
With kind regards,
I'm really happy to have received comments on this driver. As mentioned above we would welcome more reviews on the upcoming patchsets.
Thanks, Mathieu
Mike.
Michael Williams Principal Engineer ARM Limited www.arm.com The Architecture For The Digital World IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.