Hello CoreSight team,
I'm trying to bring up TMC-ETR on a Xilinx Zynq UltraScale+ and I ran into some trouble. I hope you may have some ideas on where to look next.
The detailed CoreSight topology of the Zynq US+ MPSoC can be found in ug1085-zynq-ultrascale-trm.pdf (easy to google), but to make this discussion easier, I'll try to sketch it below:
[2x C-R5]      [4x C-A53]
    |               |
[2x ETMs]       [4x ETM]
    |               |
[Funnel0]       [Funnel1]       [STM]
    |               |             |
    |         [TMC-ETF 4kB]       |
    |               |             |
[--------------------ATB----------------]
                    |
                [Funnel2]
                    |
             [TMC-ETF 8 KB]
                    |
              [Replicator]
               |        |
           [TMC-ETR]  [TPIU]
I can happily use perf to trace Cortex-A53 cores and get trace data from the topmost ETF (the 4kB one). However, I feel like I often get buffer overflows (thanks Mathieu for this hypothesis) that overwrite my trace with new data during the session. To overcome this I'd like to use either the second ETF or, preferably, the ETR with its significantly larger buffer. The problem is, I'm not able to get any trace from the ETR.
Observations:
1. It is possible to choose ETR as sink in perf - there is no error and the session starts.
2. There are no CoreSight related errors in dmesg.
3. By examining TMC-ETR memory mapped registers (busybox devmem 0x...) I can see that indeed perf sees the device and configures it properly. I've added some prints around struct etr_buf manipulations in TMC drivers and I can actually see that buffer address and size saved into this structure are programmed into TMC, as the same values appear in its registers. I can also see that the enable bit is set high when tracing starts and low when perf returns.
4. There is never any useful data in the AUXTRACE sections of perf.data. When tracing with --per-thread I observe that the size of the section grows significantly the longer I trace: ' ... CoreSight ETM Trace data: size xxx bytes' with xxx reaching the kilobyte range.
However, all I get is:
0xd60 [0x8]: event: 68
.
. ... raw event: size 8 bytes
.  0000:  44 00 00 00 00 00 08 00                          D.......

0xd60 [0x8]: PERF_RECORD_FINISHED_ROUND
With --all-cpus, I always get ' ... CoreSight ETM Trace data: size 16 bytes' no matter how long the tracing session is.
Interestingly, the data part does not change - it's always the same 8 bytes each time I try using ETR as sink, regardless of --per-thread or --all-cpus mode.
5. Each time I print etr_buf contents in tmc_etr_sync_flat_buf() or tmc_etr_sync_sg_buf(), I can see that the buffer, no matter how big, gets only 16 bytes of data on each sync.
I wonder if this issue may point to SMMU issues. I can see in juno-base.dtsi in Linux mainline that the ETR node (and only this one from the CS family) has iommus=< > property pointing to smmu_etr:
etr@20070000 {
    compatible = "arm,coresight-tmc", "arm,primecell";
    reg = <0 0x20070000 0 0x1000>;
    iommus = <&smmu_etr 0>;
    ...
I tried to mimic this behaviour on my platform by adding a similar reference to the only SMMU node defined in xilinx/zynqmp.dtsi. In my case it's iommus = <&smmu 0xc5>; since there is no dedicated SMMU for ETR (and I don't see one in the TRM) and 0xc5 is the stream ID calculated from the CoreSight master ID (TRM Chapter 16, Table 16-11). I can see in dmesg that the SMMU is enabled and ETR is added to iommu group 0, but this does not change the behaviour. I'd appreciate any suggestions on whether this direction seems worth further debugging.
Another interesting observation is that I'm actually unable to access anything below the 4k ETF in the topology I sketched. I can't use ETF2 nor STM via sysfs. I wonder if there is some ATB configuration that may be worth checking as well?
I would appreciate any suggestions where to look next.
Thanks and best regards, Wojciech
Hi Wojciech,
On Wed, Mar 20, 2019 at 07:32:06PM +0000, Wojciech Żmuda wrote:
To be honest, I don't have experience with SMMU; but neither Hikey nor DB410c connects ETR to an SMMU, and I can run perf on both of them (please note, I did this 1~2 months ago).
I personally think the most straightforward method is to use sysfs mode to verify the path from sources to sink, so that you can dump the raw trace data; e.g. for Hikey I use the commands below:
echo 1 > /sys/bus/coresight/devices/f6404000.etf/enable_sink
echo 1 > /sys/bus/coresight/devices/f659c000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f659d000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f659e000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f659f000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f65dc000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f65dd000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f65de000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f65df000.etm/enable_source
dd if=/dev/f6404000.etr of=/tmp/etr_raw_data
This way, you can first confirm whether you can capture raw data; the purpose of doing this is to verify that clocks/power have been configured properly on your platform. If this doesn't work (as you said, ETF2 or STM cannot be used via sysfs), I think you should debug sysfs mode first and move on to the perf tool as the next step.
BTW, could you explain what prevents you from using ETF2 via sysfs? Is this caused by clocks, or DT bindings?
Thanks, Leo Yan
On Thu, Mar 21, 2019 at 11:05:57AM +0800, Leo Yan wrote:
[...]
FWIW, I suggest you always disable CPU idle states when using CoreSight; I remember that I previously captured empty trace data on Hikey with CPU idle enabled. You can simply add 'nohlt' to the kernel command line for this.
Thanks, Leo Yan
Hi Leo,
That's a good clue. I'm leaving the SMMU hypothesis for now then.
I followed this way and here is the result.
I enabled sinks and sources:
root@zynq:/sys/bus/coresight/devices# echo 1 > fe970000.etr/enable_sink
root@zynq:/sys/bus/coresight/devices# echo 1 > fe950000.etf2/enable_sink
root@zynq:/sys/bus/coresight/devices# echo 1 > fe940000.etf1/enable_sink
root@zynq:/sys/bus/coresight/devices# echo 1 > fec40000.etm0/enable_source
root@zynq:/sys/bus/coresight/devices# echo 1 > fed40000.etm1/enable_source
root@zynq:/sys/bus/coresight/devices# echo 1 > fee40000.etm2/enable_source
root@zynq:/sys/bus/coresight/devices# echo 1 > fef40000.etm3/enable_source
Then, while trying to consume trace from sinks, only ETF1 (being the topmost ETF with 4k buffer, closest to A53 cores) works:
root@zynq:/sys/bus/coresight/devices# dd if=/dev/fe970000.etr of=/root/trace.bin
dd: failed to open '/dev/fe970000.etr': Invalid argument
root@zynq:/sys/bus/coresight/devices# dd if=/dev/fe950000.etf2 of=/root/trace.bin
dd: failed to open '/dev/fe950000.etf2': Invalid argument
root@zynq:/sys/bus/coresight/devices# dd if=/dev/fe940000.etf1 of=/root/trace.bin
8+0 records in
8+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00134656 s, 3.0 MB/s
trace.bin is not filled with zeros. I didn't bother decoding it, but at first glance it looks like a valid trace. This is the same topology that works for me with perf.
Further examination showed me an interesting thing. I checked sysfs-mapped registers of ETFs, ETR and funnels:
root@zynq:/sys/bus/coresight/devices# cat fe970000.etr/mgmt/ctl
0x0
root@zynq:/sys/bus/coresight/devices# cat fe950000.etf2/mgmt/ctl
0x0
root@zynq:/sys/bus/coresight/devices# cat fe940000.etf1/mgmt/ctl
0x1
root@zynq:/sys/bus/coresight/devices# cat fe920000.funnel1/funnel_ctrl
0x701
root@zynq:/sys/bus/coresight/devices# cat fe930000.funnel2/funnel_ctrl
0x300
If I understand this correctly, ETF2 (8kB), ETR and funnel2 (which routes trace to ETF2 and ETR) are disabled, while ETF1 and funnel1 (which routes trace to ETF1) are enabled. Interestingly, I can't enable funnel2, neither via sysfs nor via direct memory access (I have a kernel with CONFIG_STRICT_DEVMEM=n, and tried that from u-boot as well):
root@zynq:/sys/bus/coresight/devices# echo 0x301 > fe930000.funnel2/funnel_ctrl
-bash: fe930000.funnel2/funnel_ctrl: Permission denied
root@zynq:/sys/bus/coresight/devices# busybox devmem 0xfe930000 32
0x00000300
root@zynq:/sys/bus/coresight/devices# busybox devmem 0xfe930000 32 0x301
root@zynq:/sys/bus/coresight/devices# busybox devmem 0xfe930000 32
0x00000300
My current hypothesis is that I'm not able to get trace because these devices are disabled. I have no idea at the moment what may cause it. If my understanding of the TRM is correct, all these devices, working and non-working ones, are in the same full-power domain. Also, I suspect that if they were powered off, I would get 0x00s or 0xffs when trying to read them. Leo, have you ever witnessed something similar on your boards?
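For reference, a minimal sketch of how the funnel_ctrl values above can be decoded. It assumes the standard CoreSight funnel control register layout (bits [7:0] enable input ports 0-7, bits [11:8] are the hold time); funnel_decode is a hypothetical helper name, not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Decode a CoreSight funnel control register value.
# Assumed layout: bits [7:0] enable input ports 0-7, bits [11:8] = hold time.
funnel_decode() {
    val=$(( $1 ))
    ports=""
    port=0
    while [ "$port" -lt 8 ]; do
        # Collect every port whose enable bit is set.
        if [ $(( (val >> port) & 1 )) -eq 1 ]; then
            ports="$ports $port"
        fi
        port=$(( port + 1 ))
    done
    # Strip the leading space; print "none" if no port is enabled.
    ports=${ports# }
    echo "ports=${ports:-none} hold_time=$(( (val >> 8) & 0xf ))"
}

funnel_decode 0x701   # funnel1 reading: prints "ports=0 hold_time=7"
funnel_decode 0x300   # funnel2 reading: prints "ports=none hold_time=3"
```

Under that assumption, 0x300 would mean that no input port of funnel2 is enabled, while funnel1 has input port 0 enabled.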
I tried everything above with 'nohlt' added to the kernel arguments. I don't think this was relevant though, because I was using a kernel with CPU_IDLE=n before.
> BTW, could you explain what prevents you from using ETF2 via sysfs? Is this caused by clocks, or DT bindings?
I put it a bit imprecisely before. I meant that I'm not able to use ETF2 in perf, since it gives me an empty trace, just like ETR does. I have trouble getting STM to work via sysfs, however I suspect I may be making some mistakes when setting up policies. I'd leave that for later anyway.
Leo, what do you think of posting my changes to the Zynq UltraScale+ DTS on this list as an RFC? Perhaps it would encourage the community to help with bringing up CoreSight on this platform, since some of the topology is already working.
Thank you and best regards, Wojciech
Hi,
On Thu, 21 Mar 2019 at 18:21, Wojciech Żmuda <wzmuda@n7space.com> wrote:
OK - please try enabling _one_ sink only. I believe that the CoreSight infrastructure will enable a path from source to enabled sink when you enable the first source. The issue with enabling ETF1 as a sink is that it will not forward any data to the other enabled sinks further down the line.
In theory, if you enable ETF2, then enable an ETM source, then the coresight drivers will enable ETF1 as a hardware FIFO link, _not_ as a sink, so that any data passes through.
I recommend testing with each sink enabled separately to see if you get trace. Remember to disable everything you enabled between tests.
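The test procedure above could be scripted roughly as follows. This is only a sketch: cs_disable_all and cs_enable_path are hypothetical helper names, and CS_ROOT is an assumed variable that defaults to the sysfs directory used in this thread, so the functions can also be dry-run against a scratch directory.

```shell
#!/bin/sh
# Root of the CoreSight sysfs tree; override CS_ROOT for a dry run.
CS_ROOT=${CS_ROOT:-/sys/bus/coresight/devices}

# Write 0 to every enable_source and enable_sink attribute that exists,
# sources first, so nothing is left enabled from a previous test.
cs_disable_all() {
    for f in "$CS_ROOT"/*/enable_source "$CS_ROOT"/*/enable_sink; do
        [ -e "$f" ] && echo 0 > "$f"
    done
    return 0
}

# Enable exactly one sink, then the given sources.
# Usage: cs_enable_path <sink> <source>...
cs_enable_path() {
    sink=$1
    shift
    cs_disable_all
    echo 1 > "$CS_ROOT/$sink/enable_sink"
    for src in "$@"; do
        echo 1 > "$CS_ROOT/$src/enable_source"
    done
}
```

For example, calling cs_enable_path fe950000.etf2 fec40000.etm0 and then reading /dev/fe950000.etf2 with dd would exercise ETF2 alone.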
Regards
Mike
On Fri, 22 Mar 2019 at 09:52, Mike Leach <mike.leach@linaro.org> wrote:
Mike is correct - the framework will use the first enabled sink that is encountered on a path. As such, if an ETF is encountered on a path but hasn't been enabled, it will be configured as a FIFO.
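As a toy illustration of that rule (not the kernel's actual path-building code), the sketch below walks the TMC devices of a path in source-to-sink order and prints the role each one would get. The function name cs_roles and the short device labels are made up for this example, and it assumes at least one device on the path is enabled as a sink.

```shell
#!/bin/sh
# Toy model of "first enabled sink on the path wins".
# Usage: cs_roles "<devices in source-to-sink order>" "<devices with enable_sink=1>"
cs_roles() {
    path=$1
    enabled=" $2 "
    found=""
    for dev in $path; do
        # Is this device in the list of enabled sinks?
        case "$enabled" in
            *" $dev "*) is_enabled=1 ;;
            *)          is_enabled=0 ;;
        esac
        if [ -z "$found" ] && [ "$is_enabled" -eq 1 ]; then
            echo "$dev: sink"      # first enabled sink captures the trace
            found=$dev
        elif [ -z "$found" ]; then
            echo "$dev: fifo"      # un-enabled ETF upstream of the sink passes data on
        else
            echo "$dev: unused"    # nothing downstream of the sink sees trace
        fi
    done
}

cs_roles "etf1 etf2 etr" "etf1 etf2 etr"   # all three enabled: etf1 is the sink
cs_roles "etf1 etf2 etr" "etf2"            # only etf2 enabled: etf1 becomes a FIFO
```

In this model, enabling ETF1, ETF2 and ETR together leaves ETF2 and ETR without any trace, which matches the behaviour reported earlier in the thread.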
Mathieu.
Hi Mike and Mathieu,
Ok, that's something I was not aware of. With this approach I'm getting slightly better results - I can access ETF2 via dd now:
root@zynq:/sys/bus/coresight/devices# echo 1 > fe950000.etf2/enable_sink
root@zynq:/sys/bus/coresight/devices# echo 1 > fec40000.etm0/enable_source
root@zynq:/sys/bus/coresight/devices# echo 1 > fed40000.etm1/enable_source
root@zynq:/sys/bus/coresight/devices# echo 1 > fee40000.etm2/enable_source
root@zynq:/sys/bus/coresight/devices# echo 1 > fef40000.etm3/enable_source
root@zynq:/sys/bus/coresight/devices# dd if=/dev/fe950000.etf2 of=/root/trace.bin
0+1 records in
0+1 records out
16 bytes copied, 0.00126131 s, 12.7 kB/s
The trace buffer is only 16 bytes long and filled with zeros (except for one byte):
root@zynq:/sys/bus/coresight/devices# hexdump /root/trace.bin
0000000 0001 0000 0000 0000 0000 0000 0000 0000
0000010
Interestingly, internal buffer of ETF2 is set to 8kB (RSZ register):
root@zynq:/sys/bus/coresight/devices# busybox devmem 0xfe950004 32
0x00000800
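The 8kB figure follows from the register value if RSZ is taken to count 32-bit words, which is how the TMC RAM size register is normally described; that unit is the assumption here. A minimal sketch of the arithmetic, with rsz_to_bytes as a hypothetical helper name:

```shell
#!/bin/sh
# Convert a TMC RSZ register value to bytes, assuming RSZ counts 32-bit words.
rsz_to_bytes() {
    echo $(( $1 * 4 ))
}

rsz_to_bytes 0x00000800   # ETF2 reading from above: prints 8192
```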
On the bright side, I can see that the components on the way are actually enabled, including the funnels I saw disabled yesterday (which seems perfectly clear now):
root@zynq:/sys/bus/coresight/devices# cat fe940000.etf1/mgmt/ctl
0x1
root@zynq:/sys/bus/coresight/devices# cat fe950000.etf2/mgmt/ctl
0x1
root@zynq:/sys/bus/coresight/devices# cat fe930000.funnel2/funnel_ctrl
0x702
root@zynq:/sys/bus/coresight/devices# cat fe920000.funnel1/funnel_ctrl
0x70f
The same happens when I try ETR instead of ETF2 (after power off/power on cycle).
I also did an experiment, checking the registers of ETR while it was selected as sink in perf. I examined the RRP and RWP registers while perf was tracing and I could observe these pointers changing. It looks like the component is working - it's just the trace that's empty.
So, framework-wise everything seems fine. It smells like a memory access issue, maybe?
Thank you for joining the discussion. I'll appreciate any hints that come to mind.
Best regards, Wojciech
Hello Wojciech,
On Fri, 22 Mar 2019 at 18:06, Wojciech Żmuda <wzmuda@n7space.com> wrote:
The trace sink hardware is responsible for generating the CoreSight formatted 16 byte frames - on halt it will pad out to form a final 16 byte frame. This output is what you get when the device has no trace data for the final frame.
The size register on an ETB indicates the internal size of the buffer - not the amount of data in the buffer. It will be constant.
The funnel2 value (0x702) is wrong - it means input port 1 is enabled, and that is the input from the 2x R5 sources via funnel 0. This suggests that the interconnections or attributes defined in your device tree file have an error.
Regards
Mike
Hi Mike,
That was it. My funnel2-ETF1 connection was specified like this:
port@1 {
    reg = <0x1>;
    funnel2_in_port1: endpoint {
        remote-endpoint = <&etf1_out_port>;
    };
};
while it should be:
port@2 {
    reg = <0x2>;
    funnel2_in_port2: endpoint {
        remote-endpoint = <&etf1_out_port>;
    };
};
With this change applied I can get trace from both ETF2 and ETR, with both sysfs and perf interfaces. Thank you very much!
May I ask, how did you know that the funnel0 output is connected specifically to funnel2 input 1? The Zynq TRM shows only the topology diagram and memory mapped addresses, without details about input/output port numbers. Is there any programmatic way to discover it, or did I miss something?
Best regards, Wojciech
Hi Wojciech,
On Mon, 25 Mar 2019 at 14:30, Wojciech Żmuda <wzmuda@n7space.com> wrote:
Given the issues you were seeing it was likely that a routing problem was the cause of the lack of trace.
Therefore, I looked at the topology information generated by ARM's debug tools which auto-detect targets - we have in the past connected an external JTAG debugger to a target with the same type of SoC.
Auto-detection is not practical in an on-target environment as it can result in an unusable target, and is not always successful if the device manufacturer has not interconnected certain topology signals. With external JTAG debuggers, we can autodetect then power cycle - and in this case sufficient information had been detected to determine the connection ports of the funnel.
Regards
Mike
Just to add to this. On-target auto-detection of topology is definitely not practical or recommended as a standard way for the kernel to discover CoreSight configuration at boot. I.e. it is not an alternative to ACPI or DTS. But it is often a feasible option in a 'lab' situation, if you haven't got JTAG, and where you are happy to reset the device. So it's actually been quite useful in getting us started on devices which don't have an accessible JTAG port.
Topology auto-detection (whether by on-target programming or by JTAG) involves putting devices in a special "integration" mode. Although the CoreSight spec says you should do a system reset after this, in my experience it's worth trying to just revert back to production mode: assuming you've set registers to sensible values, more often than not it just works.
CSAL has some code for on-target ATB topology detection:
https://github.com/ARM-software/CSAL/blob/master/experimental/cs_topology_de...
The kernel mustn't be trying to use CoreSight at the same time. If you haven't yet got a DTS/ACPI configuration, hopefully the kernel won't be using CoreSight at all, but it's something to bear in mind if you have got a configuration and are using auto-detect to check it's complete and correct. Ideally we would have a way to tell the kernel to quiesce and relinquish all use of CoreSight.
Al