Hi Wojciech,
On Wed, Mar 20, 2019 at 07:32:06PM +0000, Wojciech Żmuda wrote:
Hello CoreSight team,
I'm trying to bring up TMC-ETR on Xilinx Zynq UltraScale+ and I ran into some trouble. I hope you may have some ideas on where to look next.
The detailed CoreSight topology of the Zynq US+ MPSoC can be found in ug1085-zynq-ultrascale-trm.pdf (easy to google), but to make this discussion easier, I'll try to sketch it below:
[2x C-R5]      [4x C-A53]
    |              |
[2x ETMs]      [4x ETM]
    |              |
[Funnel0]      [Funnel1]       [STM]
    |              |             |
    |        [TMC-ETF 4kB]       |
    |              |             |
[--------------------ATB----------------]
                   |
               [Funnel2]
                   |
            [TMC-ETF 8 KB]
                   |
             [Replicator]
               |       |
           [TMC-ETR] [TPIU]
I can happily use perf to trace Cortex-A53 cores and get trace data from the uppermost ETF (the 4kB one). However, I feel like I often get buffer overflows (thanks Mathieu for this hypothesis) that overwrite my trace with new data during the session. To overcome this I'd like to use either the second ETF or, preferably, the ETR with its significantly larger buffer. The problem is, I'm not able to get any trace from the ETR.
Observations:
- It is possible to choose ETR as sink in perf - there is no error and the session starts.
- There are no CoreSight related errors in dmesg.
- By examining TMC-ETR memory mapped registers (busybox devmem 0x...) I can see that
indeed perf sees the device and configures it properly. I've added some prints around struct etr_buf manipulations in TMC drivers and I can actually see that buffer address and size saved into this structure are programmed into TMC, as the same values appear in its registers. I can also see that the enable bit is set high when tracing starts and low when perf returns.
- There is never any useful data in AUXTRACE sections of perf.data. When tracing with
--per-thread I observe that the size of the section grows significantly the longer I trace: ' ... CoreSight ETM Trace data: size xxx bytes' with xxx exceeding kBytes.
However, all I get is:
0xd60 [0x8]: event: 68
.
. ... raw event: size 8 bytes
.  0000:  44 00 00 00 00 00 08 00                          D.......
0xd60 [0x8]: PERF_RECORD_FINISHED_ROUND
With --all-cpus, I always get ' ... CoreSight ETM Trace data: size 16 bytes' no matter how long the tracing session is.
Interestingly, the data part does not change - it's always the same 8 bytes each time I try using ETR as sink, regardless of whether I use --per-thread or --all-cpus mode.
- Each time I print etr_buf contents in tmc_etr_sync_flat_buf() or tmc_etr_sync_sg_buf(),
I can see that the buffer, no matter how big, gets only 16 bytes of data on each sync.
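One more register that may be worth reading next to the ones listed in the observations above is the TMC status register. Below is a minimal sketch of a decoder for its value. The helper name tmc_sts_decode is hypothetical, and the bit positions (Full = bit 0, Triggered = bit 1, TMCReady = bit 2, MemErr = bit 5) are an assumption taken from the definitions in the Linux driver's coresight-tmc.h, so please check them against the TMC documentation for your part.

```shell
#!/bin/sh
# Decode selected TMC STS bits from a value read with devmem.
# ASSUMPTION: bit positions follow the Linux driver's coresight-tmc.h:
#   Full = bit 0, Triggered = bit 1, TMCReady = bit 2, MemErr = bit 5.
# tmc_sts_decode is a hypothetical helper name, not an existing tool.
tmc_sts_decode() {
    val=$(( $1 ))          # accepts decimal or 0x-prefixed hex
    flags=""
    [ $(( val & 0x01 )) -ne 0 ] && flags="$flags Full"
    [ $(( val & 0x02 )) -ne 0 ] && flags="$flags Triggered"
    [ $(( val & 0x04 )) -ne 0 ] && flags="$flags TMCReady"
    [ $(( val & 0x20 )) -ne 0 ] && flags="$flags MemErr"
    # Print the collected flag names without the leading space.
    echo "${flags# }"
}
```

The value to pass in would be whatever devmem returns for the status register of the ETR (offset 0x00C from the TMC base in the Linux driver, again an assumption to verify). If MemErr shows up there, that would suggest the ETR failed on its memory interface, which seems relevant to the SMMU question.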
I wonder if this issue may point to SMMU issues. I can see in juno-base.dtsi in Linux mainline that the ETR node (and only this one from the CS family) has an iommus=< > property pointing to smmu_etr:
etr@20070000 {
        compatible = "arm,coresight-tmc", "arm,primecell";
        reg = <0 0x20070000 0 0x1000>;
        iommus = <&smmu_etr 0>;
        ...
I tried to mimic this behaviour on my platform by adding a similar reference to the only SMMU node defined in xilinx/zynqmp.dtsi. In my case it's iommus = <&smmu 0xc5>; since there is no dedicated SMMU for ETR (and I don't see it in TRM) and 0xc5 is the stream ID calculated from the CoreSight master ID (TRM Chapter 16, Table 16-11). I can see in dmesg that SMMU is enabled and ETR is added to iommu group 0, but this does not change the behaviour. I'd appreciate any suggestions if this direction seems worth further debugging.
To be honest, I don't have experience with SMMU, but neither Hikey nor DB410c connects the ETR to an SMMU, and I can run perf on both of them (please note, I did this 1~2 months ago).
Another interesting observation is that I'm actually unable to access anything below the 4k ETF in the topology I sketched. I can't use ETF2 or STM via sysfs. I wonder if there is some ATB configuration that may be worth checking as well?
I personally think the most straightforward method is to use sysfs mode to verify the path from the sources to the sink, so that you can dump the raw trace data; e.g. for Hikey I use the commands below:
  echo 1 > /sys/bus/coresight/devices/f6404000.etf/enable_sink
  echo 1 > /sys/bus/coresight/devices/f659c000.etm/enable_source
  echo 1 > /sys/bus/coresight/devices/f659d000.etm/enable_source
  echo 1 > /sys/bus/coresight/devices/f659e000.etm/enable_source
  echo 1 > /sys/bus/coresight/devices/f659f000.etm/enable_source
  echo 1 > /sys/bus/coresight/devices/f65dc000.etm/enable_source
  echo 1 > /sys/bus/coresight/devices/f65dd000.etm/enable_source
  echo 1 > /sys/bus/coresight/devices/f65de000.etm/enable_source
  echo 1 > /sys/bus/coresight/devices/f65df000.etm/enable_source
  dd if=/dev/f6404000.etf of=/tmp/etr_raw_data
This way, you can first confirm whether raw data can be captured at all; the purpose is to verify that clocks and power have been configured properly on your platform. If this doesn't work (as you said, ETF2 and STM cannot be used via sysfs), I think you should first debug sysfs mode and only then move on to the perf tool.
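The sysfs procedure can also be wrapped in a few small shell functions so that it is easy to repeat for each sink on the path. This is only a minimal sketch: the function names and the CS_ROOT variable are hypothetical, and the only interfaces assumed are the enable_sink and enable_source files already used in the commands for Hikey.

```shell
#!/bin/sh
# Directory holding the CoreSight devices. CS_ROOT is a hypothetical
# variable, overridable so the functions can be dry-run on fake files.
CS_ROOT="${CS_ROOT:-/sys/bus/coresight/devices}"

# cs_enable_path <sink> <source>...
# Enable the sink first, then each source; stop at the first failure.
cs_enable_path() {
    sink="$1"; shift
    echo 1 > "$CS_ROOT/$sink/enable_sink" || return 1
    for src in "$@"; do
        echo 1 > "$CS_ROOT/$src/enable_source" || return 1
    done
}

# cs_disable_sources <source>...
# Disable each source again once the workload has run.
cs_disable_sources() {
    for src in "$@"; do
        echo 0 > "$CS_ROOT/$src/enable_source" || return 1
    done
}

# cs_dump_has_data <file>
# Succeed only if the dump contains at least one non-zero byte.
cs_dump_has_data() {
    n=$(( $(tr -d '\000' < "$1" | wc -c) ))
    [ "$n" -gt 0 ]
}
```

For example, cs_enable_path f6404000.etf f659c000.etm enables one sink and one source; after reading the sink's device node with dd, cs_dump_has_data on the output file gives a quick yes/no answer before looking at the bytes. A failure in cs_enable_path for a given device would point at that device's clocks or DT bindings rather than at perf.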
BTW, could you explain why you cannot use ETF2 via sysfs? Is this caused by clocks or by DT bindings?
Thanks,
Leo Yan