Hello CoreSight team,
I'm trying to bring up TMC-ETR on a Xilinx Zynq UltraScale+ and I have run into some trouble. I hope you may have some ideas on where to look next.
The detailed CoreSight topology of the Zynq US+ MPSoC can be found in ug1085-zynq-ultrascale-trm.pdf (easy to google), but to make this discussion easier, I'll try to sketch it below:
[2x C-R5]    [4x C-A53]
    |            |
[2x ETMs]    [4x ETM]
    |            |
[Funnel0]    [Funnel1]        [STM]
    |            |              |
    |      [TMC-ETF 4kB]        |
    |            |              |
[--------------------ATB----------------]
                 |
             [Funnel2]
                 |
          [TMC-ETF 8 KB]
                 |
           [Replicator]
             |       |
        [TMC-ETR]  [TPIU]
I can happily use perf to trace the Cortex-A53 cores and get trace data from the topmost ETF (the 4kB one). However, I suspect that I often get buffer overflows (thanks Mathieu for this hypothesis), which overwrite my trace with new data during the session. To overcome this I'd like to use either the second ETF or, preferably, the ETR with a significantly larger buffer. The problem is that I'm not able to get any trace from the ETR.
Observations:
1. It is possible to choose ETR as sink in perf - there is no error and the session starts.
2. There are no CoreSight related errors in dmesg.
3. By examining the TMC-ETR memory-mapped registers (busybox devmem 0x...) I can see that perf indeed sees the device and configures it properly. I've added some prints around the struct etr_buf manipulations in the TMC drivers and I can see that the buffer address and size saved into this structure are programmed into the TMC, as the same values appear in its registers. I can also see that the enable bit is set high when tracing starts and low when perf returns.
4. There is never any useful data in the AUXTRACE sections of perf.data. When tracing with --per-thread I observe that the reported size of the section grows significantly the longer I trace: ' ... CoreSight ETM Trace data: size xxx bytes' with xxx growing into the kilobyte range.
However, all I get is:
0xd60 [0x8]: event: 68
.
. ... raw event: size 8 bytes
.  0000:  44 00 00 00 00 00 08 00                          D.......

0xd60 [0x8]: PERF_RECORD_FINISHED_ROUND
With --all-cpus, I always get ' ... CoreSight ETM Trace data: size 16 bytes' no matter how long the tracing session is.
Interestingly, the data part does not change - it's always the same 8 bytes each time I try using ETR as sink, regardless of --per-thread or --all-cpus mode.
5. Each time I print etr_buf contents in tmc_etr_sync_flat_buf() or tmc_etr_sync_sg_buf(), I can see that the buffer, no matter how big, gets only 16 bytes of data on each sync.
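As a side note on the raw event quoted under observation 4, here is a minimal sketch that unpacks those 8 bytes as a struct perf_event_header (u32 type, u16 misc, u16 size, little-endian). The helper name parse_perf_header is mine, not something from perf. If the decoding is right, the 8 bytes are the whole PERF_RECORD_FINISHED_ROUND record, header only, so they are not trace payload at all:

```python
import struct

# User-space record type as defined in the perf tools sources.
PERF_RECORD_FINISHED_ROUND = 68

def parse_perf_header(raw):
    """Unpack struct perf_event_header: u32 type, u16 misc, u16 size."""
    rec_type, misc, size = struct.unpack("<IHH", raw[:8])
    return {"type": rec_type, "misc": misc, "size": size}

# The 8 bytes copied from the dump in observation 4.
raw = bytes.fromhex("44 00 00 00 00 00 08 00")
hdr = parse_perf_header(raw)

# The header itself is 8 bytes, so whatever remains is the record payload.
payload_len = hdr["size"] - 8
print(hdr, "payload bytes:", payload_len)
```

So the constant "data part" would simply be the same record header every time, which fits the fact that it never changes between runs.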
I wonder if this may point to SMMU issues. I can see in juno-base.dtsi in Linux mainline that the ETR node (and only this one from the CS family) has an iommus=< > property pointing to smmu_etr:
etr@20070000 {
        compatible = "arm,coresight-tmc", "arm,primecell";
        reg = <0 0x20070000 0 0x1000>;
        iommus = <&smmu_etr 0>;
        ...
I tried to mimic this behaviour on my platform by adding a similar reference to the only SMMU node defined in xilinx/zynqmp.dtsi. In my case it's iommus = <&smmu 0xc5>; since there is no dedicated SMMU for ETR (and I don't see one in the TRM) and 0xc5 is the stream ID calculated from the CoreSight master ID (TRM Chapter 16, Table 16-11). I can see in dmesg that the SMMU is enabled and ETR is added to iommu group 0, but this does not change the behaviour. I'd appreciate any suggestions on whether this direction seems worth further debugging.
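One way to approach the SMMU question from the register side is to decode the TMC status word read with devmem. Below is a rough sketch that assumes the bit positions and mode encodings from the Linux coresight-tmc.h header (STS: FULL bit 0, TRIGGERED bit 1, TMCREADY bit 2, MEMERR bit 5). As far as I understand the TMC documentation, MEMERR reports an error on the AXI master interface, so it may be the first thing to check. The function names and all register values are made up for illustration, and bytes_captured is a simplified model of a flat buffer, not the driver's exact logic:

```python
# Bit positions and mode encodings as in the Linux coresight-tmc.h header.
STS_BITS = {0: "FULL", 1: "TRIGGERED", 2: "TMCREADY", 5: "MEMERR"}
MODES = {0: "circular buffer", 1: "software FIFO", 2: "hardware FIFO"}

def decode_sts(sts):
    """Return the names of the status flags set in a TMC STS value."""
    return [name for bit, name in sorted(STS_BITS.items()) if sts & (1 << bit)]

def decode_mode(mode):
    """Translate the low two bits of TMC MODE into a readable name."""
    return MODES.get(mode & 0x3, "reserved")

def bytes_captured(rwp, dba, buf_size, sts):
    """Estimate how much data sits in a flat ETR buffer.

    rwp is the write pointer (RWP, with RWPHI if used), dba the programmed
    buffer base (DBALO/DBAHI) and buf_size the buffer size in bytes.
    """
    if not dba <= rwp < dba + buf_size:
        raise ValueError("RWP lies outside the programmed buffer")
    if sts & 0x1:            # FULL: the buffer has wrapped at least once
        return buf_size
    return rwp - dba         # otherwise only the span up to RWP was written

# Hypothetical readings, to be replaced with real devmem output.
print(decode_sts(0x24))                                         # ['TMCREADY', 'MEMERR']
print(decode_mode(0x0))                                         # circular buffer
print(bytes_captured(0x70000010, 0x70000000, 0x100000, 0x04))   # 16
```

If MEMERR turns out to be set after a session, that would point at the AXI/SMMU path rather than at the trace sources.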
Another interesting observation is that I'm unable to access anything below the 4kB ETF in the topology I sketched. I can use neither ETF2 nor STM via sysfs. I wonder if there is some ATB configuration that may be worth checking as well?
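To narrow this down, it may help to list what the driver actually registered. The sketch below walks /sys/bus/coresight/devices and prints the enable_source / enable_sink attributes it finds. The helper name coresight_state is mine. If ETF2 or STM do not show up at all, the problem would be at the probe or device-tree level rather than in the ATB path:

```python
import os

def coresight_state(root="/sys/bus/coresight/devices"):
    """Map each registered CoreSight device to its enable_* attribute values."""
    state = {}
    for dev in sorted(os.listdir(root)):
        entry = {}
        for attr in ("enable_source", "enable_sink"):
            path = os.path.join(root, dev, attr)
            if os.path.isfile(path):      # not every device has both attributes
                with open(path) as f:
                    entry[attr] = f.read().strip()
        state[dev] = entry
    return state

# Only touch sysfs when the CoreSight bus is actually present.
if os.path.isdir("/sys/bus/coresight/devices"):
    for name, attrs in coresight_state().items():
        print(name, attrs)
```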
I would appreciate any suggestions where to look next.
Thanks and best regards,
Wojciech