Hi Vincent,
PTM and ETMv4 support what is called program flow trace, where trace elements (E/N atoms) are only output on potential changes in program flow - primarily branch instructions - unlike ETMv3.x, which outputs an atom per instruction. This does not mean that the intermediate instructions have not been traced/executed; it's just that their execution is implied rather than explicitly stated.
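To make the difference concrete, here is a toy Python model. It is only an illustration, not how either trace unit is implemented: it counts atoms for the nine instructions walked through below, once with one atom per instruction (ETMv3.x-style) and once with atoms only at branch waypoints (PTM/ETMv4-style). The `is_branch` flags are set by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    addr: int
    is_branch: bool  # True if this instruction is a waypoint (a branch here)


def atoms_per_instruction(executed):
    """Toy ETMv3.x-style count: one atom for every instruction."""
    return len(executed)


def atoms_program_flow(executed):
    """Toy PTM/ETMv4-style count: atoms only at branch waypoints."""
    return sum(1 for insn in executed if insn.is_branch)


# The nine instructions from 0xffffff80080fbfc4 to 0xffffff80080fbfe4;
# the 4th (b.hi) and the 9th (cbnz) are the branches.
START = 0xFFFFFF80080FBFC4
executed = [Insn(START + 4 * i, is_branch=i in (3, 8)) for i in range(9)]

print(atoms_per_instruction(executed), atoms_program_flow(executed))  # 9 2
```

The same nine instructions cost nine atoms in the per-instruction model but only two in the program flow model; the other seven are implied.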
Looking at the first few packets:
ADDRESS AND CONTEXT: addr = 0xffffff80080fbfc4 EL = 0x1 SF = 0x1 NS = NON_SECURE V = 0x0 VMID = 0x0 CONTEXT_ID = 0x0 <00f4:85 71 5f 0f 08 80 ff ff ff 31 >
This is the start address of the traced section of the program
ATOM packet format 2: ATOMS: N, E, <00fe:da >
These are a couple of branch elements - one branch not taken, one branch taken
The decoder will then get the program image for the code loaded at 0xffffff80080fbfc4 and examine consecutive opcodes until one matches a branch instruction (or other program flow element).
<dequeue_entity+0xa94>
   ffffff80080fbfc4: b9401b40
   ffffff80080fbfc8: 8b204301
   ffffff80080fbfcc: f10ffc3f
N  ffffff80080fbfd0: 54002028  b.hi ffffff80080fc3d4 <dequeue_entity+0xea4>
Here the decoder walks four instructions and finds the B.HI instruction, with which it associates the N atom. B.HI is a direct branch, so the decoder can calculate the target address from the instruction itself, and this address is not output in the trace.
The previous three instructions are implied as executed as well; this trace client decoder simply does not provide any disassembly for them.
The branch is not taken, so decode continues with the next instruction.
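The target calculation for the B.HI can be sketched in Python. This is an illustration based on the A64 B.cond encoding (a signed 19-bit word offset in bits [23:5], the condition in bits [3:0]), not OpenCSD code; `decode_b_cond` and `sign_extend` are hypothetical helper names.

```python
COND_NAMES = ("eq", "ne", "hs", "lo", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", "al", "nv")
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def decode_b_cond(addr, opcode):
    """Return (condition, target) for an A64 B.cond opcode, else None."""
    # B.cond: bits[31:24] == 0b01010100 and bit[4] == 0.
    if (opcode & 0xFF000010) != 0x54000000:
        return None
    imm19 = (opcode >> 5) & 0x7FFFF          # word offset, bits[23:5]
    target = (addr + sign_extend(imm19, 19) * 4) & MASK64
    return COND_NAMES[opcode & 0xF], target


cond, target = decode_b_cond(0xFFFFFF80080FBFD0, 0x54002028)
print(f"b.{cond} {target:x}")  # b.hi ffffff80080fc3d4
```

Here imm19 is 0x101, i.e. an offset of 0x404 bytes, which gives the same target as the disassembly.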
   ffffff80080fbfd4: 9b1c7f04
   ffffff80080fbfd8: 2a1803ea
   ffffff80080fbfdc: 52800001
   ffffff80080fbfe0: d34afc87
E  ffffff80080fbfe4: b5003519  cbnz x25, ffffff80080fc684 <dequeue_entity+0x1154>
Walk five instructions to find the CBNZ, which is taken, and associate the E atom with it. Again, all five instructions have been executed. CBNZ is a direct branch whose target is encoded in the instruction (the disassembly above shows it), and the trace also outputs the target address:
ADDRESS AND CONTEXT:L_64_IS0 addr = 0xffffff80080fc684 <00ff:9d 21 63 0f 08 80 ff ff ff >
This is then used to determine where trace continues from - 0xffffff80080fc684
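For comparison, the CBNZ can be decoded the same way. The sketch below follows the A64 CBZ/CBNZ encoding (sf in bit 31, bits [30:25] = 0b011010, bit 24 selecting CBNZ, imm19 in bits [23:5], Rt in bits [4:0]). `decode_cbz` is a hypothetical name, not an OpenCSD function. Applied to the opcode at 0xffffff80080fbfe4, it gives the same address that the address packet reports.

```python
MASK64 = (1 << 64) - 1


def decode_cbz(addr, opcode):
    """Return (mnemonic, register, target) for A64 CBZ/CBNZ, else None."""
    # Compare-and-branch: bits[30:25] == 0b011010.
    if (opcode & 0x7E000000) != 0x34000000:
        return None
    mnemonic = "cbnz" if opcode & (1 << 24) else "cbz"
    reg = ("x" if opcode >> 31 else "w") + str(opcode & 0x1F)
    imm19 = (opcode >> 5) & 0x7FFFF
    if imm19 & (1 << 18):                     # sign-extend the word offset
        imm19 -= 1 << 19
    return mnemonic, reg, (addr + imm19 * 4) & MASK64


mnemonic, reg, target = decode_cbz(0xFFFFFF80080FBFE4, 0xB5003519)
print(f"{mnemonic} {reg}, {target:x}")  # cbnz x25, ffffff80080fc684
```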
And so on...
The OpenCSD library itself does no disassembly, other than the detection of key instructions and the calculation of target addresses that decode requires. It takes in the raw trace and the program image supplied by a client (e.g. perf) and outputs information that the client can use for further processing.
e.g. for this snippet the OpenCSD output would be something like:
ADDR_RANGE(ffffff80080fbfc4-ffffff80080fbfd0, N)
ADDR_RANGE(ffffff80080fbfd4-ffffff80080fbfe4, E)
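The walk described above can be modelled in a few lines of Python. This is a toy sketch, not the OpenCSD API: `decode_ranges` and its tuple output are hypothetical stand-ins for the ADDR_RANGE elements, the memory image is a dict holding the nine opcodes from the snippet, and only B.cond and CBZ/CBNZ are recognised as waypoints. A real decoder also handles indirect branches (using address packets), exceptions and many more instruction classes.

```python
MASK64 = (1 << 64) - 1


def is_waypoint(opcode):
    """Toy waypoint check: only A64 B.cond and CBZ/CBNZ are recognised."""
    is_b_cond = (opcode & 0xFF000010) == 0x54000000
    is_cbz = (opcode & 0x7E000000) == 0x34000000
    return is_b_cond or is_cbz


def imm19_target(addr, opcode):
    """B.cond and CBZ/CBNZ both hold a signed word offset in bits[23:5]."""
    imm19 = (opcode >> 5) & 0x7FFFF
    if imm19 & (1 << 18):
        imm19 -= 1 << 19
    return (addr + imm19 * 4) & MASK64


def decode_ranges(image, start, atoms):
    """Walk `image` (addr -> opcode) from `start`, consuming one atom per
    waypoint. Returns ([(first_addr, waypoint_addr, atom), ...], next_addr)."""
    ranges = []
    addr = range_start = start
    for atom in atoms:
        while not is_waypoint(image[addr]):   # executed by implication
            addr += 4
        ranges.append((range_start, addr, atom))
        # E: branch taken, continue at its target; N: fall through.
        addr = imm19_target(addr, image[addr]) if atom == "E" else addr + 4
        range_start = addr
    return ranges, addr


image = {
    0xFFFFFF80080FBFC4: 0xB9401B40, 0xFFFFFF80080FBFC8: 0x8B204301,
    0xFFFFFF80080FBFCC: 0xF10FFC3F, 0xFFFFFF80080FBFD0: 0x54002028,
    0xFFFFFF80080FBFD4: 0x9B1C7F04, 0xFFFFFF80080FBFD8: 0x2A1803EA,
    0xFFFFFF80080FBFDC: 0x52800001, 0xFFFFFF80080FBFE0: 0xD34AFC87,
    0xFFFFFF80080FBFE4: 0xB5003519,
}
ranges, next_addr = decode_ranges(image, 0xFFFFFF80080FBFC4, ["N", "E"])
for first, last, atom in ranges:
    print(f"ADDR_RANGE({first:x}-{last:x}, {atom})")
print(f"next: {next_addr:x}")
```

Run against the snippet, this prints the same two ranges as the ADDR_RANGE lines above, followed by a next address of ffffff80080fc684.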
perf then takes this and processes it accordingly - depending on the type of information requested on the command line (flame graphs, disassembly, etc).
A full-featured GUI debugger might use an .elf file to show disassembly and C source lines for the same trace.
On Mon, 10 Feb 2025 at 17:46, vincent.ernst@web.de vincent.ernst@web.de wrote:
Hi Mike,
I was actually able to set up the Nvidia driver, collect some trace data and decode it with Nvidia's mem_parser (see attachment). Comparing this to the trace output in the OpenCSD how-to, it seems to me that the ETMs on my device only support branch instruction trace and do not trace every instruction. Or does using perf + OpenCSD provide additional trace information compared to what I am getting right now?
The trace information is all that is necessary and sufficient to reconstruct the execution of the program. How much of that information is used is a function of the decoder / client program.
If not, trying to backport perf would not be necessary for me anymore.
We discussed the kernel versions that supported CoreSight in one of our internal meetings this week. A colleague pointed out that Nvidia appear to have released a 5.15-based kernel for Jetson.
Regards
Mike
Regards, Vincent
From: Mike Leach mike.leach@linaro.org Sent: Tuesday, February 4, 2025 12:29 To: V E vincent.ernst@web.de Cc: coresight@lists.linaro.org coresight@lists.linaro.org Subject: Re: CoreSight on Nvidia Jetson
Hi Vincent,
On Fri, 31 Jan 2025 at 23:47, V E vincent.ernst@web.de wrote:
Hi Mike,
Mike Leach wrote:
Hi Vincent On Thu, 23 Jan 2025 at 15:48, vincent.ernst@web.de vincent.ernst@web.de wrote:
Hi Mike,
Mike Leach wrote:
Hi,
On Tue, 21 Jan 2025 at 21:29, V E vincent.ernst@web.de wrote:
Hi Mike,
Mike Leach wrote:
Hi Vincent
On Mon, 20 Jan 2025 at 09:33, V E vincent.ernst@web.de wrote:
Hi Mike,
Mike Leach wrote:
Hi Vincent
On Wed, 15 Jan 2025 at 09:11, V E vincent.ernst@web.de wrote:
Hi Mike,
I was able to initialise the ETMs by replacing the hex phandle references with labels. For the funnels, I used "arm,coresight-funnel" as specified in the bindings of kernel 4.9, which seems to fix it. When enabling the sinks and sources via sysfs, I get the expected messages in dmesg. There are no connections folders, though.
OK, that's good. The connections information was added sometime in the kernel 5.x series.
So registering the CoreSight devices seems to work, but I guess that there is only limited tracing functionality because the drivers are so old?
sysfs tracing should be OK - this is no more than switching on a source to trace into a sink and collecting the data.
I tried collecting the trace data via "dd if=/dev/72030000.etf of=~/trace.bin" after turning the devices on and off again, but all I get is an empty file:
0+1 records in
0+1 records out
64 bytes copied, 0.000981324 s, 65.2 kB/s
Do you have an idea what I can do to fix this?
In general that would suggest a break in the path between the ETM source and the ETF sink - hence no trace reaching the sink - or that the CPU source is idle, so nothing is being generated. Ensure that there is a clear path in your DTS from the chosen source to the sink. When you enable the source, the coresight code will follow the connections looking for an active sink.
I am pretty sure that the connections are correct.
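The sysfs sequence under discussion (enable the sink, enable the source, run something, stop, then read the sink's buffer) can be sketched in Python. This is a hedged sketch, not a documented tool. `capture_sysfs_trace` is a hypothetical helper. The sysfs directory /sys/bus/coresight/devices and the device names 72030000.etf and 73440000.etm0 (taken from the dmesg output in this thread) are assumptions about this board and kernel. The function only performs the same writes and reads as the echo and dd commands. It stops the source but leaves the sink enabled while reading; the exact ordering requirements may differ between kernel versions.

```python
from pathlib import Path

SYSFS_DEVICES = "/sys/bus/coresight/devices"   # assumed location
DEV_DIR = "/dev"


def write_attr(path, value):
    """Equivalent of `echo value > path` for a sysfs attribute."""
    Path(path).write_text(f"{value}\n")


def capture_sysfs_trace(sink, source, run_workload,
                        sysfs_root=SYSFS_DEVICES, dev_root=DEV_DIR):
    """Hypothetical helper: enable sink and source, run a workload while
    tracing, disable the source, then read the sink buffer (like dd)."""
    devices = Path(sysfs_root)
    write_attr(devices / sink / "enable_sink", 1)
    write_attr(devices / source / "enable_source", 1)
    try:
        run_workload()   # should execute code on the CPU the source traces
    finally:
        write_attr(devices / source / "enable_source", 0)
    return (Path(dev_root) / sink).read_bytes()
```

On the target (as root) this might be called as `capture_sysfs_trace("72030000.etf", "73440000.etm0", workload)`, where `workload` is a function that runs some code on the CPU that etm0 traces. Which CPU that is should be checked against the DTS. If the CPU is idle while the source is enabled, the buffer will hold little or no trace, as noted above.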
When I echo 1 into enable_sink of the ETF and enable_source of the ETM, dmesg shows that the path was activated:
[ 3171.085440] coresight-tmc 72030000.etf: TMC-ETB/ETF enabled
[ 3171.085455] coresight-funnel 72010000.funnel_major: FUNNEL inport 2 enabled
[ 3171.085468] coresight-funnel 73010000.funnel_ccplex: FUNNEL inport 1 enabled
[ 3171.085662] coresight-etm4x 73440000.etm0: ETM tracing enabled
It is ETM - Funnel_CCPLEX - Funnel_Major - ETF, as expected. And it also shows that the ETF is read:
[ 3716.865028] coresight-tmc 72030000.etf: TMC read start
[ 3716.867437] coresight-tmc 72030000.etf: TMC read end
Is there perhaps something else that I have to do before activating the devices, or between activation and reading of the ETF?
Is it possible that there are target-specific things not related to default coresight that might need to be done - clocks / power / permissions / etc?
I am not sure about that; I couldn't find anything regarding it. The documentation on the Nvidia side is very sparse, basically just the instructions that I linked earlier and some general information on the implemented CoreSight devices in the SoC manual. I stumbled across two files in the Nvidia kernel sources which seem to be Nvidia's own approach to implementing a CoreSight driver (see attachments). tegra210_ptm.c compiles for kernel 4.9 after a few small adjustments, but I am still looking into whether there is a way to use it. Nvidia mentions some of the registers defined in the two files in the SoC manual (I can provide it if you want to have a look at it), so it is possible that they stuck to their own implementation. Could it be that the standard CoreSight drivers do not work because Nvidia implements its own?
Where you may suffer is if you try to use perf. We have made a lot of enhancements and bugfixes to how perf processes trace data since 4.9.
Do you know if OpenCSD is already supported for my perf version?
I was able to get some perf.data, but ptm2human does not seem to work. I am not that comfortable with these tools yet, though.
As far as I know, ptm2human will only work on raw trace buffers - i.e. those from the sysfs dumps above. perf.data contains a whole lot of records, only a few of which are related to trace. You only need OpenCSD for decode. Generally I ensure that my target version of perf supports the cs_etm device for collection, and build a host version of perf with OpenCSD for off-target decode.
I would like to do on-target tracing (preferably in real time), so this is not a solution for me, unfortunately. Do you think this is possible?
The solution above is on-target capture with off-target decode. The reason I generally do this is that trace files tend to be very large and decode is processor intensive. If you build a version of on-target perf that contains OpenCSD, then on-target decode is possible. I'm not quite sure what you mean by "real-time" here, but the rate at which trace is generated will always far exceed the rate at which it can be decoded.
With real-time I mean (permanent) collecting, decoding and analyzing of the trace on the target while the CoreSight devices keep running.
The tools I know of - including perf - use a serial collect -> decode -> analyze cycle, with collection stopping before decode starts. In general, the trace hardware needs to stop before the data can be extracted from the collection buffer. perf may start and stop the trace hardware a number of times during a trace session, first accumulating the data in a kernel memory buffer, then writing it to user space into the perf.data file.
Alright, thanks for the explanation.
There are options in the perf makefiles to pull in the OpenCSD library - this is not done by default, so you will probably have to build your own version of perf, at least for decode.
Is this a different approach to using OpenCSD from the one explained in the GitHub HOWTO?
No, this is the same approach. The HowTo describes compiling perf with OpenCSD included as a library - though I probably need to check that the method is up to date.
It would be really nice to know if it is. The sample trace bundle from section "Trace Decoding with Perf Report", however, is not (I get a 404 when trying to download the .tgz).
The .tgz was probably removed when we removed the non-upstreamed kernel drivers from the OpenCSD repository. These were there as early examples during development, but were dropped once the initial drivers were upstreamed.
Using perf to decode the trace is generally the only way to decode trace captured using perf, as the perf.data file is a perf-specific format which, importantly, contains additional records that associate running program images with the captured trace - the only way to get full decode.
I followed the build steps in the howto, but it seems like kernel v4.9 perf is too old to support OpenCSD. The build completes without errors, but OpenCSD is nowhere to be found.
In general, OpenCSD needs to be installed separately and enabled in your perf build. Looking at the history of this, the drivers and perf tooling were first upstreamed in kernel 4.9, so as long as your perf source has the necessary ETM decode source that uses OpenCSD, you should not have an issue building a perf version with it included.
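The point about perf.data carrying records that associate program images with the trace can be illustrated with a toy lookup. perf's mmap records carry, among other fields, the start address, length, page offset and file name of each mapping. The `MmapRecord` class below only loosely mirrors that information and is not perf's actual record layout. `resolve` is a hypothetical helper: given a traced address, it finds which image was mapped there and the file offset from which a decoder would need to read opcodes.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class MmapRecord:
    """Simplified stand-in for a perf mmap record."""
    start: int      # virtual address where the mapping begins
    length: int     # size of the mapping in bytes
    pgoff: int      # file offset corresponding to `start`
    filename: str


def resolve(records, addr):
    """Return (filename, file_offset) for addr, or None if unmapped.
    `records` must be sorted by start and must not overlap."""
    starts = [r.start for r in records]
    i = bisect_right(starts, addr) - 1
    if i < 0 or addr >= records[i].start + records[i].length:
        return None
    rec = records[i]
    return rec.filename, rec.pgoff + (addr - rec.start)


# Hypothetical mappings, for illustration only.
records = [
    MmapRecord(0x400000, 0x2000, 0x0, "/usr/bin/app"),
    MmapRecord(0x7F0000000000, 0x10000, 0x1000, "/usr/lib/libfoo.so"),
]
print(resolve(records, 0x401234))   # ('/usr/bin/app', 4660)
```

Without this kind of association, a decoder sees only addresses and cannot fetch the opcodes it needs to walk between waypoints.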
Are you sure that OpenCSD works with kernel 4.9 perf? I think that OpenCSD wasn't introduced to the kernel until version 4.16 (with this commit: https://github.com/torvalds/linux/commit/aa6292f4845e7921fca60b146403ea6682b...). So this means that I would have to backport perf from >=4.16 (preferably a much more recent version) to 4.9 to make perf with OpenCSD work on my device. Do you think this is feasible? If so, which version would you recommend?
I was looking at the coresight drivers / perf record collection mechanism, which was upstreamed, at least in part, in commit a818c563ae16 - around the 4.9 timeframe (it adds tools/perf/arch/arm/util/cs-etm.c etc., which allow trace records to be collected into the perf data file). As you point out, the decode tooling was upstreamed later. I assume we were using non-upstreamed versions of perf to look at trace before 4.16, while the tooling was developed and eventually upstreamed when ready for production. As I recall, at one time we were hosting a copy of the kernel tree in the OpenCSD project to allow others to try out the latest trace developments; this was dropped once a fully upstream solution was ready.
Now, the actual format of the ETM AUXTRACE sections in the perf.data file may not have changed between 4.9 and 4.16, so it is worth trying to collect trace data with a 4.9 version of perf and decoding it with a version of perf from 4.16.
If you do backport, I'd keep the versions as close together as possible to minimise possible perf.data format changes.
Mike
Tools like ptm2human will only tell you the trace packets generated. See the docs in OpenCSD and the ETM TRM for more detail.
You mean the E/N-Atoms etc., right?
Correct - this lists the packets - but as far as I know it will not fully decode the trace, as the only address packets in the trace stream are those that cannot be deduced from the memory image of the executed code. The same list of packets is generated by OpenCSD in perf using --dump mode. Full decode uses a combination of packets and program images, as recorded in the perf.data file, to obtain a complete list of executed instructions. If you want to see some example output, OpenCSD has a test program, trc_pkt_lister, which can be used to decode the examples provided in the tests directory of the OpenCSD project.
Thanks for the hint, I gave it a try. But I need the fully decoded trace, so ptm2human is no solution for me.
Did you have a look at the two files from Nvidia that I attached? I would really like to hear your thoughts about that.
A brief glance suggests this is a single platform driver to program up the entire coresight infrastructure for a specific SoC.
This is not the approach used upstream, where we have individual component drivers initialised as required by device-tree / ACPI.
Regards
Mike
Kind regards,
Vincent
Regards Mike
Regards Mike
Best regards, Vincent
Regards Mike
Thanks for the help
Vincent
Regards Mike
Unfortunately, Nvidia's Linux for my Jetson Nano ("Linux for Tegra") only supports kernel 4.9.
I guessed this might be the case.
There exist community solutions for getting a version 5 kernel running (on Ubuntu), but I would prefer keeping the standard Linux for Tegra if possible. What surprises me is that ...0000.etf/status (shown here [1]) does not exist on my device, and I also couldn't find it in the ABI documentation. Do you have an idea what that is?
This looks like a printout of many of the ETF registers in a single sysfs file, e.g. RRP, RWP, STATUS etc. Having multiple output lines for a single sysfs file is something that the upstream maintainers frown upon, so it has likely been dropped. You should still be able to see the same output by reading each individual register.
Yes, this is the case, thanks for the info.
Kind regards, Vincent
Regards Mike
Best regards, Vincent
[1] https://docs.nvidia.com/jetson/archives/l4t-archived/l4t-3275/index.html#pag...
_______________________________________________
CoreSight mailing list -- coresight@lists.linaro.org
To unsubscribe send an email to coresight-leave@lists.linaro.org
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
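Regarding the missing etf/status file: reading the ETF registers individually can be sketched in Python. On kernels of this era the TMC driver appears to expose its registers as separate files under a mgmt/ directory of the device, with names such as rsz, sts, rrp and rwp. Treat both the directory name and the file list as assumptions to check against the ABI documentation for your kernel. `read_tmc_registers` is a hypothetical helper. It skips missing files and assumes values are printed as hex with a 0x prefix.

```python
from pathlib import Path

# Assumed register file names; check Documentation/ABI for your kernel.
TMC_REGS = ("rsz", "sts", "rrp", "rwp", "trg", "ctl", "ffsr", "ffcr", "mode")


def read_tmc_registers(device_dir, names=TMC_REGS):
    """Read each register file under <device_dir>/mgmt into {name: int}."""
    mgmt = Path(device_dir) / "mgmt"
    regs = {}
    for name in names:
        path = mgmt / name
        if path.is_file():
            # int(..., 0) accepts the "0x..." form; missing files are skipped.
            regs[name] = int(path.read_text().strip(), 0)
    return regs


if __name__ == "__main__":
    dev = "/sys/bus/coresight/devices/72030000.etf"   # assumed path
    for name, value in read_tmc_registers(dev).items():
        print(f"{name:>5}: {value:#010x}")
```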
From: Mike Leach mike.leach@linaro.org Sent: Wednesday, January 22, 2025 15:52 To: V E vincent.ernst@web.de Cc: coresight@lists.linaro.org coresight@lists.linaro.org Subject: Re: CoreSight on Nvidia Jetson