Hi,
Please ensure you reply to the list as well - this gives you a better chance of getting a timely response, and you may find people who know more about CSAL.
On Wed, 21 Apr 2021 at 20:36, Dominik Huber dominik.huber@fau.de wrote:
Hello,
Thanks for all the help. Especially the explanation of the trace was very insightful.
For now, I am going to try implementing the sysfs tracing, since it seemed easier at first glance. I found the documentation for the .ini files within the snapshots, and am now trying to create my own version of them. But I'm struggling to get the addresses for all the devices, e.g. within cpu_0.ini.
I've read in the discovery.md from CSAL that they can be extracted by reading the ROM table of my Cortex-A53, but I'm doing/understanding something wrong. I cross-compiled the CSAL library and the csinfo folder (which btw only compiled after inserting "MODULE_LICENSE("");" into the csinfo.c file). Then I copied the resulting csinfo.ko to my Hikey620, where I tried "sudo make" and it returned:
    Load CoreSight reporting module... expect "Resource temporarily unavailable"
    insmod csinfo.ko
    insmod: ERROR: could not insert module csinfo.ko: Resource temporarily unavailable
    Makefile:17: recipe for target 'load' failed
    make: *** [load] Error 1
If I understood correctly, that is intended, but it should also give me the base address of the ROM table afterward. But there was no other output. I do have CONFIG_DEVMEM enabled. I tried "sudo insmod csinfo.ko" directly as well, with the same result.
Looking at the code, it appears that the output is written with printk(KERN_INFO ...). Check that the printk level is correct on your system.
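Output from printk(KERN_INFO ...) always goes to the kernel log buffer, so dmesg is the reliable place to look; whether it is also echoed to the console depends on the console log level. A minimal sketch, assuming the standard four-field layout of /proc/sys/kernel/printk (parse_printk and info_reaches_console are hypothetical helpers, not part of CSAL):

```python
from typing import Tuple

KERN_INFO = 6  # numeric printk level behind KERN_INFO


def parse_printk(text: str) -> Tuple[int, int, int, int]:
    """Split /proc/sys/kernel/printk into its four levels:
    console, default message, minimum console, boot-time default console."""
    console, default_msg, minimum, boot_default = (int(v) for v in text.split())
    return console, default_msg, minimum, boot_default


def info_reaches_console(text: str) -> bool:
    """A message is echoed to the console only if its level < console_loglevel."""
    console_loglevel = parse_printk(text)[0]
    return KERN_INFO < console_loglevel
```

On the board this would be called as info_reaches_console(open("/proc/sys/kernel/printk").read()). The kernel console is typically a serial port or VT rather than an SSH session, so over SSH dmesg is the place to look either way.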
I could neither find the physical base address for the Cortex-A53 nor a way to read the "CBAR_EL1" register, which apparently contains that information, at least not without a debugger. Since I don't know the correct address, I just tried "sudo ./csscan.py 0x0", and this was the result:
    @0x0 0x000 0x000 r0.0 unexpected CIDR: 0x000000f8 class:0
What does this output mean?
As far as I can tell, to get valid output from this tool you need to give it a valid input address / range of addresses. CIDR is the component ID register - the tool uses it to match against known component values. If it says unexpected, then it means just that - the value found is not a CIDR the tool recognises. See the CoreSight Specification for how CIDR and the other ID register values are used and what is expected.
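The CIDR check can be sketched as follows. This assumes the standard CoreSight/PrimeCell layout, where CIDR0-CIDR3 at offsets 0xFF0-0xFFC each contribute one byte, the combined value reads 0xB105n00D, and n (bits [15:12]) is the component class; combine_cidr and decode_cidr are hypothetical helpers, not csscan's own code. Both values reported in this thread, 0x000000f8 and 0x00001a3f, fail the preamble check; 0x00001a3f has class bits of 0x1, which is presumably why csscan labelled it "ROM table" while still flagging it as unexpected.

```python
from typing import Sequence, Tuple

# Combined CIDR3..CIDR0 must read 0xB105_n00D; n (bits [15:12]) is the class.
CIDR_PREAMBLE = 0xB105000D
CIDR_PREAMBLE_MASK = 0xFFFF0FFF

CLASS_NAMES = {
    0x1: "ROM table",
    0x9: "CoreSight component",
    0xF: "PrimeCell / CoreLink peripheral",
}


def combine_cidr(regs: Sequence[int]) -> int:
    """Combine CIDR0..CIDR3 (offsets 0xFF0..0xFFC); only bits [7:0] of each count."""
    value = 0
    for i, reg in enumerate(regs):
        value |= (reg & 0xFF) << (8 * i)
    return value


def decode_cidr(cidr: int) -> Tuple[bool, int, str]:
    """Return (preamble_ok, component_class, class_name)."""
    component_class = (cidr >> 12) & 0xF
    preamble_ok = (cidr & CIDR_PREAMBLE_MASK) == CIDR_PREAMBLE
    return preamble_ok, component_class, CLASS_NAMES.get(component_class, "other/unknown")
```

A valid ROM table would decode as decode_cidr(0xB105100D) == (True, 1, "ROM table").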
Regards
Mike
For the address 0xE011000 I got:
    @0xe011000 0x200 0x000 r0.7 unexpected CIDR: 0x00001a3f ROM table
Does this help me? Unfortunately, this is not persistent information, since after a reboot it showed something different. How can I find out the addresses that I need for the .ini files?
Regards,
Dominik
On 12/04/2021 11:44, Mike Leach wrote:
Hi,
On Sat, 10 Apr 2021 at 13:59, Dominik Huber dominik.huber@fau.de wrote:
Am 09.04.2021 um 17:01 schrieb Mike Leach:
Hi Dominik
On Fri, 9 Apr 2021 at 15:22, Dominik Huber dominik.huber@fau.de wrote:
Hello, I want to gather trace data of closed source binaries using CoreSight ETMv4 on a Hikey620. I want to know the source and destination address for all taken jumps of the traced program, like in the output of "perf script". It would be great if I could get feedback on how to achieve this. I'm not sure where to turn to with such a broad CoreSight problem, so I'm sorry if you are not the right ones to turn to, but I'd be happy for any help or advice you might have.
This is exactly the right place to ask for help on this!
I'm glad to hear that!
My main problem is that I don't know which approaches are promising to try. Below I describe two ideas that I tried but where I got stuck after a while. Are they any good for my use case? If yes, then how can I solve the respective problems that have come up, or where can I look to solve them? If not, are there maybe better ways to approach this, which I've overlooked until now?
After hearing a presentation from Mathieu Poirier, I thought sysFS was the (only) way to go. However, the decoded trace seems to show only the jump address, instead of both the source and destination addresses, and I did not find a register to change that.
What do you mean by decoded trace here & what are you using to decode the trace? If you look at the ETM spec / OpenCSD documentation you will see that to fully decode trace there is a two step process.
1) Convert the trace byte stream into trace packets. This will require some minimal information regarding the configuration of the ETM.
2) Convert the trace packets into the fully decoded execution trace. This requires access to the binary images executed during the trace session.
The reason for this is that trace packets are highly compressed, and contain the minimum of address information. Decode requires that the decoder walk the binary images to deduce which branches are taken and not taken. Address information is only included in the trace packets where it cannot be deduced from the code - and this is only ever target address information - the source address can always be deduced from the code walk.
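To illustrate why direct branch targets never need to appear in the trace, the sketch below classifies a few common AArch64 branch encodings and, for direct branches, computes the target from the opcode alone. classify_branch is a hypothetical helper, not OpenCSD code, and it covers only B, BL, B.cond, CBZ/CBNZ, TBZ/TBNZ, BR, BLR and RET; the ETMv4 protocol defines further P0 element types that a real decoder must handle.

```python
from typing import Optional, Tuple


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def classify_branch(pc: int, opcode: int) -> Optional[Tuple[str, Optional[int]]]:
    """Classify a 32-bit AArch64 opcode located at address pc.

    Returns (kind, target) for the branch types handled here; target is None for
    indirect branches, whose target must come from a trace address packet.
    Returns None if the opcode is not one of these branches.
    """
    if (opcode & 0x7C000000) == 0x14000000:  # B / BL: imm26, bit 31 set for BL
        kind = "b+link" if opcode & 0x80000000 else "b"
        return kind, pc + 4 * sign_extend(opcode & 0x3FFFFFF, 26)
    if (opcode & 0xFF000010) == 0x54000000:  # B.cond: imm19 in bits [23:5]
        return "b.cond", pc + 4 * sign_extend((opcode >> 5) & 0x7FFFF, 19)
    if (opcode & 0x7E000000) == 0x34000000:  # CBZ / CBNZ: imm19 in bits [23:5]
        return "cbz/cbnz", pc + 4 * sign_extend((opcode >> 5) & 0x7FFFF, 19)
    if (opcode & 0x7E000000) == 0x36000000:  # TBZ / TBNZ: imm14 in bits [18:5]
        return "tbz/tbnz", pc + 4 * sign_extend((opcode >> 5) & 0x3FFF, 14)
    indirect = {0xD61F0000: "br", 0xD63F0000: "blr", 0xD65F0000: "ret"}
    kind = indirect.get(opcode & 0xFFFFFC1F)  # ignore the Rn field, bits [9:5]
    return (kind, None) if kind else None
```

For example, a BL encoded as 0x97FFD798 at 0x4dbf28 gives ("b+link", 0x4d1d88), the same branch as in the trace example later in this thread (the opcode value is constructed to match, not read from that binary), while RET (0xD65F03C0) gives ("ret", None).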
I used ptm2human for decoding. I also tried the c_api_pkt_print_test.c from OpenCSD, which decodes a single packet. However, depending on how I collected the trace data, it does not always produce decoded data.
ptm2human performs the 1st stage of decode - byte stream to trace packets. The output from this is the same as the packet only decode from the OpenCSD library / trc_pkt_lister app. This will print out the trace packets - but be aware that in both cases the addresses that you see in the trace are not all the branch addresses used in the executed application.
Once I have a working proof-of-concept, I intended to write myself an OpenCSD decoder. Because of that, and because the decoding often just seemed "to work", I postponed any thoughts about the packet processing. What are these "binary images"? Until now, I only got a cstrace.bin from /dev/[sink], which I used as the only input for any decoding. How can I get them? Can I, by using them, obtain the destination addresses of branch instructions?
cstrace.bin is the binary trace data. The binary images I refer to are the memory images of the executed code - the decoder walks this memory image to deduce the path of the executed trace based on the opcodes encountered. These are either the binary files of your application and any loaded .so files, or may be a memory dump of the locations these were loaded during the trace session. Either will do to decode.
See https://github.com/Linaro/OpenCSD/blob/master/decoder/docs/prog_guide/prog_g...
So to successfully decode a trace session to obtain the source and target branch addresses you want, you will need the following:-
- The captured binary trace data.
- The configuration registers of the ETM
- The program binaries for all applications and .so libraries active during the trace session and their load addresses. Be aware that ETM will trace everything running on a core - so it may be necessary for any analysis of a particular program to filter out anything unrelated.
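The load addresses can be taken from /proc/<pid>/maps while the program runs. A minimal sketch, assuming the usual maps line layout (address range, permissions, file offset, device, inode, optional path); exec_mappings and locate are hypothetical helpers. locate shows the lookup a decoder's memory access effectively needs: which file, and which offset within it, holds the opcode at a traced address.

```python
from typing import List, NamedTuple, Optional, Tuple


class ExecMapping(NamedTuple):
    start: int        # first virtual address of the mapping
    end: int          # one past the last virtual address
    file_offset: int  # offset into the backing file
    path: str         # backing file, or a pseudo-name such as "[vdso]"


def exec_mappings(maps_text: str) -> List[ExecMapping]:
    """Keep only executable mappings from /proc/<pid>/maps text.

    Each line is: start-end perms offset dev inode [path]
    """
    result = []
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5 or "x" not in fields[1]:
            continue
        start, end = (int(a, 16) for a in fields[0].split("-"))
        path = fields[5].strip() if len(fields) == 6 else ""
        result.append(ExecMapping(start, end, int(fields[2], 16), path))
    return result


def locate(address: int, mappings: List[ExecMapping]) -> Optional[Tuple[str, int]]:
    """Translate a traced virtual address into (file, file offset), if mapped."""
    for m in mappings:
        if m.start <= address < m.end:
            return m.path, m.file_offset + (address - m.start)
    return None
```

Entries with no backing file, such as [vdso], cannot be read from disk, so for those a memory dump taken during the session would be needed instead.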
Also, the trace gathered seems to lose some of the branch addresses. Inserting a sleep instruction after each regular instruction into my test program fixed that. But since it should also work for closed source binaries and has to be fast, this is probably not an option.
Then I tried to copy the way "perf record" is tracing, and extract the relevant code parts. But then I realized that perf record doesn't use sysFS, apart from enabling the sink in "util/cs-etm.c" (which apparently is not used, and not even deactivated afterward).
perf uses the driver in a similar way to sysfs. It does in fact activate and de-activate the sources and sinks as the perf events are run on any CPU. perf also records the binaries used during the trace session - so that full trace decode is possible. With sysfs you can get raw trace data - but relating this to the binaries being executed is far more difficult.
So is it your recommendation to use large parts of the perf source code for my project because it already makes use of these "binary images"?
I was not recommending re-use of perf source code - I was recommending using perf to capture and analyse trace. If this is not sufficient - or you want to write your own application, then perf serves as an example of the information you need to collect during a trace session to successfully decode the trace. perf does two tasks - capture - using perf record, and analysis using perf report. These are very separate elements - the first happens in kernel space, the second, often offline as a separate perf program, in user space.
Up until now, I only thought about using a few hundred lines, because I believe perf has a lot of overhead (both in performance and code length) just to get branch instructions from an executed binary.
The most important factors for me are that
- the trace includes all branches (including the jump destination), and
- it is fast (faster than using hardware emulation and extracting the addresses from there)
What about the approaches to use the CoreSight driver directly or to use CSAL for trace collection? Would they maybe be better suited?
The best method used for capturing trace is really an assessment for you to decide based on your requirements. CSAL / direct driver access / sysfs are all options - though I am not aware of anyone actually using a direct driver access method so could not really advise here.
Remember that whatever method you choose, in order to get the data that you require you will need to collect the additional information I describe above. Once you have this data you can then pass it to the OpenCSD library for full decode. This will output the executed trace ranges. You are then free to analyse these ranges to synthesise source / target data for branches.
Below is a short example of the trace output from the library - I have annotated this '***' to explain what is happening. This was generated using trc_pkt_lister test program from OpenCSD, running on one of the supplied test captures.
The I_ packets are the trace packets - as would be output from ptm2human, the OCSD_GEN_TRC packets are the output of the library - fully decoded trace ranges.
Idx:356; ID:12; [0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x80 ]; I_ASYNC : Alignment Synchronisation. *** start of trace - align the decoder to the incoming byte stream for the ETM programmed with trace ID 0x12
Idx:369; ID:12; [0x01 0x01 0x00 ]; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 } *** Some setup information regarding trace configuration - used internally by the decoder.
Idx:372; ID:12; [0xf7 ]; I_ATOM_F1 : Atom format 1.; E *** atom packet - skipped - cannot decode until we get an address / context packet
Idx:373; ID:12; [0x85 0x22 0x12 0x4d 0x00 0x00 0x00 0x00 0x00 0x30 ]; I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x00000000004D2488; Ctxt: AArch64,EL0, NS; *** address and context packet - this gives the trace start address for the decoder + EL and ISA context.
Idx:384; ID:12; [0xf7 ]; I_ATOM_F1 : Atom format 1.; E *** Atom packet. This indicates that the program executed a number of instructions until it encountered a P0 element instruction (P0 element instructions, previously referred to as waypoints in PTM, are instructions that change the address flow of the program. These are primarily branch instructions, but the full range of P0 instruction types is defined by the ETM4 protocol). This instruction was executed.
Idx:373; ID:12; OCSD_GEN_TRC_ELEM_PE_CONTEXT((ISA=A64) EL0N; 64-bit; ) *** OpenCSD library outputs the context information for the client application. Where the traced context also includes the Context ID register, this may contain the PID of the program running on the CPU under certain kernel configurations.
Idx:384; ID:12; OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4d2488:[0x4d2494] num_i(3) last_sz(4) (ISA=A64) E iBR A64:ret ) *** OpenCSD outputs the executed instruction trace range: three instructions at addresses 0x4d2488 - 0x4d2493. This is output in response to the atom packet. This range was calculated by starting @ address 0x4d2488, walking through the program image from that address, examining opcodes until it found a P0 element (branch instruction) which it associates with the atom packet. This was an indirect branch (iBR A64:ret ), specifically an AArch64 return instruction. This branch was taken (E). At this point we do not know the destination of the branch - so we are waiting for a target address packet in the input stream.
Idx:385; ID:12; [0x9d 0x48 0x5f 0x4d 0x00 0x00 0x00 0x00 0x00 ]; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x00000000004DBF20; *** Address packet - this updates the decoder with the target of the prior indirect branch - which can now continue decoding
Idx:394; ID:12; [0xde ]; I_ATOM_F4 : Atom format 4.; NENE *** Atom packet containing 4 atoms - representing 4 waypoint instructions that were taken (E) or not taken (N). You will note that this single packet decodes into 4 separate executed instruction ranges - with no further address packets visible in the trace.
Idx:394; ID:12; OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4dbf20:[0x4dbf24] num_i(1) last_sz(4) (ISA=A64) N BR <cond>) *** OpenCSD executed instruction range - 1 instruction - was a not taken branch. This range started @ 0x4DBF20 - taken from the address packet above. At this point the client program using the library can deduce the most recent taken branch information as 0x4d2490=>0x4dbf20
Idx:394; ID:12; OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4dbf24:[0x4dbf2c] num_i(2) last_sz(4) (ISA=A64) E BR b+link ) *** OpenCSD executed instruction range - 2 instructions - last instruction was a taken branch + link - in this case the decoder can calculate the target address from the instruction opcode - so no address data needs to appear in the trace.
Idx:394; ID:12; OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4d1d88:[0x4d1db4] num_i(11) last_sz(4) (ISA=A64) N BR <cond>) *** OpenCSD executed instruction range - 11 instructions, starting at the address calculated from the previous range. So the start of the range was the result of a branch 0x4dbf28=>0x4d1d88. Ends in a not taken branch.
Idx:394; ID:12; OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4d1db4:[0x4d1dbc] num_i(2) last_sz(4) (ISA=A64) E BR <cond>) *** OpenCSD executed instruction range - 2 instructions, last is a direct branch which will allow us to calculate the target address.
Throughout this process the decoder maintains a current trace address from the incoming address packets, and by walking through the executed opcodes - calculating branch targets where possible. This is why the program and library binaries (or a memory dump of their load locations) are required for full trace decode and obtaining the information you require.
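Following the annotations above, source/target pairs for taken branches can be synthesised from consecutive instruction ranges: the source is the last instruction of a range ending in a taken (E) P0 element, and the target is the start of the next range. A sketch that scrapes the trc_pkt_lister text format shown here (parse_ranges and taken_branches are hypothetical names). It assumes an uninterrupted run of ranges; a real tool would use the OpenCSD library API rather than text output, and would reset the pairing at discontinuities such as trace-on, exception or overflow elements.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple

# Text form of an instruction range element as printed by trc_pkt_lister.
RANGE_RE = re.compile(
    r"OCSD_GEN_TRC_ELEM_INSTR_RANGE\(exec range=0x([0-9a-fA-F]+):\[0x([0-9a-fA-F]+)\]"
    r" num_i\(\d+\) last_sz\((\d+)\) \(ISA=[^)]*\) ([EN]) "
)


class InstrRange(NamedTuple):
    start: int    # address of the first instruction in the range
    end: int      # address just past the last instruction (exclusive)
    last_sz: int  # size in bytes of the last instruction (the P0 element)
    taken: bool   # True for an E atom (taken), False for N


def parse_ranges(lines: Iterable[str]) -> List[InstrRange]:
    """Extract instruction ranges, ignoring packets and other trace elements."""
    ranges = []
    for line in lines:
        m = RANGE_RE.search(line)
        if m:
            start, end, last_sz, atom = m.groups()
            ranges.append(InstrRange(int(start, 16), int(end, 16), int(last_sz), atom == "E"))
    return ranges


def taken_branches(ranges: List[InstrRange]) -> List[Tuple[int, int]]:
    """Pair each taken branch (last instruction of a range) with the next range's start."""
    pairs = []
    for cur, nxt in zip(ranges, ranges[1:]):
        if cur.taken:
            pairs.append((cur.end - cur.last_sz, nxt.start))
    return pairs


if __name__ == "__main__":
    sample = [
        "OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4d2488:[0x4d2494] num_i(3) last_sz(4) (ISA=A64) E iBR A64:ret )",
        "OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4dbf20:[0x4dbf24] num_i(1) last_sz(4) (ISA=A64) N BR <cond>)",
        "OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4dbf24:[0x4dbf2c] num_i(2) last_sz(4) (ISA=A64) E BR b+link )",
        "OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4d1d88:[0x4d1db4] num_i(11) last_sz(4) (ISA=A64) N BR <cond>)",
        "OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4d1db4:[0x4d1dbc] num_i(2) last_sz(4) (ISA=A64) E BR <cond>)",
    ]
    for src, dst in taken_branches(parse_ranges(sample)):
        print(f"{src:#x} => {dst:#x}")
    # prints:
    # 0x4d2490 => 0x4dbf20
    # 0x4dbf28 => 0x4d1d88
```

The final taken branch in the sample has no following range, so its target is unknown and it is not reported.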
Again, I would recommend reading the ETM protocol spec and the OpenCSD library documentation to get a full understanding of the protocols and how decoding works.
Regards
Mike
If you are interested in tracing a particular binary and this is a userspace program, then you may wish to try:-
    perf record -e cs_etm//u --per-thread <program-to-trace>
to ensure that any trace collected is related to the program you are interested in. You can then use the facilities of perf report / perf script to examine the trace.
Thanks, but I luckily already knew about that one.
Regards
Mike
So there is another way to gather trace, maybe by interacting with the CoreSight driver directly. But looking into the "perf report" source code I couldn't find it yet.
Thanks and regards,
Dominik
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK
Hi Dominik,
On Thu, Apr 22, 2021 at 04:24:26PM +0100, Mike Leach wrote:
[...]
For now, I am going to try implementing the sysfs tracing, since it seemed easier at first glance. I found the documentation for the .ini files within the snapshots, and am now trying to create my own version of them. But I'm struggling to get the addresses for all the devices, e.g. within cpu_0.ini.
You could download the snapshot .ini files for Hikey from: https://people.linaro.org/~leo.yan/opencsd/opencsd_hikey/hikey_snapshot.tgz
Just to clarify:
- The hikey_snapshot.tgz contains configuration files which I edited manually, and I tested that they work with OpenCSD's snapshot tool;
- I think you could continue to use the CSAL tool to retrieve the registers, so that you can use the dumped register values to double-check the settings in the .ini files.
I've read in the discovery.md from CSAL that they can be extracted by reading the ROM table of my Cortex-A53, but I'm doing/understanding something wrong. I cross-compiled the CSAL library and the csinfo folder (which btw only compiled after inserting "MODULE_LICENSE("");" into the csinfo.c file). Then I copied the resulting csinfo.ko to my Hikey620, where I tried "sudo make" and it returned:
    Load CoreSight reporting module... expect "Resource temporarily unavailable"
    insmod csinfo.ko
    insmod: ERROR: could not insert module csinfo.ko: Resource temporarily unavailable
    Makefile:17: recipe for target 'load' failed
    make: *** [load] Error 1
If I understood correctly, that is intended, but it should also give me the base address of the ROM table afterward. But there was no other output. I do have CONFIG_DEVMEM enabled. I tried "sudo insmod csinfo.ko" directly as well, with the same result.
Looking at the code, it appears that the output is written with printk(KERN_INFO ...). Check that the printk level is correct on your system.
It's good to use "dmesg" to check the kernel log, as it might give more hints about why the kernel module fails to load. The compile failure with "MODULE_LICENSE" is weird; it's worth going back to check whether you have specified the kernel path correctly.
I could neither find the physical base address for the Cortex-A53 nor a way to read the "CBAR_EL1" register, which apparently contains that information, at least not without a debugger. Since I don't know the correct address, I just tried "sudo ./csscan.py 0x0", and this was the result:
    @0x0 0x000 0x000 r0.0 unexpected CIDR: 0x000000f8 class:0
What does this output mean?
As far as I can tell, to get valid output from this tool you need to give it a valid input address / range of addresses. CIDR is the component ID register - the tool uses it to match against known component values. If it says unexpected, then it means just that - the value found is not a CIDR the tool recognises. See the CoreSight Specification for how CIDR and the other ID register values are used and what is expected.
Besides Mike's suggestion, I'd strongly suggest disabling CPUidle when you debug CoreSight on Hikey620. So please add "nohlt" to the kernel command line.
Thanks, Leo
On 23/04/2021 02:10, Leo Yan wrote:
Hi Dominik,
On Thu, Apr 22, 2021 at 04:24:26PM +0100, Mike Leach wrote:
[...]
For now, I am going to try implementing the sysfs tracing, since it seemed easier at first glance. I found the documentation for the .ini files within the snapshots, and am now trying to create my own version of them. But I'm struggling to get the addresses for all the devices, e.g. within cpu_0.ini.
You could download the snapshot .ini files for Hikey from: https://people.linaro.org/~leo.yan/opencsd/opencsd_hikey/hikey_snapshot.tgz
Just to clarify:
- The hikey_snapshot.tgz contains configuration files which I edited manually, and I tested that they work with OpenCSD's snapshot tool;
- I think you could continue to use the CSAL tool to retrieve the registers, so that you can use the dumped register values to double-check the settings in the .ini files.
Thank you, these snapshot files are a huge help! I've added my binary and its .so files to the "cpu_0.ini", but it doesn't work quite yet. When using the traced data from my own binary with the trc_pkt_lister program, it just says "END OF TRACE DATA" (see attachment "trc_pkt_lister-decode_output.txt"). Weirdly enough, when using the cstrace.bin from Leo's snapshot, I get lots of decoded trace. Of course, the streams aren't synchronized and it is not informative, but it has content. This made me wonder whether I'm collecting the trace correctly, but I believe there's not much I can do wrong. I attached my simpletrace.sh, which enables trace, runs the binary on CPU core 0, stops trace, and collects it. I also tried a variant where I filter only the relevant address ranges, but with the same result. Did I understand correctly that I need the code segment addresses for the binary and .so files (the entries from "/proc/<pid>/maps" with the "x" executable flag)? I think my .ini files should be correct, but I've attached my "cpu_0.ini" and the output of "/proc/<pid>/maps" of my binary just in case. I also deactivated ASLR, to ensure the addresses stay constant.
I've read in the discovery.md from CSAL that they can be extracted by reading the ROM table of my Cortex-A53, but I'm doing/understanding something wrong. I cross-compiled the CSAL library and the csinfo folder (which btw only compiled after inserting "MODULE_LICENSE("");" into the csinfo.c file). Then I copied the resulting csinfo.ko to my Hikey620, where I tried "sudo make" and it returned:
    Load CoreSight reporting module... expect "Resource temporarily unavailable"
    insmod csinfo.ko
    insmod: ERROR: could not insert module csinfo.ko: Resource temporarily unavailable
    Makefile:17: recipe for target 'load' failed
    make: *** [load] Error 1
If I understood correctly, that is intended, but it should also give me the base address of the ROM table afterward. But there was no other output. I do have CONFIG_DEVMEM enabled. I tried "sudo insmod csinfo.ko" directly as well, with the same result.
Looking at the code, it appears that the output is written with printk(KERN_INFO ...). Check that the printk level is correct on your system.
It's good to use "dmesg" to check the kernel log, as it might give more hints about why the kernel module fails to load. The compile failure with "MODULE_LICENSE" is weird; it's worth going back to check whether you have specified the kernel path correctly.
Oh, of course: csinfo is a module using printk, so the output only appears in dmesg. There is indeed output. I've attached the dmesg output (load_csinfo_dmesg_output.txt); could you maybe take a quick glance at it to check whether it looks as expected? Using the address from the MPIDR register ROM table ("sudo ./csscan.py 0x00000000f6400000") causes the board to hang until I replug the power supply (see csscan_output.txt).
Besides Mike's suggestion, I'd strongly suggest disabling CPUidle when you debug CoreSight on Hikey620. So please add "nohlt" to the kernel command line.
I'm already using "nohlt", but thanks for the reminder.
Thanks, Leo
Thanks,
Dominik
-----Original Message----- From: CoreSight coresight-bounces@lists.linaro.org On Behalf Of Dominik Huber Sent: 28 April 2021 14:51 To: leo.yan@linaro.org; Mike Leach mike.leach@linaro.org Cc: coresight@lists.linaro.org Subject: Re: Help sought on which approach to take for tracing with CoreSight
On 23/04/2021 02:10, Leo Yan wrote:
Hi Dominik,
On Thu, Apr 22, 2021 at 04:24:26PM +0100, Mike Leach wrote:
[...]
For now, I am going to try implementing the sysfs tracing, since it seemed easier at first glance. I found the documentation for the .ini files within the snapshots, and am now trying to create my own version of them. But I'm struggling to get the addresses for all the devices, e.g. within cpu_0.ini.
You could download the snapshot .ini files for Hikey from: https://people.linaro.org/~leo.yan/opencsd/opencsd_hikey/hikey_snapshot.tgz
Just to clarify:
- The hikey_snapshot.tgz contains configuration files which I edited manually, and I tested that they work with OpenCSD's snapshot tool;
- I think you could continue to use the CSAL tool to retrieve the registers, so that you can use the dumped register values to double-check the settings in the .ini files.
Thank you, these snapshot files are a huge help! I've added my binary and its .so files to the "cpu_0.ini", but it doesn't work quite yet. When using the traced data from my own binary with the trc_pkt_lister program, it just says "END OF TRACE DATA" (see attachment "trc_pkt_lister-decode_output.txt"). Weirdly enough, when using the cstrace.bin from Leo's snapshot, I get lots of decoded trace. Of course, the streams aren't synchronized and it is not informative, but it has content. This made me wonder whether I'm collecting the trace correctly, but I believe there's not much I can do wrong. I attached my simpletrace.sh, which enables trace, runs the binary on CPU core 0, stops trace, and collects it. I also tried a variant where I filter only the relevant address ranges, but with the same result. Did I understand correctly that I need the code segment addresses for the binary and .so files (the entries from "/proc/<pid>/maps" with the "x" executable flag)? I think my .ini files should be correct, but I've attached my "cpu_0.ini" and the output of "/proc/<pid>/maps" of my binary just in case. I also deactivated ASLR, to ensure the addresses stay constant.
I've read in the discovery.md from CSAL that they can be extracted by reading the ROM table of my Cortex-A53, but I'm doing/understanding something wrong. I cross-compiled the CSAL library and the csinfo folder (which btw only compiled after inserting "MODULE_LICENSE("");" into the csinfo.c file). Then I copied the resulting csinfo.ko to my Hikey620, where I tried "sudo make" and it returned:
    Load CoreSight reporting module... expect "Resource temporarily unavailable"
    insmod csinfo.ko
    insmod: ERROR: could not insert module csinfo.ko: Resource temporarily unavailable
    Makefile:17: recipe for target 'load' failed
    make: *** [load] Error 1
If I understood correctly, that is intended, but it should also give me the base address of the ROM table afterward. But there was no other output. I do have CONFIG_DEVMEM enabled. I tried "sudo insmod csinfo.ko" directly as well, with the same result.
Looking at the code, it appears that the output is written with printk(KERN_INFO ...). Check that the printk level is correct on your system.
It's good to use "dmesg" to check the kernel log, as it might give more hints about why the kernel module fails to load. The compile failure with "MODULE_LICENSE" is weird; it's worth going back to check whether you have specified the kernel path correctly.
Oh, of course: csinfo is a module using printk, so the output only appears in dmesg. There is indeed output. I've attached the dmesg output (load_csinfo_dmesg_output.txt); could you maybe take a quick glance at it to check whether it looks as expected?
Sorry, I should have got to this thread earlier (I wrote csinfo and csscan). Yes, that is expected. You load the csinfo module so that it can read some system ID registers that aren't readable in userspace - it prints the values out as kernel log messages, and then unloads itself.
If there's a better way that involves the messages getting directly back to your shell when you load the module, or that avoids insmod complaining about the module 'failing to load', I'd be very happy to fix it to use that!
csinfo exists only to discover where the main ROM table lives in the CPU's physical address space. It also prints some other ID registers. csinfo has no value if you already know the ROM table address... or if it turns out that the ROM table address register reads as zero, which is increasingly the case.
Using the address from the MPIDR register ROM table ("sudo ./csscan.py 0x00000000f6400000") causes the board to hang until I replug the power supply (see csscan_output.txt).
Unfortunately that is fairly typical. csscan has got as far as confirming that 0xf6400000 is a ROM table. It will then try to read each entry in the ROM table and interrogate the CoreSight devices pointed to. If those devices aren't accessible for any reason, you may get a bus lockup. Sometimes it's due to them being powered off, which is why disabling CPUidle sometimes works - but other times, they are deliberately access-protected. At the top level, i.e. the main ROM table, you may be encountering entries for the vendor's private system-level debug components.
Entries in the ROM table page itself may cause a bus lockup (i.e. you might be able to read the first entry at 0xf6400000 successfully, but 0xf6400010 might lock up).
You may be able to get further by excluding entries from the scan... possibly by hacking the csscan script. All csscan is doing at this stage is reading physical memory and following pointers. It doesn't actively start interacting with the devices unless you use the ATB topology discovery feature.
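The pointer-following part of a ROM table scan, with an exclusion list, can be sketched on entry values that have already been read. This assumes a Class 0x1 ROM table with 32-bit entries, where bit 0 marks the entry present, bits [31:12] hold a signed offset from the ROM table base, and an all-zero entry ends the table; rom_table_components is a hypothetical helper, not csscan's code, and it performs no /dev/mem accesses, which are the part that can lock up the bus.

```python
from typing import AbstractSet, List, Sequence, Tuple


def rom_table_components(rom_base: int, entries: Sequence[int],
                         exclude: AbstractSet[int] = frozenset()) -> List[Tuple[int, int]]:
    """Decode Class 0x1 ROM table entries (32-bit format) into component addresses.

    `entries` are the words already read from rom_base + 0x000, 0x004, ...
    Returns (entry_index, component_address) pairs, skipping absent entries and
    any component address listed in `exclude`.
    """
    components = []
    for index, entry in enumerate(entries):
        if entry == 0:               # an all-zero entry marks the end of the table
            break
        if not entry & 0x1:          # bit 0: entry present
            continue
        offset = entry & 0xFFFFF000  # bits [31:12]: signed offset from rom_base
        if offset & 0x80000000:
            offset -= 1 << 32
        address = rom_base + offset
        if address not in exclude:
            components.append((index, address))
    return components
```

For example, made-up entries [0x00010003, 0xFFFF0003, 0] with a base of 0xf6400000 yield components at 0xf6410000 and 0xf63f0000; adding either address to `exclude` drops it from the list of devices to interrogate.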
Al
Hi Dominik,
On Wed, 28 Apr 2021 at 14:50, Dominik Huber dominik.huber@fau.de wrote:
On 23/04/2021 02:10, Leo Yan wrote:
Hi Dominik,
On Thu, Apr 22, 2021 at 04:24:26PM +0100, Mike Leach wrote:
[...]
For now, I am going to try implementing the sysfs tracing, since it seemed easier at first glance. I found the documentation for the .ini files within the snapshots, and am now trying to create my own version of them. But I'm struggling to get the addresses for all the devices, e.g. within cpu_0.ini.
You could download the snapshot .ini files for Hikey from: https://people.linaro.org/~leo.yan/opencsd/opencsd_hikey/hikey_snapshot.tgz
Just to clarify:
- The hikey_snapshot.tgz contains configuration files which I edited manually, and I tested that they work with OpenCSD's snapshot tool;
- I think you could continue to use the CSAL tool to retrieve the registers, so that you can use the dumped register values to double-check the settings in the .ini files.
Thank you, these snapshot files are a huge help! I've added my binary and its .so files to the "cpu_0.ini", but it doesn't work quite yet. When using the traced data from my own binary with the trc_pkt_lister program, it just says "END OF TRACE DATA" (see attachment "trc_pkt_lister-decode_output.txt"). Weirdly enough, when using the cstrace.bin from Leo's snapshot, I get lots of decoded trace. Of course, the streams aren't synchronized and it is not informative, but it has content. This made me wonder whether I'm collecting the trace correctly, but I believe there's not much I can do wrong. I attached my simpletrace.sh, which enables trace, runs the binary on CPU core 0, stops trace, and collects it. I also tried a variant where I filter only the relevant address ranges, but with the same result. Did I understand correctly that I need the code segment addresses for the binary and .so files (the entries from "/proc/<pid>/maps" with the "x" executable flag)? I think my .ini files should be correct, but I've attached my "cpu_0.ini" and the output of "/proc/<pid>/maps" of my binary just in case. I also deactivated ASLR, to ensure the addresses stay constant.
The output you see from trc_pkt_lister is likely due to the TRCTRACEIDR in the .ini files for the ETM devices not matching the values actually used when collecting the trace.
This output:-
    Trace Packet Lister : Protocol printer ETMV4I on Trace ID 0x10
    Trace Packet Lister : Protocol printer ETMV4I on Trace ID 0x12
    Trace Packet Lister : Protocol printer ETMV4I on Trace ID 0x14
    Trace Packet Lister : Protocol printer ETMV4I on Trace ID 0x16
    Trace Packet Lister : Protocol printer ETMV4I on Trace ID 0x17
    Trace Packet Lister : Protocol printer ETMV4I on Trace ID 0x18
    Trace Packet Lister : Protocol printer ETMV4I on Trace ID 0x1a
    Trace Packet Lister : Protocol printer ETMV4I on Trace ID 0x1c
indicates the expected Trace IDs to look for when processing your raw trace file. If data associated with these IDs is not present in the file, then it will be skipped. As I have mentioned before, decode requires accurate information on the configuration of the ETMs at trace capture time.
Try adding -o_raw_unpacked and you should see additional output of the raw unpacked trace frames, e.g.:
Frame Data; Index 32656; ID_DATA[0x20]; ff ff db 9a 3f 10 e1 10 f7 00 00 00 00 00 00
Frame Data; Index 32672; ID_DATA[0x20]; 00 00 00 00 00 80 01 01 00 04 85 3c 10 e1 10
Frame Data; Index 32688; ID_DATA[0x20]; 00 80 ff ff 32 f7 9d 04 43 be 10 00 80 ff ff
Frame Data; Index 32704; ID_DATA[0x20]; db 9a 3f 10 e1 10 f7 04 85 3c 10 e1 10 00 80
Frame Data; Index 32720; ID_DATA[0x20]; ff ff 32 06 03 9d 3d 10 e1 10 00 80 ff ff 81
Frame Data; Index 32736; ID_DATA[0x20]; 13
Frame Data; Index 32736; ID_DATA[0x00]; 00 00 00 00 00 00 00 00 00 00 00 00 00
Frame Data; Index 32752; ID_DATA[0x00]; 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Idx:0; ID:60; OCSD_GEN_TRC_ELEM_EO_TRACE( [end-of-trace])
ID:60 END OF TRACE DATA
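The mismatch can be checked mechanically with a small script. It compares two sets of IDs:

- the IDs trc_pkt_lister was configured to decode (the "Protocol printer ... on Trace ID 0x.." lines);
- the IDs that actually appear in the -o_raw_unpacked frames (the "ID_DATA[0x..]" fields).

This is a hypothetical helper, not part of OpenCSD, and it assumes the lister output format quoted above. ID 0x00 is ignored because the frame formatter uses it for null/padding data.

```python
import re

# IDs the decoder was set up for: "... Protocol printer ETMV4I on Trace ID 0x10"
_EXPECTED = re.compile(r"Trace ID 0x([0-9a-fA-F]+)")
# IDs present in the raw frames printed with -o_raw_unpacked: "ID_DATA[0x20]"
_SEEN = re.compile(r"ID_DATA\[0x([0-9a-fA-F]+)\]")


def check_trace_ids(lister_output):
    """Return (expected, seen, missing, unexpected) sets of trace IDs."""
    expected = {int(x, 16) for x in _EXPECTED.findall(lister_output)}
    # ID 0x00 marks null/padding data in the formatted stream, so drop it
    seen = {int(x, 16) for x in _SEEN.findall(lister_output)} - {0}
    return expected, seen, expected - seen, seen - expected
```

Run over output like the example above, this would report 0x20 as seen but not expected, and every configured ETM ID as missing. That pattern points at the TRCTRACEIDR values in the ETM .ini files.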
Regards
Mike
I've read in the discovery.md from CSAL that they can be extracted by reading the ROM table of my Cortex-A53, but I'm doing or understanding something wrong. I cross-compiled the CSAL library and the csinfo folder (which, btw, only compiled after inserting "MODULE_LICENSE("");" into the csinfo.c file). Then I copied the resulting csinfo.ko to my Hikey620, where I ran "sudo make", which returned:
Load CoreSight reporting module... expect "Resource temporarily unavailable"
insmod csinfo.ko
insmod: ERROR: could not insert module csinfo.ko: Resource temporarily unavailable
Makefile:17: recipe for target 'load' failed
make: *** [load] Error 1
If I understood correctly, that failure is intended, but the module should also give me the base address of the ROM table. However, there was no other output. I do have CONFIG_DEVMEM enabled. I also tried "sudo insmod csinfo.ko" directly, with the same result.
Looking at the code, it appears that the output is printk(KERN_INFO ...). Check that the printk level is correct on your system.
It's good to use "dmesg" to check the kernel log; it might give more hints about why the kernel module fails to load. The compile failure with "MODULE_LICENSE" is weird; it's better to go back and check whether you have specified the kernel path correctly.
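A module's printk() output goes to the kernel log buffer, which dmesg reads. It is never printed on the terminal that ran insmod. Whether it is also echoed to the system console depends on the console log level, which is the first of the four numbers in /proc/sys/kernel/printk. A message reaches the console only if its level is numerically lower than that value; KERN_INFO is level 6.

The sketch below parses that file and checks the condition. The helper names are made up for illustration.

```python
KERN_INFO = 6  # level used by csinfo's printk(KERN_INFO ...)


def parse_printk(text):
    """Parse the four integers found in /proc/sys/kernel/printk."""
    console, default_msg, minimum, boot_default = (int(v) for v in text.split())
    return {
        "console_loglevel": console,
        "default_message_loglevel": default_msg,
        "minimum_console_loglevel": minimum,
        "default_console_loglevel": boot_default,
    }


def info_reaches_console(text):
    """True if KERN_INFO messages are echoed to the console.

    Messages are always stored in the kernel log buffer (read by dmesg);
    only those with a level below console_loglevel also go to the console.
    """
    return KERN_INFO < parse_printk(text)["console_loglevel"]
```

On the board, info_reaches_console(open("/proc/sys/kernel/printk").read()) tells you whether KERN_INFO lines should also appear on the serial console. Either way, dmesg is the place to look.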
Oh, of course: csinfo is a module that uses printk, so its output only appears in dmesg. There is indeed output. I've attached the dmesg output (load_csinfo_dmesg_output.txt); could you take a quick glance at it to check whether it looks as expected? Using the address from the MPIDR register ROM table ("sudo ./csscan.py 0x00000000f6400000") causes the board to hang until I replug the power supply (see csscan_output.txt).
Besides Mike's suggestion, I'd strongly suggest disabling CPUIdle when you debug CoreSight on the Hikey620, so please add "nohlt" to the kernel command line.
I'm already using "nohlt", but thanks for the reminder.
Thanks, Leo
Thanks,
Dominik
Hi Mike, Hi Leo,
On 28/04/2021 21:41, Mike Leach wrote:
Hi Dominik,
On Wed, 28 Apr 2021 at 14:50, Dominik Huber dominik.huber@fau.de wrote:
On 23/04/2021 02:10, Leo Yan wrote:
Hi Dominik,
On Thu, Apr 22, 2021 at 04:24:26PM +0100, Mike Leach wrote:
[...]
For now, I am going to try implementing the sysfs tracing, since it seemed easier at first glance. I found the documentation for the .ini files within the snapshots, and am now trying to create my own version of them. But I'm struggling to get the addresses for all the devices, e.g. within cpu_0.ini.
You could download the snapshot .ini files for Hikey from: https://people.linaro.org/~leo.yan/opencsd/opencsd_hikey/hikey_snapshot.tgz
Just to clarify:
- The hikey_snapshot.tgz contains configuration files that I edited manually, and I have tested that they work with OpenCSD's snapshot tool;
- I think you could continue to use the CSAL tool to retrieve the registers, so you can use the register dumps to double-check the settings in the .ini files.
I might have found an easier way to figure out the register addresses than building csinfo and running csinfo.py: "perf script -D" prints a lot of extra information, among others:
...
Magic number 4040404040404040
CPU 7
TRCCONFIGR 0
TRCTRACEIDR 1e
TRCIDR0 28000ea1
TRCIDR1 4100f403
TRCIDR2 488
TRCIDR8 0
TRCAUTHSTATUS cc
Compared to Leo's snapshot, TRCIDR9 - 13 are missing, the TRCCONFIGR value changed from 0x1 to 0x0, the TRCTRACEIDR of ETM_7 changed from 0x17 to 0x1e, and the (optional) IDs are missing. But I compared this with Leo's configuration using trc_pkt_lister, and the output was basically the same. Could it be that TRCIDR9 - 13 are not used on the Hikey620? Also, what does the last TRCCONFIGR bit do? The ARM ETMv4 spec just says it is "RES1". Does "RES1" mean that it's supposed to be 1, but does not really matter? Anyway, I think it would be really nice to add this usage of perf to the documentation, maybe in CSAL/.../coresight-tools/discovery.md. Al, since you wrote csscan, I CC'ed you. Thanks for your nice explanation the other day.
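To turn such a perf metadata dump into .ini-style entries, a small parser is enough. The sketch below is hypothetical and rests on three assumptions:

- The dump uses the "CPU <decimal>" / "TRC<NAME> <hex>" layout shown above, with register values printed in hex without a 0x prefix.
- A "CPU" entry starts a new block.
- The output lines use a NAME=0x........ form, assumed to resemble the [regs] entries of the snapshot .ini files. Compare the key spelling against Leo's files before relying on it.

```python
import re

# "CPU <decimal>" starts a new block; "TRC<NAME> <hex>" pairs belong to it.
# Matching pairs anywhere in the text handles one-per-line and flattened dumps.
_PAIR = re.compile(r"\b(CPU|TRC[A-Z0-9]+)\s+([0-9a-fA-F]+)\b")


def parse_etm_metadata(text):
    """Return one {name: value} dict per CPU block found in the dump."""
    blocks = []
    for name, raw in _PAIR.findall(text):
        if name == "CPU":
            blocks.append({"CPU": int(raw, 10)})  # CPU number assumed decimal
        elif blocks:
            blocks[-1][name] = int(raw, 16)  # register value assumed hex, no 0x
    return blocks


def to_regs_lines(block):
    """Format registers as NAME=0x%08X lines (assumed snapshot [regs] style)."""
    return [f"{name}=0x{value:08X}" for name, value in block.items() if name != "CPU"]
```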
Thank you, these snapshot files are a huge help! I've added my binary and its .so files to "cpu_0.ini", but it doesn't work quite yet. When I use the trace data from my own binary with the trc_pkt_lister program, it just says "END OF TRACE DATA" (see attachment "trc_pkt_lister-decode_output.txt"). Oddly enough, when I use the cstrace.bin from Leo's snapshot, I get lots of decoded trace. Of course, the streams aren't synchronized and it is not informative, but it has content. This made me wonder whether I'm collecting the trace correctly, but I believe there's not much I can do wrong. I attached my simpletrace.sh, which enables trace, runs the binary on CPU core 0, stops trace, and collects it. I also tried a variant where I filter only the relevant address ranges, but with the same result. Did I understand correctly that I need the code segment addresses for the binary and .so files (the entries from "/proc/<pid>/maps" with the "x" executable flag)? I think my .ini files should be correct, but I've attached my "cpu_0.ini" and the output of "/proc/<pid>/maps" of my binary just in case. I also deactivated ASLR to ensure the addresses stay constant.
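The executable mappings can be pulled out of /proc/<pid>/maps mechanically. This hypothetical helper keeps only lines whose permission field contains "x". For each one it returns:

- the start and end address;
- the file offset;
- the path (empty for anonymous mappings).

The offset is kept because an executable segment does not necessarily start at the beginning of its file. The helper only parses the text; it does not decide how the .ini dump entries should be written.

```python
def executable_mappings(maps_text):
    """Return (start, end, file_offset, path) for each mapping with 'x' permission.

    Each maps line is: address-range perms offset dev inode [pathname]
    """
    result = []
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)  # pathname may contain spaces
        if len(fields) < 5:
            continue  # blank or malformed line
        addr_range, perms, offset = fields[0], fields[1], fields[2]
        path = fields[5].strip() if len(fields) == 6 else ""
        if "x" not in perms:
            continue
        start, end = (int(v, 16) for v in addr_range.split("-"))
        result.append((start, end, int(offset, 16), path))
    return result
```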
As indicated earlier, tracing and decoding now work on my end. I was lucky and found the error. After every trace of my test program, I ran
$ echo 1 > /sys/bus/coresight/devices/etm0/reset
to reset any address range filters I might have configured for that specific trace session. Based on the older documentation [1], I thought this would set everything back to its boot configuration. But after my first trace, all the trace data I gathered was useless until I rebooted. The newer documentation [2] states that it "Reset[s] all programming to trace nothing", which, in hindsight, is exactly what happened: nothing was traced. From my subjective point of view, the old purpose of this register seems more practical.
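One way to see what a write to reset actually clears is to record the ETM's readable attributes before and after it. The sketch below is a hypothetical helper, not part of any CoreSight tool. It works as follows:

- It reads every readable file directly under a device directory such as /sys/bus/coresight/devices/etm0.
- It skips attributes that cannot be read, such as write-only triggers.
- It reports which values changed.

It only reads sysfs and does not restore anything.

```python
import os


def snapshot_attrs(dev_dir):
    """Return {attribute: text} for every readable file directly under dev_dir."""
    values = {}
    for name in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories (e.g. trcidr/) and links to directories
        try:
            with open(path) as f:
                values[name] = f.read().strip()
        except OSError:
            continue  # write-only or otherwise unreadable attribute
    return values


def diff_attrs(before, after):
    """Return {attribute: (old, new)} for every attribute whose value differs."""
    return {k: (before.get(k), after.get(k))
            for k in sorted(set(before) | set(after))
            if before.get(k) != after.get(k)}
```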
[1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-coresight-dev...
[2] https://www.kernel.org/doc/html/latest/trace/coresight/coresight-etm4x-refer...
I've got a question regarding this mailing list. I noticed that the subject (which approach to take for tracing) no longer suits the content of this mail. Should I write a new mail for every new topic? Or should I just hit "reply all", so that it's easy to see that it's still me with my Hikey620? Or should I mix both, and reply with an adjusted subject?
Thank you for all the help you've given me until now. Without it I probably would not have gotten this far.
Dominik
From: Dominik Huber dominik.huber@fau.de
Sent: 30 April 2021 22:34
To: Mike Leach mike.leach@linaro.org; leo.yan@linaro.org
Cc: coresight@lists.linaro.org; Al Grant Al.Grant@arm.com
Subject: Re: Help sought on which approach to take for tracing with CoreSight
Hi Mike, Hi Leo,
On 28/04/2021 21:41, Mike Leach wrote:
Hi Dominik,
On Wed, 28 Apr 2021 at 14:50, Dominik Huber dominik.huber@fau.de wrote:
On 23/04/2021 02:10, Leo Yan wrote:
Hi Dominik,
On Thu, Apr 22, 2021 at 04:24:26PM +0100, Mike Leach wrote:
[...]
For now, I am going to try implementing the sysfs tracing, since it seemed easier at first glance. I found the documentation for the .ini files within the snapshots, and am now trying to create my own version of them. But I'm struggling to get the addresses for all the devices, e.g. within cpu_0.ini.
You could download the snapshot .ini files for Hikey from: https://people.linaro.org/~leo.yan/opencsd/opencsd_hikey/hikey_snapshot.tgz
Just to clarify:
- The hikey_snapshot.tgz contains configuration files that I edited manually, and I have tested that they work with OpenCSD's snapshot tool;
- I think you could continue to use the CSAL tool to retrieve the registers, so you can use the register dumps to double-check the settings in the .ini files.
I might have found an easier way to figure out the register addresses than building csinfo and running csinfo.py: "perf script -D" prints a lot of extra information, among others:
...
Magic number 4040404040404040
CPU 7
TRCCONFIGR 0
TRCTRACEIDR 1e
TRCIDR0 28000ea1
TRCIDR1 4100f403
TRCIDR2 488
TRCIDR8 0
TRCAUTHSTATUS cc
This is printing out the values that were captured in whatever perf.data file you are looking at. If you want the actual values on the current machine, you can find them in e.g. /sys/bus/coresight/devices/etm0/trcidr/trcidr9.
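Following Al's pointer, the ID registers of every ETM can be collected directly from sysfs. This is a minimal sketch under three assumptions:

- ETM devices are named etmN, as in the path above.
- Each file under trcidr/ holds one value such as "0x28000ea1".
- The function names are illustrative only.

```python
import os


def read_trcidr(dev_dir):
    """Return {"TRCIDR0": value, ...} for every file under <dev_dir>/trcidr."""
    regs = {}
    idr_dir = os.path.join(dev_dir, "trcidr")
    for name in sorted(os.listdir(idr_dir)):
        with open(os.path.join(idr_dir, name)) as f:
            # base 0 lets int() honour the "0x" prefix the files are assumed to use
            regs[name.upper()] = int(f.read().strip(), 0)
    return regs


def read_all_etms(root="/sys/bus/coresight/devices"):
    """Read the ID registers of every device whose name starts with "etm"."""
    return {dev: read_trcidr(os.path.join(root, dev))
            for dev in sorted(os.listdir(root))
            if dev.startswith("etm")}
```

Calling read_all_etms() on the board returns one dict per ETM. The values can be compared against the TRCIDRn entries in the snapshot .ini files.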
Compared to Leo's snapshot, TRCIDR9 - 13 are missing, the TRCCONFIGR value changed from 0x1 to 0x0, the TRCTRACEIDR of ETM_7 changed from 0x17 to 0x1e, and the (optional) IDs are missing. But I compared this with Leo's configuration using trc_pkt_lister, and the output was basically the same. Could it be that TRCIDR9 - 13 are not used on the Hikey620? Also, what does the last TRCCONFIGR bit do? The ARM ETMv4 spec just says it is "RES1". Does "RES1" mean that it's supposed to be 1, but does not really matter? Anyway, I think it would be really nice to add this usage of perf to the documentation, maybe in CSAL/.../coresight-tools/discovery.md. Al, since you wrote csscan, I CC'ed you. Thanks for your nice explanation the other day.
You're welcome! I could add that to the docs, though as a way to find the id registers if your kernel is already set up for CoreSight, it's more direct to go to sysfs as mentioned above. csscan is really aimed at people who have an unknown board in front of them. If you do want to see what Linux thinks CoreSight looks like, cs_topology_sysfs.py might be useful.
Al
Thank you, these snapshot files are a huge help! I've added my binary and its .so files to "cpu_0.ini", but it doesn't work quite yet. When I use the trace data from my own binary with the trc_pkt_lister program, it just says "END OF TRACE DATA" (see attachment "trc_pkt_lister-decode_output.txt"). Oddly enough, when I use the cstrace.bin from Leo's snapshot, I get lots of decoded trace. Of course, the streams aren't synchronized and it is not informative, but it has content. This made me wonder whether I'm collecting the trace correctly, but I believe there's not much I can do wrong. I attached my simpletrace.sh, which enables trace, runs the binary on CPU core 0, stops trace, and collects it. I also tried a variant where I filter only the relevant address ranges, but with the same result. Did I understand correctly that I need the code segment addresses for the binary and .so files (the entries from "/proc/<pid>/maps" with the "x" executable flag)? I think my .ini files should be correct, but I've attached my "cpu_0.ini" and the output of "/proc/<pid>/maps" of my binary just in case. I also deactivated ASLR to ensure the addresses stay constant.
As indicated earlier, tracing and decoding now work on my end. I was lucky and found the error. After every trace of my test program, I ran
$ echo 1 > /sys/bus/coresight/devices/etm0/reset
to reset any address range filters I might have configured for that specific trace session. Based on the older documentation [1], I thought this would set everything back to its boot configuration. But after my first trace, all the trace data I gathered was useless until I rebooted. The newer documentation [2] states that it "Reset[s] all programming to trace nothing", which, in hindsight, is exactly what happened: nothing was traced. From my subjective point of view, the old purpose of this register seems more practical.
[1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-coresight-devices-etm4x
[2] https://www.kernel.org/doc/html/latest/trace/coresight/coresight-etm4x-reference.html
I've got a question regarding this mailing list. I noticed that the subject (which approach to take for tracing) no longer suits the content of this mail. Should I write a new mail for every new topic? Or should I just hit "reply all", so that it's easy to see that it's still me with my Hikey620? Or should I mix both, and reply with an adjusted subject?
Thank you for all the help you've given me until now. Without it I probably would not have gotten this far.
Dominik
Hi,
Just a quick follow-up about the "missing MODULE_LICENSE()" error when compiling the csinfo module.
On 23/04/2021 02:10, Leo Yan wrote:
I've read in the discovery.md from CSAL that they can be extracted by reading the ROM table of my Cortex-A53, but I'm doing or understanding something wrong. I cross-compiled the CSAL library and the csinfo folder (which, btw, only compiled after inserting "MODULE_LICENSE("");" into the csinfo.c file). Then I copied the resulting csinfo.ko to my Hikey620, where I ran "sudo make", which returned:
Load CoreSight reporting module... expect "Resource temporarily unavailable"
insmod csinfo.ko
insmod: ERROR: could not insert module csinfo.ko: Resource temporarily unavailable
Makefile:17: recipe for target 'load' failed
make: *** [load] Error 1
If I understood correctly, that failure is intended, but the module should also give me the base address of the ROM table. However, there was no other output. I do have CONFIG_DEVMEM enabled. I also tried "sudo insmod csinfo.ko" directly, with the same result.
Looking at the code, it appears that the output is printk(KERN_INFO ...). Check that the printk level is correct on your system.
It's good to use "dmesg" to check the kernel log; it might give more hints about why the kernel module fails to load. The compile failure with "MODULE_LICENSE" is weird; it's better to go back and check whether you have specified the kernel path correctly.
I've looked into it a little, and I think the error is caused by a Linux patch [1] that requires all modules to include a license tag. Since csinfo.c doesn't have one, I think that's the issue. Does building the csinfo module against the newest Linux kernel version work on your end?
[1] https://patchwork.kernel.org/project/linux-kbuild/patch/20201201103418.67585...
Regards,
Dominik