Hi,
Please ensure you reply to the list as well - this gives you a better chance of getting a timely response & may find people who know more about CSAL.
On Wed, 21 Apr 2021 at 20:36, Dominik Huber dominik.huber@fau.de wrote:
Hello,
Thanks for all the help. Especially the explanation of the trace was very insightful.
For now, I am going to try implementing the sysfs tracing, since it seemed easier at first glance. I found the documentation for the .ini files within the snapshots, and am now trying to create my own version of them. But I'm struggling to get the addresses for all the devices, e.g. within cpu_0.ini. I've read in the discovery.md from CSAL that they can be extracted by reading the ROM table of my Cortex-A53, but I'm doing/understanding something wrong.
I cross-compiled the CSAL library and the csinfo folder (which btw only compiled after inserting "MODULE_LICENSE("");" into the csinfo.c file). Then I copied the resulting csinfo.ko to my Hikey620, where I tried "sudo make" and it returned:
Load CoreSight reporting module... expect "Resource temporarily unavailable"
insmod csinfo.ko
insmod: ERROR: could not insert module csinfo.ko: Resource temporarily unavailable
Makefile:17: recipe for target 'load' failed
make: *** [load] Error 1
If I understood correctly, that is intended, but it should also give me the base address of the ROM table afterward. But there was no other output. I do have CONFIG_DEVMEM enabled. I tried "sudo insmod csinfo.ko" directly as well, with the same result.
Looking at the code it appears that the output is printk(KERN_INFO ... ) Check the printk level is correct on your system.
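As an illustration of the printk check suggested above: on Linux, /proc/sys/kernel/printk holds four integers, the first of which is the current console log level. KERN_INFO messages have level 6 and are only printed to the console when that number is lower than the console log level. The sketch below parses the file contents; the function names are made up for this example and are not part of CSAL. The kernel ring buffer (as shown by dmesg) normally keeps the messages regardless of the console level, so looking at dmesg after the insmod may already be enough.

```python
KERN_INFO = 6  # numeric log level of printk(KERN_INFO ...)

FIELD_NAMES = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def parse_printk(text):
    """Parse the contents of /proc/sys/kernel/printk into named fields."""
    values = [int(v) for v in text.split()]
    if len(values) != len(FIELD_NAMES):
        raise ValueError("expected four integers, got %r" % text)
    return dict(zip(FIELD_NAMES, values))


def info_reaches_console(levels):
    """True if KERN_INFO messages are printed on the console."""
    return KERN_INFO < levels["console_loglevel"]


def read_printk(path="/proc/sys/kernel/printk"):
    """Read and parse the live setting (Linux only)."""
    with open(path) as f:
        return parse_printk(f.read())
```

On the target, `info_reaches_console(read_printk())` reports whether the csinfo output would appear on the console; if it does not, `dmesg` is the place to look.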
I could neither find the physical base address for the Cortex-A53 nor a way to read the "CBAR_EL1" register, which apparently contains that information. At least not without a debugger. Since I don't know the correct address, I just tried "sudo ./csscan.py 0x0", and this was the result:
@0x0 0x000 0x000 r0.0 unexpected CIDR: 0x000000f8 class:0
What does this output mean?
As far as I can tell, to get valid output from this tool you need to give it a valid input address / range of addresses. CIDR is the component ID register - this is used by the tool to match to known component values. If it says unexpected then it means just that - the value found is not a recognised CIDR for the tool. See the CoreSight specification for how CIDR and other IDR values are used and what is expected.
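To illustrate what a recognised CIDR looks like: in the CoreSight architecture the four component ID registers (CIDR0..CIDR3) each contribute one byte, and together they form the fixed pattern 0xB105x00D, where the nibble x (bits [15:12]) is the component class (0x1 for a ROM table, 0x9 for a CoreSight component). The sketch below checks a combined 32-bit value against that pattern. It assumes the value printed by csscan.py is the four bytes combined in this order; the helper names are invented for this example.

```python
CIDR_PREAMBLE_MASK = 0xFFFF0FFF  # every bit except the class nibble
CIDR_PREAMBLE = 0xB105000D       # fixed preamble bytes of CIDR3..CIDR0

# Only the two classes relevant here; the specification defines more.
CLASS_NAMES = {0x1: "ROM table", 0x9: "CoreSight component"}


def combine_cidr(cidr0, cidr1, cidr2, cidr3):
    """Combine the low byte of each CIDRn register into one 32-bit value."""
    return (
        ((cidr3 & 0xFF) << 24)
        | ((cidr2 & 0xFF) << 16)
        | ((cidr1 & 0xFF) << 8)
        | (cidr0 & 0xFF)
    )


def check_cidr(cidr):
    """Return (preamble_ok, component_class) for a combined CIDR value."""
    preamble_ok = (cidr & CIDR_PREAMBLE_MASK) == CIDR_PREAMBLE
    component_class = (cidr >> 12) & 0xF
    return preamble_ok, component_class


def describe(cidr):
    """Human-readable summary of a combined CIDR value."""
    ok, cls = check_cidr(cidr)
    if not ok:
        return "0x%08x: not a CoreSight CIDR preamble" % cidr
    return "0x%08x: class 0x%x (%s)" % (cidr, cls, CLASS_NAMES.get(cls, "other"))
```

Under these assumptions neither 0x000000f8 nor 0x00001a3f matches the preamble, which is consistent with the tool reporting them as unexpected: the addresses scanned are probably not CoreSight component register blocks.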
Regards
Mike
For the address 0xE011000 I got:
@0xe011000 0x200 0x000 r0.7 unexpected CIDR: 0x00001a3f ROM table
Does this help me? Unfortunately, this is not persistent information, since after a reboot it showed something different. How can I find out the addresses that I need for the .ini files?
Regards,
Dominik
On 12/04/2021 11:44, Mike Leach wrote:
Hi,
On Sat, 10 Apr 2021 at 13:59, Dominik Huber dominik.huber@fau.de wrote:
Am 09.04.2021 um 17:01 schrieb Mike Leach:
Hi Dominik
On Fri, 9 Apr 2021 at 15:22, Dominik Huber dominik.huber@fau.de wrote:
Hello, I want to gather trace data of closed source binaries using CoreSight ETMv4 on a Hikey620. I want to know the source and destination address for all taken jumps of the traced program, like in the output of "perf script". It would be great if I could get feedback on how to achieve this. I'm not sure where to turn to with such a broad CoreSight problem, so I'm sorry if you are not the right ones to turn to, but I'd be happy for any help or advice you might have.
This is exactly the right place to ask for help on this!
I'm glad to hear that!
My main problem is that I don't know which approaches are promising to try. Below I describe two ideas that I tried but where I got stuck after a while. Are they any good for my use case? If yes, then how can I solve the respective problems that have come up, or where can I look to solve them? If not, are there maybe better ways to approach this, which I've overlooked until now?
After hearing a presentation from Mathieu Poirier, I thought sysFS was the (only) way to go. However, the decoded trace seems to show only the jump address, instead of both the source and destination addresses, and I did not find a register to change that.
What do you mean by decoded trace here & what are you using to decode the trace? If you look at the ETM spec / OpenCSD documentation you will see that to fully decode trace there is a two step process.
1) convert the trace byte stream into trace packets. This will require some minimal information regarding the configuration of the ETM.
2) convert the trace packets into the fully decoded execution trace. This requires access to the binary images executed during the trace session.
The reason for this is that trace packets are highly compressed, and contain the minimum of address information. Decode requires that the decoder walks the binary images to deduce which branches are taken and not taken. Only where address information cannot be deduced from the code is it included in the trace packets - and this is only ever target address information - the source address can always be deduced from the code walk.
I used ptm2human for decoding. I also tried the c_api_pkt_print_test.c from OpenCSD, which decodes a single packet. However, depending on how I collected the trace data, it does not always produce decoded data.
ptm2human performs the 1st stage of decode - byte stream to trace packets. The output from this is the same as the packet only decode from the OpenCSD library / trc_pkt_lister app. This will print out the trace packets - but be aware that in both cases the addresses that you see in the trace are not all the branch addresses used in the executed application.
Once I have a working proof-of-concept, I intended to write myself an OpenCSD decoder. Because of that, and because the decoding often just seemed "to work", I postponed any thoughts about the packet processing. What are these "binary images"? Until now, I got only a cstrace.bin from /dev/[sink], which I used as the only input for any decoding. How can I get them? Can I, by using them, obtain the destination addresses of branch instructions?
cstrace.bin is the binary trace data. The binary images I refer to are the memory images of the executed code - the decoder walks this memory image to deduce the path of the executed trace based on the opcodes encountered. These are either the binary files of your application and any loaded .so files, or may be a memory dump of the locations these were loaded during the trace session. Either will do to decode.
See https://github.com/Linaro/OpenCSD/blob/master/decoder/docs/prog_guide/prog_g...
So to successfully decode a trace session to obtain the source and target branch addresses you want, you will need the following:-
- The captured binary trace data.
- The configuration registers of the ETM
- The program binaries for all applications and .so libraries active during the trace session and their load addresses.
Be aware that ETM will trace everything running on a core - so it may be necessary for any analysis of a particular program to filter out anything unrelated.
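As a rough illustration of collecting the load addresses listed above: for a userspace process on Linux, /proc/<pid>/maps lists each mapped file with its address range, permissions and file offset. The sketch below extracts the executable, file-backed mappings from the text of that file. It is only one possible way to gather this information (it is not how perf does it, and it only works while the process is still running); the function name is invented for this example.

```python
def parse_exec_mappings(maps_text):
    """Return (start, end, file_offset, path) for each executable,
    file-backed mapping found in the text of /proc/<pid>/maps."""
    mappings = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        addr_range, perms, offset, _dev, _inode, path = fields
        path = path.strip()
        if "x" not in perms or not path.startswith("/"):
            continue  # skip data mappings and [stack], [vdso], ...
        start_text, end_text = addr_range.split("-")
        mappings.append(
            (int(start_text, 16), int(end_text, 16), int(offset, 16), path)
        )
    return mappings
```

Each resulting tuple gives the file to hand to the decoder together with the address at which it was loaded during the trace session.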
Also, the trace gathered seems to lose some of the branch addresses. Inserting a sleep instruction after each regular instruction in my test program fixed that. But since it should also work for closed source binaries and has to be fast, this is probably not an option.
Then I tried to copy the way "perf record" is tracing, and extract the relevant code parts. But then I realized, that perf record doesn't use sysFS, apart from enabling the sink in "util/cs-etm.c" (which apparently is not used, and not even deactivated afterward).
perf uses the driver in a similar way to sysfs. It does in fact activate and de-activate the sources and sinks as the perf events are run on any CPU. perf also records the binaries used during the trace session - so that full trace decode is possible. With sysfs you can get raw trace data - but relating this to the binaries being executed is far more difficult.
So is it your recommendation to use large parts of the perf source code for my project because it already makes use of these "binary images"?
I was not recommending re-use of perf source code - I was recommending using perf to capture and analyse trace. If this is not sufficient - or you want to write your own application, then perf serves as an example of the information you need to collect during a trace session to successfully decode the trace. perf does two tasks - capture - using perf record, and analysis using perf report. These are very separate elements - the first happens in kernel space, the second, often offline as a separate perf program, in user space.
Up until now, I only thought about using a few hundred lines, because I believe perf has a lot of overhead (both in performance and code length) just to get branch instructions from an executed binary.
The most important factors for me are that
- the trace includes all branches (including the jump destination), and
- it is fast (faster than using hardware emulation and extracting the addresses from there)
What about the approaches to use the CoreSight driver directly or to use CSAL for trace collection? Would they maybe be better suited?
The best method used for capturing trace is really an assessment for you to decide based on your requirements. CSAL / direct driver access / sysfs are all options - though I am not aware of anyone actually using a direct driver access method so could not really advise here.
Remember that whatever method you choose, in order to get the data that you require you will need to collect the additional information I describe above. Once you have this data you can then pass it to the OpenCSD library for full decode. This will output the executed trace ranges. You are then free to analyse these ranges to synthesise source / target data for branches.
Below is a short example of the trace output from the library - I have annotated this with '***' to explain what is happening. This was generated using the trc_pkt_lister test program from OpenCSD, running on one of the supplied test captures.
The I_ packets are the trace packets - as would be output from ptm2human, the OCSD_GEN_TRC packets are the output of the library - fully decoded trace ranges.
Idx:356; ID:12; [0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x80 ]; I_ASYNC : Alignment Synchronisation. *** start of trace - align the decoder to the incoming byte stream for the ETM programmed with trace ID 0x12
Idx:369; ID:12; [0x01 0x01 0x00 ]; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 } *** Some setup information regarding trace configuration - used internally by the decoder.
Idx:372; ID:12; [0xf7 ]; I_ATOM_F1 : Atom format 1.; E *** atom packet - skipped - cannot decode until we get an address / context packet
Idx:373; ID:12; [0x85 0x22 0x12 0x4d 0x00 0x00 0x00 0x00 0x00 0x30 ]; I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x00000000004D2488; Ctxt: AArch64,EL0, NS; *** address and context packet - this gives the trace start address for the decoder + EL and ISA context.
Idx:384; ID:12; [0xf7 ]; I_ATOM_F1 : Atom format 1.; E *** Atom packet. This indicates that the program executed a number of instructions until it encountered a P0 element instruction (P0 element instructions - previously referred to as waypoints in PTM - are instructions that change the address flow of the program. These are primarily branch instructions, but the full range of P0 instruction types is defined by the ETM4 protocol). This instruction was executed.
Idx:373; ID:12; OCSD_GEN_TRC_ELEM_PE_CONTEXT((ISA=A64) EL0N; 64-bit; ) *** OpenCSD library outputs the context information for the client application. Where traced, the context also includes the Context ID register, which may contain the PID of the program running on the CPU under certain kernel configurations.
Idx:384; ID:12; OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4d2488:[0x4d2494] num_i(3) last_sz(4) (ISA=A64) E iBR A64:ret ) *** OpenCSD outputs the executed instruction trace range three instructions from addresses 0x4d2488 - 0x4d2493. This is output in response to the atom packet. This range was calculated by starting @ address 0x4d2488, walking through the program image from that address, examining opcodes until it found a P0 element (branch instruction) which it associates with the atom packet. This was an indirect branch (iBR A64:ret ) and also an aarch64 return instruction. This branch was taken (E). At this point we do not know the destination of the branch - so we are waiting for a target address packet in the input stream.
Idx:385; ID:12; [0x9d 0x48 0x5f 0x4d 0x00 0x00 0x00 0x00 0x00 ]; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x00000000004DBF20; *** Address packet - this updates the decoder with the target of the prior indirect branch - which can now continue decoding
Idx:394; ID:12; [0xde ]; I_ATOM_F4 : Atom format 4.; NENE *** An atom packet carrying 4 atoms - representing 4 waypoint instructions that were taken (E) or not taken (N). You will note that this single packet will decode into 4 separate executed instruction ranges - with no further address packets visible in the trace.
Idx:394; ID:12; OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4dbf20:[0x4dbf24] num_i(1) last_sz(4) (ISA=A64) N BR <cond>) *** OpenCSD executed instruction range - 1 instruction - was a not taken branch. This range started @ 0x04DBF20 - taken from the address packet above. At this point the client program using the library can deduce the most recent taken branch information - as 0x4d2490=>0x4dbf20
Idx:394; ID:12; OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4dbf24:[0x4dbf2c] num_i(2) last_sz(4) (ISA=A64) E BR b+link ) *** OpenCSD executed instruction range - 2 instructions - last instruction was a taken branch + link - in this case the decoder can calculate the target address from the instruction opcode - so no address data needs to appear in the trace.
Idx:394; ID:12; OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4d1d88:[0x4d1db4] num_i(11) last_sz(4) (ISA=A64) N BR <cond>) *** OpenCSD executed instruction range - 11 instructions, starting at the address calculated from the previous range. So the start of the range was the result of a branch 0x4dbf28=>0x4d1d88. Ends in a not taken branch.
Idx:394; ID:12; OCSD_GEN_TRC_ELEM_INSTR_RANGE(exec range=0x4d1db4:[0x4d1dbc] num_i(2) last_sz(4) (ISA=A64) E BR <cond>) *** OpenCSD executed instruction range - 2 instructions, last is a direct branch which will allow us to calculate the target address.
Throughout this process the decoder maintains a current trace address from the incoming address packets, and by walking through the executed opcodes - calculating branch targets where possible. This is why the program and library binaries (or a memory dump of their load locations) are required for full trace decode and obtaining the information you require.
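The source / target synthesis described in the annotations above can be sketched as follows. This is only an illustration working on the printed text of the OCSD_GEN_TRC_ELEM_INSTR_RANGE lines, not on the OpenCSD API; the regular expression and function name are invented for this example. For each range whose last instruction was executed (E), the branch source is the range end minus last_sz, and the branch target is the start of the next range. It deliberately ignores exceptions, trace discontinuities and other element types, which a real client would have to handle.

```python
import re

# Matches the fields printed for an instruction range element, e.g.
# exec range=0x4d2488:[0x4d2494] num_i(3) last_sz(4) (ISA=A64) E iBR ...
RANGE_RE = re.compile(
    r"exec range=0x([0-9a-fA-F]+):\[0x([0-9a-fA-F]+)\]\s+"
    r"num_i\((\d+)\)\s+last_sz\((\d+)\)\s+\(ISA=\w+\)\s+([EN])\s"
)


def taken_branches(lines):
    """Return a list of (source, target) pairs for taken branches."""
    branches = []
    pending_source = None  # address of the last taken branch, if any
    for line in lines:
        match = RANGE_RE.search(line)
        if match is None:
            continue  # not an instruction range line
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)  # exclusive end of the range
        last_size = int(match.group(4))
        executed = match.group(5) == "E"
        if pending_source is not None:
            # This range starts where the previous taken branch landed.
            branches.append((pending_source, start))
        # The last instruction of the range is the branch itself.
        pending_source = end - last_size if executed else None
    return branches
```

Fed with the five range lines of the example above, this yields the two branches named in the annotations, 0x4d2490=>0x4dbf20 and 0x4dbf28=>0x4d1d88; the final taken branch at 0x4d1db8 stays pending until a following range supplies its target.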
Again, I would recommend reading the ETM protocol spec and the OpenCSD library documentation to get a full understanding of the protocols and how decoding works.
Regards
Mike
If you are interested in tracing a particular binary & this is a userspace program then you may wish to try:- perf record -e cs_etm//u --per-thread <program-to-trace> to ensure that any trace collected is related to the program you are interested in. You can then use the facilities of perf report / perf script to examine the trace.
Thanks, but I luckily already knew about that one.
Regards
Mike
So there is another way to gather trace, maybe by interacting with the CoreSight driver directly. But looking into the "perf report" source code I couldn't find it yet.
Thanks and regards,
Dominik
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK