On 11/12/20 8:57 AM, Linu Cherian wrote:
Hi Suzuki,
On Tue, Nov 10, 2020 at 8:27 PM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
Hi Linu
On 11/10/20 12:57 PM, Linu Cherian wrote:
Hi Suzuki,
...
....
# ./perf report
0x368 [0x50]: failed to process type: 1 [Cannot allocate memory]
Error: failed to process sample
I have no clue about it. Are you able to run it under GDB? (It looks like you built perf yourself, so if you have the sources, it may be a good idea to run it under GDB and figure out where that error is coming from.)
Yes, GDB helped me figure out the issue. The issue is in OpenCSD, which does not seem to support multiple streams when the formatter is not enabled. Note: our silicon has the formatter disabled, and we already had changes in the perf tool to take care of the formatter status.
The below hack helped.
diff --git a/decoder/source/ocsd_dcd_tree.cpp b/decoder/source/ocsd_dcd_tree.cpp
index be15e36..0210dec 100644
--- a/decoder/source/ocsd_dcd_tree.cpp
+++ b/decoder/source/ocsd_dcd_tree.cpp
@@ -401,7 +401,7 @@ ocsd_err_t DecodeTree::createDecoder(const std::string &decoderName, const int c
     int crtFlags = createFlags;
     uint8_t CSID = 0;   // default for single stream decoder (no deformatter) - we ignore the ID
-    if(usingFormatter())
+    //if(usingFormatter())
     {
         CSID = pConfig->getTraceID();
         crtFlags |= OCSD_CREATE_FLG_INST_ID;
Not sure if this is the right fix though.
That may work for you, but it would break the existing platforms and drivers that enable formatting by default. We need a way to address this on the perf side. This would also be needed for the ETE/TRBE trace scenario, where formatting is not supported by TRBE.
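To make the collision described above concrete, here is a minimal toy model of the ID selection shown in the hack. It is only a sketch: `ToyDecodeTree`, `create_decoder`, `two_streams_fit`, the decoder names and the trace ID values are all hypothetical and are not OpenCSD APIs. It also assumes the decode tree keeps one decoder per ID slot, which is inferred from the diff rather than confirmed against the library.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy model only: these types and names are hypothetical, not OpenCSD APIs.
struct ToyDecodeTree {
    bool using_formatter;      // true when the sink frames data with trace IDs
    bool always_use_trace_id;  // models the hack: skip the formatter check
    std::map<std::uint8_t, std::string> decoders;  // assumed: one decoder per ID slot

    // Returns false if the chosen slot is already occupied.
    bool create_decoder(const std::string &name, std::uint8_t trace_id) {
        std::uint8_t csid = 0;  // single-stream default: the trace ID is ignored
        if (using_formatter || always_use_trace_id)
            csid = trace_id;
        return decoders.emplace(csid, name).second;
    }
};

// Tries to register two per-CPU streams with distinct (made-up) trace IDs.
inline bool two_streams_fit(bool using_formatter, bool always_use_trace_id) {
    ToyDecodeTree tree{using_formatter, always_use_trace_id, {}};
    bool first = tree.create_decoder("etm-cpu1", 0x10);
    bool second = tree.create_decoder("etm-cpu2", 0x12);
    return first && second;
}

// Self-check of the three cases discussed in the thread.
inline void check_model() {
    assert(two_streams_fit(true, false));    // formatter on: IDs keep streams apart
    assert(!two_streams_fit(false, false));  // formatter off: both land in slot 0
    assert(two_streams_fit(false, true));    // hack: trace ID used regardless
}
```

The model only shows why a second stream fails to register when the formatter is off. It does not capture why forcing the trace ID would break platforms that do use the formatter, nor how the real library reports the failure.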
This is how I tested:
# taskset 0x2 ./perf record -e cs_etm//u -F 10 --per-thread ping -c 30 127.0.0.1
# Ctrl-Z // Suspend the process and put it in the background
# taskset -p 0x4 <pid of ping process> // Move the ping process to core 2
# fg // Bring the process back to the foreground
# ./perf report
snip ...
# Samples: 66K of event 'branches:uH'
# Event count (approx.): 66953
#
# Children      Self  Command  Shared Object          Symbol
# ........  ........  .......  .....................  ........................................
#
    15.94%    15.94%  ping     ld-2.31.so             [.] _dl_lookup_symbol_x
    14.93%    14.93%  ping     ld-2.31.so             [.] do_lookup_x
    10.68%    10.68%  ping     libc-2.31.so           [.] _dl_addr
     9.87%     9.87%  ping     ld-2.31.so             [.] _dl_relocate_object
     6.75%     6.75%  ping     ld-2.31.so             [.] strcmp
     3.62%     3.62%  ping     ld-2.31.so             [.] check_match
     2.72%     2.72%  ping     libc-2.31.so           [.] __vfprintf_internal
     1.90%     1.90%  ping     libc-2.31.so           [.] _int_malloc
     1.29%     1.29%  ping     libc-2.31.so           [.] getenv
     1.28%     1.28%  ping     libc-2.31.so           [.] strcmp
     1.17%     1.17%  ping     libc-2.31.so           [.] _IO_file_xsputn@@GLIBC_2.17
     1.16%     1.16%  ping     ld-2.31.so             [.] _dl_name_match_p
snip ...
I could also verify, using prints in the tmc-etr driver, that the trace buffer gets reused across cores as well.
Cool! Could you please test the newer version of this patch (not functionally different, but with slightly modified code) and add a Tested-by if you are happy with it?
https://lore.kernel.org/linux-arm-kernel/1605012309-24812-3-git-send-email-a...
Cheers
Suzuki