-----Original Message-----
From: Mathieu Poirier [mailto:mathieu.poirier@linaro.org]
Sent: 21 November 2017 17:46
To: Mike Leach <mike.leach@linaro.org>
Cc: Robert Walker <Robert.Walker@arm.com>; CoreSight@lists.linaro.org
Subject: Re: [PATCH 2/2] perf: Fix branch stack records from CoreSight ETM decode
On 21 November 2017 at 10:41, Mike Leach <mike.leach@linaro.org> wrote:
On 20 November 2017 at 15:21, Mathieu Poirier <mathieu.poirier@linaro.org> wrote:
I noticed that just doing a "perf report --stdio" on the autoFDO branch hangs with the commit I pointed out.
This hangs without Rob's patches too.
Correct - since Rob is already roaming in that code I was hoping he could have a look.
--stdio --dump works, --stdio only hangs.
I've tried it on a few trace captures from the HiKey 960. It does complete eventually, but it takes 10-20 minutes for a 50 MB input file. Does perf report ever complete for you if you leave it running longer?
If I inspect it with gdb, it seems to spend a lot of time in cs_etm__run_decoder() making calls to cs_etm_decoder__process_data_block(). These calls usually add only a single packet to the output queue for cs_etm__sample(), so it is making *slow* progress through the trace data. Digging a bit further, cs_etm_decoder__process_data_block() most often calls the decoder with OCSD_OP_FLUSH because the previous call returned a WAIT response. I wonder if there's an efficiency problem here. With dense trace (i.e. all ATOM packets), it ends up calling into the trace decoder and cs_etm__sample() for almost every bit of the trace data. Can we make it build up a larger queue of packets from the decoder before passing them to cs_etm__sample()?
Regards
Rob
I'm not surprised as --dump avoids a lot of code.
Not sure we haven't seen this before?
Mike