An update on my oprofile adventures with panda.
I did add the kernel param Nicolas suggested, and I'm getting a little more data out of oprofile on panda, but it's still pretty awful: the sample resolution is quite poor.
For instance, this data was gathered over 5 runs of djpeg crunching on a 1920x1280 JPEG image:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name         symbol name
29       34.5238  libjpeg.so.62.0.0  decode_mcu
27       32.1429  libjpeg.so.62.0.0  h2v2_fancy_upsample
8         9.5238  libjpeg.so.62.0.0  jsimd_idct_islow_neon
7         8.3333  libc-2.13.so       /lib/arm-linux-gnueabi/libc-2.13.so
7         8.3333  libjpeg.so.62.0.0  jsimd_ycc_extrgb_convert_neon
4         4.7619  libjpeg.so.62.0.0  decompress_onepass
1         1.1905  libjpeg.so.62.0.0  sep_upsample
1         1.1905  no-vmlinux         /no-vmlinux
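The "CPU with timer interrupt" header means oprofile fell back to timer-interrupt sampling instead of using the Cortex-A9 hardware performance counters. One quick way to confirm which mode the kernel picked is to read the cpu_type file that oprofilefs exposes. This is a minimal sketch, assuming oprofilefs is mounted at /dev/oprofile (where opcontrol normally puts it) and that the kernel reports the literal string "timer" in fallback mode; oprofile_mode is a hypothetical helper, not part of oprofile itself.

```python
from pathlib import Path

# Assumption: oprofilefs is mounted at /dev/oprofile (opcontrol's usual
# mount point). Adjust the path if it is mounted elsewhere.
OPROFILE_CPU_TYPE = Path("/dev/oprofile/cpu_type")


def oprofile_mode(cpu_type_path=OPROFILE_CPU_TYPE):
    """Hypothetical helper: return (cpu_type, uses_timer_fallback).

    Assumes the kernel writes "timer" into cpu_type when it could not use
    the hardware counters and fell back to timer-interrupt sampling.
    """
    cpu_type = Path(cpu_type_path).read_text().strip()
    return cpu_type, cpu_type == "timer"


if __name__ == "__main__":
    try:
        cpu_type, timer_mode = oprofile_mode()
    except OSError as err:
        print(f"could not read {OPROFILE_CPU_TYPE}: {err} (is oprofilefs mounted?)")
    else:
        if timer_mode:
            print("oprofile is in timer-interrupt mode; hardware counters are not in use")
        else:
            print(f"oprofile reports hardware counter support: {cpu_type}")
```

If this prints the timer message after a reboot with a changed kernel param, the param didn't get oprofile onto the hardware counters.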
That's not a lot of samples given the time involved: 84 in total across all 5 runs. Worse, there's no way to raise or lower the timer rate to change the number of samples captured. That really hurts the usefulness of oprofile for looking at performance problems in user-space code on ARM, which is what I'm trying to do in support of the upstream libjpeg-turbo community.
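To put "not a lot of samples" in numbers: with 84 samples, a single sample moves a symbol's share by about 1.19 percentage points. A rough way to gauge the noise is to treat each timer tick as an independent draw and use the binomial standard error sqrt(p(1 - p) / n). That independence is an assumption (ticks within a run are not truly independent), so treat the result as a rough estimate only. The counts are copied from the table above; share_with_error is a hypothetical helper.

```python
import math

# (samples, label) copied from the opreport output; the label is the symbol
# name, or the image name where no symbol was resolved.
SAMPLES = [
    (29, "decode_mcu"),
    (27, "h2v2_fancy_upsample"),
    (8, "jsimd_idct_islow_neon"),
    (7, "libc-2.13.so"),
    (7, "jsimd_ycc_extrgb_convert_neon"),
    (4, "decompress_onepass"),
    (1, "sep_upsample"),
    (1, "no-vmlinux"),
]


def share_with_error(count, total):
    """Return (percent, standard error in percentage points).

    Assumes each sample is an independent draw, so a symbol's share is a
    binomial proportion with standard error sqrt(p * (1 - p) / n).
    """
    p = count / total
    return 100.0 * p, 100.0 * math.sqrt(p * (1.0 - p) / total)


total = sum(count for count, _ in SAMPLES)  # 84 samples over all 5 runs
for count, label in SAMPLES:
    pct, err = share_with_error(count, total)
    print(f"{label:32s} {pct:6.2f}% +/- {err:4.2f}")
```

With these counts decode_mcu comes out at roughly 34.5% +/- 5.2 and h2v2_fancy_upsample at roughly 32.1% +/- 5.1, so even their ordering isn't meaningful. Since the error shrinks as 1/sqrt(n), halving it takes about four times as many samples.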
Siarhei Siamashka, for instance, has also noted this. See http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html (scan down to "ARM Cortex-A8 performance monitoring").
I'm not up to date on the latest and greatest oprofile kernel changes, so perhaps there's already a solution here; if so, I'd love to hear it.
On Mon, Aug 29, 2011 at 9:23 AM, Christian Robottom Reis <kiko@linaro.org> wrote:
On Fri, Aug 26, 2011 at 11:10:11AM -0500, Tom Gall wrote:
I'll give that a try. Still, oprofile ought to work out of the box without fiddling.
That's exactly how I feel. If Nicolas is right, what makes this depend on the kernel's counter selection, and why can't we figure out what to use at runtime?
--
Christian Robottom Reis, Engineering VP
Brazil (GMT-3) | [+55] 16 9112 6430 | [+1] 612 216 4935
Linaro.org: Open Source Software for ARM SoCs