Thanks for the suggestion, Rob.
Same number of runs as last time and, of course, the same data file.
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name          symbol name
35       40.2299  libjpeg.so.62.0.0   decode_mcu
17       19.5402  libjpeg.so.62.0.0   h2v2_fancy_upsample
10       11.4943  no-vmlinux          /no-vmlinux
8         9.1954  libjpeg.so.62.0.0   jsimd_idct_islow_neon
7         8.0460  libjpeg.so.62.0.0   jsimd_ycc_extrgb_convert_neon
5         5.7471  libc-2.13.so        /lib/arm-linux-gnueabi/libc-2.13.so
3         3.4483  libjpeg.so.62.0.0   jpeg_fill_bit_buffer
1         1.1494  libjpeg.so.62.0.0   decompress_onepass
1         1.1494  libjpeg.so.62.0.0   sep_upsample
84 samples with oprofile.timer=1
87 samples with oprofile.timer=1 nohz=0
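As a rough cross-check of the nohz=0 report above, the per-symbol counts can be tallied with a few lines of Python. This is only a sketch: the rows are copied by hand from that report, and the helper names (tally_by_image, share) are made up for illustration and are not part of oprofile.

```python
# Rows copied by hand from the nohz=0 opreport output: (samples, image, symbol).
ROWS = [
    (35, "libjpeg.so.62.0.0", "decode_mcu"),
    (17, "libjpeg.so.62.0.0", "h2v2_fancy_upsample"),
    (10, "no-vmlinux", "/no-vmlinux"),
    (8, "libjpeg.so.62.0.0", "jsimd_idct_islow_neon"),
    (7, "libjpeg.so.62.0.0", "jsimd_ycc_extrgb_convert_neon"),
    (5, "libc-2.13.so", "/lib/arm-linux-gnueabi/libc-2.13.so"),
    (3, "libjpeg.so.62.0.0", "jpeg_fill_bit_buffer"),
    (1, "libjpeg.so.62.0.0", "decompress_onepass"),
    (1, "libjpeg.so.62.0.0", "sep_upsample"),
]


def tally_by_image(rows):
    """Sum the sample counts per image (hypothetical helper)."""
    totals = {}
    for samples, image, _symbol in rows:
        totals[image] = totals.get(image, 0) + samples
    return totals


def share(samples, total):
    """Percentage of all samples, as opreport prints in its % column."""
    return 100.0 * samples / total


total = sum(samples for samples, _image, _symbol in ROWS)
by_image = tally_by_image(ROWS)

print("total samples:", total)
for image, samples in sorted(by_image.items(), key=lambda kv: -kv[1]):
    print(f"{samples:4d} {share(samples, total):8.4f}  {image}")
```

The total agrees with the 87 samples quoted for the nohz=0 run, and grouping by image shows how few of those samples fall outside libjpeg.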
On Mon, Aug 29, 2011 at 7:29 PM, Clark, Rob rob@ti.com wrote:
Could you also try adding 'nohz=0' to bootargs to disable the tickless scheduler? Depending on the default in the current Linaro kernel, this might help.
BR, -R
On Mon, Aug 29, 2011 at 1:57 PM, Tom Gall tom.gall@linaro.org wrote:
An update on my oprofile adventures with panda.
I did add the kernel param as Nicolas suggested, and I am getting a little more data out of oprofile on panda, but it's still pretty awful because the resolution of the samples is quite poor.
This data for instance was gathered over 5 runs of djpeg crunching on a 1920x1280 jpeg image:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name          symbol name
29       34.5238  libjpeg.so.62.0.0   decode_mcu
27       32.1429  libjpeg.so.62.0.0   h2v2_fancy_upsample
8         9.5238  libjpeg.so.62.0.0   jsimd_idct_islow_neon
7         8.3333  libc-2.13.so        /lib/arm-linux-gnueabi/libc-2.13.so
7         8.3333  libjpeg.so.62.0.0   jsimd_ycc_extrgb_convert_neon
4         4.7619  libjpeg.so.62.0.0   decompress_onepass
1         1.1905  libjpeg.so.62.0.0   sep_upsample
1         1.1905  no-vmlinux          /no-vmlinux
That's not a lot of samples given the time involved. Worse, there's no way to adjust the timer rate up or down to change the number of samples being captured. That really hurts the usefulness of oprofile for looking at performance problems in user-space code on ARM, which is what I'm trying to do in support of the upstream libjpeg-turbo community.
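To put the sample count in rough numbers: if timer mode takes about one sample per timer tick while the profiled code is running, then samples divided by the tick rate approximates the CPU time that was actually observed. The sketch below assumes exactly that. The tick rate passed in is a hypothetical value, not one measured on the panda, and samples_to_seconds is an illustrative name.

```python
def samples_to_seconds(samples, tick_hz):
    """Approximate CPU time covered by `samples` timer-mode samples.

    Assumes one sample per timer tick; `tick_hz` is whatever tick rate
    the running kernel was built with (not stated in this thread).
    """
    if tick_hz <= 0:
        raise ValueError("tick_hz must be positive")
    return samples / tick_hz


# 84 samples is the total from the report above; 100 Hz is only a
# placeholder tick rate for illustration.
observed = samples_to_seconds(84, tick_hz=100)
print(f"~{observed:.2f} s of CPU time sampled across all runs")
```

Whatever the real tick rate is, the point stands: the sampling interval is fixed by the kernel tick, so the only way to collect more samples in this mode is to run the workload for longer.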
Siarhei Siamashka for instance has also noted this. See http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html (scan down to ARM Cortex-A8 performance monitoring)
I'm not up to date on the latest and greatest oprofile kernel changes, so perhaps there's already a solution here... if so, I'd love to hear it.
On Mon, Aug 29, 2011 at 9:23 AM, Christian Robottom Reis kiko@linaro.org wrote:
On Fri, Aug 26, 2011 at 11:10:11AM -0500, Tom Gall wrote:
I'll give that a try. Still, oprofile ought to work out of the box without fiddling.
That's exactly how I feel. If Nicolas is right, what causes this to depend on the kernel's counter selection, and why can't we figure out what to use at runtime?
--
Christian Robottom Reis, Engineering VP
Brazil (GMT-3) | [+55] 16 9112 6430 | [+1] 612 216 4935
Linaro.org: Open Source Software for ARM SoCs
-- Regards, Tom
"We want great men who, when fortune frowns will not be discouraged."
- Colonel Henry Knox
Linaro.org │ Open source software for ARM SoCs
w) tom.gall att linaro.org
w) tom_gall att vnet.ibm.com
h) tom_gall att mac.com
linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev