Hello Ming,
could you please give some pointers on the overall status of oprofile support on ARM A9 cores? IIUC, it currently doesn't work without the oprofile.timer=1 kernel option, at least on Linus' tree. Searching turns up a lot of discussions, patch fragments and similar stuff, but I was unable to find a complete patch, git tree or anything else to try.
Thanks in advance, Dmitry
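A minimal sketch for checking whether a running kernel was booted with the timer fallback mentioned above. It only inspects the kernel command line. The function name `uses_timer_mode` is made up for this illustration, and the sketch assumes oprofile is built into the kernel, so that the option shows up in /proc/cmdline rather than as a module parameter.

```shell
# Hypothetical helper: succeeds if the command line read from stdin
# contains oprofile.timer=1 as a separate word.
uses_timer_mode() {
    # Put each option on its own line, then look for an exact match.
    tr -s ' ' '\n' | grep -qx 'oprofile.timer=1'
}
```

On the target it could be used as `uses_timer_mode < /proc/cmdline && echo "timer mode requested"`.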
Hi Dmitry,
On Wed, Feb 22, 2012 at 6:15 PM, Dmitry Antipov dmitry.antipov@linaro.org wrote:
Hello Ming,
could you please give some pointers to observe an overall status of oprofile support on ARM A9 cores? IIUC, now it doesn't work
Regarding perf support on ARM A9, I think the built-in PMU works well with the mainline kernel.
without oprofile.timer=1 kernel option, at least for Linus' tree;
I haven't used oprofile before and always use 'perf', and I am sure it works well with the ARM A9 PMU hardware on Linus' tree.
searching gives a lot of discussion/patches fragments and similar stuff, but I was unable to find a complete patch/git tree/whatever else to try.
I only know that some patches [1][2] are needed for OMAP4, but they are all OMAP4 hardware specific and have nothing to do with the ARM A9 PMU.
[1], http://marc.info/?l=linux-omap&m=132686049213313&w=2
[2], http://marc.info/?l=linux-arm-kernel&m=132687938417894&w=2
Hope the above can help you, :-)
thanks, -- Ming Lei
On 02/22/2012 02:57 PM, Ming Lei wrote:
I didn't use oprofile before and always use 'perf', and I am sure it works well with arm a9 pmu hardware on linus tree.
Should we consider oprofile as obsolete in favor of perf?
Are these projects competing to be a default system profiling tool for Linux (at least for ARM platforms)?
Thanks, Dmitry
On Wed, Feb 22, 2012 at 7:09 PM, Dmitry Antipov dmitry.antipov@linaro.org wrote:
On 02/22/2012 02:57 PM, Ming Lei wrote:
I didn't use oprofile before and always use 'perf', and I am sure it works well with arm a9 pmu hardware on linus tree.
Should we consider oprofile as obsolete in favor of perf?
Are these projects competing to be a default system profiling tool for Linux (at least for ARM platforms)?
I don't know.
But I think oprofile should work well if perf does, since both seem to depend on the same principle.
thanks, -- Ming Lei
On Wed, Feb 22, 2012 at 08:14:11PM +0800, Ming Lei wrote:
On Wed, Feb 22, 2012 at 7:09 PM, Dmitry Antipov dmitry.antipov@linaro.org wrote:
On 02/22/2012 02:57 PM, Ming Lei wrote:
I didn't use oprofile before and always use 'perf', and I am sure it works well with arm a9 pmu hardware on linus tree.
Should we consider oprofile as obsolete in favor of perf?
Are these projects competing to be a default system profiling tool for Linux (at least for ARM platforms)?
Generally, I would advise migrating to perf. oprofile now uses perf as its backend on ARM anyway, and perf seems to be more functional, more robust and in a better state of maintenance (just my opinion!)
Cheers ---Dave
On Wed, Feb 22, 2012 at 02:15:09PM +0400, Dmitry Antipov wrote:
Hello Ming,
could you please give some pointers to observe an overall status of oprofile support on ARM A9 cores? IIUC, now it doesn't work
Note -- it's important to understand that there's a difference between oprofile/perf on A9, and oprofile/perf on specific boards.
On A9 generically, perf (and hence oprofile -- oprofile is built on perf now) works.
However, the way that the performance counter interrupts are routed is dependent on the SoC. OMAP4 and later have an unusual way of doing this, which is why perf doesn't currently work upstream for these platforms.
Cheers ---Dave
On 02/22/2012 05:59 PM, Dave Martin wrote:
However, the way that the performance counter interrupts are routed is dependent on the SoC. OMAP4 and later have an unusual way of doing this, which is why perf doesn't currently work upstream for these platforms.
But will it work on Panda board with linux-linaro-3.3-rc3-2012.02-1 at least?
Dmitry
On Wed, Feb 22, 2012 at 06:41:22PM +0400, Dmitry Antipov wrote:
On 02/22/2012 05:59 PM, Dave Martin wrote:
However, the way that the performance counter interrupts are routed is dependent on the SoC. OMAP4 and later have an unusual way of doing this, which is why perf doesn't currently work upstream for these platforms.
But will it work on Panda board with linux-linaro-3.3-rc3-2012.02-1 at least?
Hopefully one of the other guys has the answer... I don't know, unfortunately.
Cheers ---Dave
Hi Dmitry,
On 02/22/2012 07:17 PM, Dave Martin wrote:
On Wed, Feb 22, 2012 at 06:41:22PM +0400, Dmitry Antipov wrote:
On 02/22/2012 05:59 PM, Dave Martin wrote:
However, the way that the performance counter interrupts are routed is dependent on the SoC. OMAP4 and later have an unusual way of doing this, which is why perf doesn't currently work upstream for these platforms.
But will it work on Panda board with linux-linaro-3.3-rc3-2012.02-1 at least?
WRT perf, the linux-linaro-3.3-rc3-2012.02-1 kernel shouldn't be different from the mainline v3.3-rc3. It has the following stuff (about 50 commits) on top of v3.3-rc3:
* samsung_cpuidle_l2_retention patch set from the power management WG
* thermal_cpu_cooling patch set from the power management WG
* irq_domain patch set from Grant L. (cherry-picked from linux-next)
* Fix for https://bugs.launchpad.net/bugs/918412
* Basic device tree board support for supported ARM boards (comes from linux-linaro-3.1)
* sched: Ensure cpu_power periodic update (Vincent G.)
* ARM: kprobes: work around build errors (Arnd B.)
* usb: ehci: make HC see up-to-date qh/qtd descriptors ASAP (Ming L.)
* Perf: Fallback to /bin/more if less is not found for perf pager (Avik S.)
A full change log against the 3.3-rc3 release is available at:
http://launchpad.net/linux-linaro/3.3/3.3-rc3-2012.02/+download/CHANGELOG-li...
If there is something specific in the kernel for Panda, you could also try the most recent release of the TI Landing Team's kernel:
http://launchpad.net/linaro-landing-team-ti/trunk/2012.01/+download/linux-re...
referenced as linux-linaro-lt-ti (version 3.2-2012.01) from http://www.linaro.org/downloads
LEB and LT kernels have much more board-specific code than linux-linaro-3.3-rc3-2012.02-1.
Thanks, Andrey
linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
On Wed, Feb 22, 2012 at 10:41 PM, Dmitry Antipov dmitry.antipov@linaro.org wrote:
On 02/22/2012 05:59 PM, Dave Martin wrote:
However, the way that the performance counter interrupts are routed is dependent on the SoC. OMAP4 and later have an unusual way of doing this, which is why perf doesn't currently work upstream for these platforms.
But will it work on Panda board with linux-linaro-3.3-rc3-2012.02-1 at least?
No, it doesn't work with the upstream kernel at the moment. You need to apply the patches [1][2] on top of the upstream kernel to route the CTI IRQs so that the OMAP4 PMU and perf can work well.
[1], http://marc.info/?l=linux-omap&m=132686049213313&w=2
[2], http://marc.info/?l=linux-arm-kernel&m=132687938417894&w=2
thanks, -- Ming Lei
On 02/23/2012 04:57 AM, Ming Lei wrote:
No, it doesn't work with upstream kernel now. You need to apply the patches[1][2] against upstream kernel to route CTIs IRQ so that OMAP4 PMU/perf can work well.
[1], http://marc.info/?l=linux-omap&m=132686049213313&w=2 [2], http://marc.info/?l=linux-arm-kernel&m=132687938417894&w=2
I'm not sure about "well" :-(:
...
irq 34: nobody cared (try booting with the "irqpoll" option)
...
Disabling IRQ #34
irq 33: nobody cared (try booting with the "irqpoll" option)
...
Disabling IRQ #33
...

cat /proc/interrupts ==>

           CPU0       CPU1
 29:     302749     362285       GIC  twd
 33:     400001          0       GIC  arm-pmu
 34:          0     400001       GIC  arm-pmu
Dmitry
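A small sketch for pulling just the PMU interrupt counts out of /proc/interrupts-style text like the listing above. The helper name `pmu_irq_counts` is hypothetical. It assumes the first line is the CPU header, takes the number of CPU columns from it, and treats any row whose last field is "arm-pmu" as a PMU interrupt.

```shell
# Hypothetical helper: read /proc/interrupts-style text on stdin and print
# "<irq> <total count across CPUs>" for every arm-pmu interrupt line.
pmu_irq_counts() {
    awk 'NR == 1 { ncpu = NF; next }       # header: one field per CPU
         $NF == "arm-pmu" {
             irq = $1
             sub(/:$/, "", irq)            # "33:" -> "33"
             total = 0
             for (i = 2; i <= ncpu + 1; i++)
                 total += $i               # sum the per-CPU columns
             print irq, total
         }'
}
```

Running `pmu_irq_counts < /proc/interrupts` before and after a profiling run shows whether the PMU interrupts are firing at all and on which IRQ lines.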
On Fri, Feb 24, 2012 at 4:23 PM, Dmitry Antipov dmitry.antipov@linaro.org wrote:
On 02/23/2012 04:57 AM, Ming Lei wrote:
No, it doesn't work with upstream kernel now. You need to apply the patches[1][2] against upstream kernel to route CTIs IRQ so that OMAP4 PMU/perf can work well.
[1], http://marc.info/?l=linux-omap&m=132686049213313&w=2 [2], http://marc.info/?l=linux-arm-kernel&m=132687938417894&w=2
I'm not sure about "well" :-(:
...
irq 34: nobody cared (try booting with the "irqpoll" option)
...
Disabling IRQ #34
irq 33: nobody cared (try booting with the "irqpoll" option)
...
Disabling IRQ #33
...

cat /proc/interrupts ==>

           CPU0       CPU1
 29:     302749     362285       GIC  twd
 33:     400001          0       GIC  arm-pmu
 34:          0     400001       GIC  arm-pmu
Could you share how you reproduced the problem, and which kernel you used to reproduce it?
Also, considering that several patches are required for OMAP4 perf, it would be better if you could post the diff.
thanks -- Ming Lei
On 02/24/2012 01:44 PM, Ming Lei wrote:
Could you share us how you reproduced the problem? and which kernel are you used to reproduce it?
The kernel is Linus' tree (bb4c7e9a9908548b458f34afb2fee74dc0d49f90), .config is attached. HW is Panda board (OMAP4430). Reproduced by just starting oprofile:
opcontrol --vmlinux=vmlinux
opcontrol --start
then waiting a few seconds until the unhandled-IRQ threshold is hit.
Thanks, Dmitry
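The check after those reproduction steps can be sketched as a small filter over the kernel log. The helper name `unhandled_irqs` is hypothetical, and the pattern is based only on the "nobody cared" message format quoted earlier in this thread.

```shell
# Hypothetical helper: read kernel log text on stdin and print the number of
# every IRQ that was reported as "nobody cared", one per line, without repeats.
unhandled_irqs() {
    sed -n 's/.*irq \([0-9][0-9]*\): nobody cared.*/\1/p' | sort -n -u
}

# Intended use on the board (not executed here):
#   opcontrol --vmlinux=vmlinux
#   opcontrol --start
#   sleep 10
#   dmesg | unhandled_irqs
```

An empty result after a few seconds of profiling would suggest the PMU interrupts are being handled. Any number printed is an IRQ the kernel gave up on.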
On Fri, Feb 24, 2012 at 9:56 PM, Dmitry Antipov dmitry.antipov@linaro.org wrote:
On 02/24/2012 01:44 PM, Ming Lei wrote:
Could you share us how you reproduced the problem? and which kernel are you used to reproduce it?
The kernel is Linus' tree (bb4c7e9a9908548b458f34afb2fee74dc0d49f90), .config is attached. HW is Panda board (OMAP4430). Reproduced by just starting oprofile:
opcontrol --vmlinux=vmlinux
opcontrol --start
then waiting a few seconds to hit an unhandled IRQs threshold.
I didn't observe the problem with 'perf', so could you run 'perf' to see whether the same thing happens?
I will also try running 'oprofile' on my pandaboard next week.
BTW, I suggest applying the recent ARM PMU IRQ fix patches [1] before testing 'oprofile'.
[1], http://marc.info/?t=133001284900005&r=1&w=2
thanks, -- Ming Lei
On 02/25/2012 07:24 AM, Ming Lei wrote:
BTW: suggest you to apply the recent arm pmu irq fix patches[1] to test 'oprofile'.
I tried, with the same result: "nobody cared" messages about IRQs 33 and 34.
Dmitry
On Mon, Feb 27, 2012 at 9:12 PM, Dmitry Antipov dmitry.antipov@linaro.org wrote:
On 02/25/2012 07:24 AM, Ming Lei wrote:
BTW: suggest you to apply the recent arm pmu irq fix patches[1] to test 'oprofile'.
I tried, and with the same results - "nobody cared" messages about IRQs 33 and 34.
After some checking, I found that there is another patch you missed. Please try the attached patch from Santosh Shilimkar.
If it doesn't work, I can send my uImage for you to test.
BTW: I have just tested -rc5 with these patches on your config, and perf does work on my pandaboard(A1).
thanks, -- Ming Lei
On 02/27/2012 06:27 PM, Ming Lei wrote:
After some check, I just found there is another patch you missed. Please try the attachment patch from Shilimkar, Santosh.
If it doesn't work, I can send my uImage for your test.
No effect, so please send a uImage if possible.
I'm re-sending a cumulative patch against 500dd2370e77c9551ba298bdeeb91b02d8402199. It's possible that I missed something else, so it would be great if you could look through it one more time.
Thanks, Dmitry
On Mon, Feb 27, 2012 at 10:55 PM, Dmitry Antipov dmitry.antipov@linaro.org wrote:
On 02/27/2012 06:27 PM, Ming Lei wrote:
After some check, I just found there is another patch you missed. Please try the attachment patch from Shilimkar, Santosh.
If it doesn't work, I can send my uImage for your test.
No effect, so please send an uImage if possible.
Please try the uImage on the link below:
http://kernel.ubuntu.com/~ming/up/uImage-3.3-rc5-perf
thanks, -- Ming Lei
On 02/28/2012 04:45 AM, Ming Lei wrote:
Please try the uImage on the link below:
http://kernel.ubuntu.com/~ming/up/uImage-3.3-rc5-perf
No good news for oprofile:
...
irq 34: nobody cared (try booting with the "irqpoll" option)
[stack]
Disabling IRQ #34
irq 33: nobody cared (try booting with the "irqpoll" option)
[stack]
Disabling IRQ #33
...
Could you also try an attached module in a loop like:
while true; do insmod timeoutbench.ko && rmmod timeoutbench; done
with oprofile running?
Dmitry
On Tue, Feb 28, 2012 at 8:13 PM, Dmitry Antipov dmitry.antipov@linaro.org wrote:
On 02/28/2012 04:45 AM, Ming Lei wrote:
Please try the uImage on the link below:
No good news for the oprofile:
OK, could you try the MLO and u-boot.bin from http://kernel.ubuntu.com/~ming/up to see whether 'perf' works well?
If it still doesn't, could you tell me the revision of your pandaboard? Or have you made any changes to the hardware?
I am sure that several people have tried the current OMAP4 PMU patch and got perf working well on the pandaboard.
Could you also try an attached module in a loop like:
while true; do insmod timeoutbench.ko && rmmod timeoutbench; done
with oprofile running?
'perf top' runs fine, with the output below:

PerfTop: 1036 irqs/sec kernel:99.2% us: 1.0% guest kernel: 0.0% guest us: 0.0% exact: 0.0% [1000Hz cycles], (all, 2 CPUs)
--------------------------------------------------------------------------------------------------------

    44.87%  [kernel]  [k] _raw_spin_unlock_irqrestore
    22.48%  [kernel]  [k] _raw_spin_unlock_irq
     7.41%  [kernel]  [k] del_timer_sync
     6.24%  [kernel]  [k] lock_acquire
     4.95%  [kernel]  [k] lock_release
     2.05%  [kernel]  [k] omap4_enter_idle
     1.81%  [kernel]  [k] finish_task_switch
     1.06%  [kernel]  [k] rcu_note_context_switch
     0.60%  [kernel]  [k] schedule_timeout
     0.57%  [kernel]  [k] memchr_inv
     0.54%  [kernel]  [k] __schedule
     0.54%  [kernel]  [k] thumbee_notifier
     0.53%  [kernel]  [k] sub_preempt_count
thanks, -- Ming Lei
On 02/28/2012 05:27 PM, Ming Lei wrote:
OK, could you try the MLO and u-boot.bin under the link of http://kernel.ubuntu.com/~ming/up to see if 'perf' may work well?
Is it really possible that the bootloader stuff affects perf/oprofile?
If still not, could you tell me what is the revision of your pandaboard?
Both the kernel and u-boot report the CPU as OMAP4430 ES2.2, the label on the board's box reads PANDABOARD UEVM4430G-01-00-00, and /proc/cpuinfo is shown below.

Processor       : ARMv7 Processor rev 2 (v7l)
processor       : 0
BogoMIPS        : 597.81

processor       : 1
BogoMIPS        : 597.81

Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x1
CPU part        : 0xc09
CPU revision    : 2

Hardware        : OMAP4 Panda board
Revision        : 0020
Serial          : 0000000000000000
or do you have any changes on the hardware?
No.
I am sure that several guys have tried the current omap4 pmu patch and make perf work well on pandaboard.
Perf (in particular, "perf top") works for me too. I also tried "perf record -a -F 1000 sleep 200" while running the kernel module workload, and never saw any "nobody cared" IRQ issues. You said that oprofile uses the perf subsystem as a backend, so this looks even stranger.
Dmitry
On 02/28/2012 05:27 PM, Ming Lei wrote:
I am sure that several guys have tried the current omap4 pmu patch and make perf work well on pandaboard.
On a freshly booted panda board which is mostly idle:
root@linaro-developer:~# uptime
 14:44:36 up 1 min, 3 users, load average: 0.17, 0.11, 0.05
root@linaro-developer:~# perf stat -a `perf list | grep kmem | awk '{printf ("-e %s ", $1)}'` sleep 1

 Performance counter stats for 'sleep 1':

                55 kmem:kmalloc                      [99.99%]
               143 kmem:kmem_cache_alloc             [99.99%]
                 0 kmem:kmalloc_node                 [99.99%]
                 0 kmem:kmem_cache_alloc_node        [99.99%]
                29 kmem:kfree                        [99.99%]
               301 kmem:kmem_cache_free              [100.00%]
                45 kmem:mm_page_free                 [100.00%]
                32 kmem:mm_page_free_batched         [100.00%]
                35 kmem:mm_page_alloc                [100.00%]
                 1 kmem:mm_page_alloc_zone_locked    [100.00%]
                 0 kmem:mm_page_pcpu_drain           [100.00%]
                 0 kmem:mm_page_alloc_extfrag

       1.022554950 seconds time elapsed

root@linaro-developer:~# perf stat -a `perf list | grep sched | awk '{printf ("-e %s ", $1)}'` sleep 1

 Performance counter stats for 'sleep 1':

                 0 sched:sched_kthread_stop          [99.98%]
                 0 sched:sched_kthread_stop_ret      [99.98%]
                12 sched:sched_wakeup                [99.98%]
                 0 sched:sched_wakeup_new            [99.99%]
                26 sched:sched_switch                [99.99%]
                 0 sched:sched_migrate_task          [99.99%]
                 1 sched:sched_process_free          [99.99%]
                 1 sched:sched_process_exit          [99.99%]
                 0 sched:sched_wait_task             [99.99%]
                 1 sched:sched_process_wait          [99.99%]
                 0 sched:sched_process_fork          [99.99%]
           4867991 sched:sched_stat_wait             [100.00%]
       10864556009 sched:sched_stat_sleep            [100.00%]
                 0 sched:sched_stat_iowait           [100.00%]
         940098109 sched:sched_stat_blocked          [100.00%]
         243187241 sched:sched_stat_runtime          [100.00%]
                 0 sched:sched_pi_setprio

       1.068707582 seconds time elapsed
No objections to the kmem counters, but I'm pretty sure that some of the sched counters are bogus.
Dmitry
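The backtick pipelines in the message above turn `perf list` output into a run of `-e <event>` arguments. A minimal sketch of the same idea as a reusable filter follows. The name `event_args` is hypothetical, and unlike the plain `grep` it only keeps lines whose first field starts with the given subsystem prefix, which is an assumption about what was intended.

```shell
# Hypothetical helper: read `perf list`-style text on stdin and print
# "-e <event> " for every event in the subsystem named by $1 (e.g. kmem).
event_args() {
    awk -v prefix="$1" '
        # keep only events whose name starts with "<prefix>:"
        index($1, prefix ":") == 1 { printf("-e %s ", $1) }'
}
```

It would be used as `perf stat -a $(perf list | event_args sched) sleep 1`, with the command substitution left unquoted on purpose so that the shell splits it into separate arguments.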
On Thu, Mar 1, 2012 at 6:48 PM, Dmitry Antipov dmitry.antipov@linaro.org wrote:
On 02/28/2012 05:27 PM, Ming Lei wrote:
I am sure that several guys have tried the current omap4 pmu patch and make perf work well on pandaboard.
On a freshly booted panda board which is mostly idle:
The following are all tracepoint events; the ARM A9 PMU is not involved in them.
root@linaro-developer:~# uptime
 14:44:36 up 1 min, 3 users, load average: 0.17, 0.11, 0.05
root@linaro-developer:~# perf stat -a `perf list | grep kmem | awk '{printf ("-e %s ", $1)}'` sleep 1
It also works with 'perf stat -a -e kmem:* sleep 1'.
No objections for kmem counters, but I'm pretty sure that some sched counters are bogus.
Maybe you can verify these counters via the ftrace interface, but I bet you will get the same result.
thanks -- Ming Lei
Hi Dmitry,
On Fri, Feb 24, 2012 at 4:23 PM, Dmitry Antipov dmitry.antipov@linaro.org wrote:
On 02/23/2012 04:57 AM, Ming Lei wrote:
No, it doesn't work with upstream kernel now. You need to apply the patches[1][2] against upstream kernel to route CTIs IRQ so that OMAP4 PMU/perf can work well.
[1], http://marc.info/?l=linux-omap&m=132686049213313&w=2 [2], http://marc.info/?l=linux-arm-kernel&m=132687938417894&w=2
I'm not sure about "well" :-(:
...
irq 34: nobody cared (try booting with the "irqpoll" option)
...
Disabling IRQ #34
irq 33: nobody cared (try booting with the "irqpoll" option)
...
Disabling IRQ #33
Could you try the patch below to see if it can fix your oprofile problem?
In fact, I observed this patch can fix the same problem triggered by the command below:
# frequency should be set to more than 40000
perf record -e cycles -F 50000 noploop

diff --git a/arch/arm/mach-omap2/devices.c b/arch/arm/mach-omap2/devices.c
index d055abc..9c12bfa 100644
--- a/arch/arm/mach-omap2/devices.c
+++ b/arch/arm/mach-omap2/devices.c
@@ -507,7 +507,7 @@ static void __init omap4_configure_pmu_irq(void)
 	/*configure CTI1 for pmu irq routing*/
 	cti_init(&omap4_cti[1], base1, OMAP44XX_IRQ_CTI1, 6);
 	cti_unlock(&omap4_cti[1]);
-	cti_map_trigger(&omap4_cti[1], 1, 6, 2);
+	cti_map_trigger(&omap4_cti[1], 1, 6, 3);
 }

 static struct platform_device* __init omap4_init_pmu(void)
Thanks, -- Ming Lei
Ming Lei <ming.lei@...> writes:
In fact, I observed this patch can fix the same problem triggered by the command below:
# frequency should be set to more than 40000
perf record -e cycles -F 50000 noploop
Hi, guys! It seems that you have already got the perf tool working on the ARM A9 platform. Which version of the kernel did you use? I tried with linaro-3.6-rc6-2012.09, but the patch failed.
Thank you!