On Fri, Sep 16, 2011 at 12:30 PM, Bianconi, Cyril <c-bianconi@ti.com> wrote:
I don't think that the A9 issue is the same as the A8 one. However, the effects are the same, i.e. it's hard to use the PMU.
BTW, if anybody is interested in the details of the Cortex-A8 PMU issue, this information can be found in the i.MX51 errata list: http://www.freescale.com/files/dsp/doc/errata/MCIMX51CE.pdf
Just search there for ENGcm10700, or for 628216, which is the ARM erratum ID. If Freescale keeps up this nice tradition, eventually we may also enjoy having Cortex-A9 errata information publicly available :)
I cannot share the A9 errata document as-is for legal reasons, but I believe I can explain the issue. It happens when a counter overflows (so I'm not sure whether this impacts OProfile). In theory, an interrupt should fire in this case; in reality, this interrupt is randomly lost. The workaround proposed by ARM is to use 2 counters: counter 0, and counter 1 initialized to counter 0 + 1. If one interrupt is lost, the other one should fire just after. We have noticed that this is not always sufficient and that a third counter is needed to get close to 0% of interrupts lost.
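To illustrate why redundant counters help, here is a small userspace simulation. It is a sketch under stated assumptions, not anything taken from the ARM erratum: the model, the drop probability and the PRNG are all hypothetical. Each counter raises one overflow interrupt per sampling period, the handler needs only one of them to record the sample, and each interrupt is assumed to be dropped independently of the others.

```c
#include <stdint.h>

/* Hypothetical model of the redundant-counter workaround. In each sampling
 * period, every counter in use raises one overflow interrupt (counter 1
 * overflows one event after counter 0, counter 2 one event after counter 1),
 * and the handler needs only one of them to record the sample. Each
 * interrupt is ASSUMED to be dropped independently with probability
 * drop_pct/100; that independence is an optimistic assumption. */

#define MAX_COUNTERS 3u

/* xorshift32: small deterministic PRNG so that runs are reproducible. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Returns how many of `periods` sampling periods were missed entirely, i.e.
 * all interrupts of the first `counters` counters (1..MAX_COUNTERS) were lost. */
static unsigned long missed_samples(unsigned counters, unsigned drop_pct,
                                    unsigned long periods, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;  /* xorshift32 must not start at 0 */
    unsigned long missed = 0;

    if (counters > MAX_COUNTERS)
        counters = MAX_COUNTERS;

    for (unsigned long p = 0; p < periods; p++) {
        int all_lost = 1;
        /* Draw MAX_COUNTERS values every period so that runs with a different
         * number of counters see the same loss pattern for counters 0..2. */
        for (unsigned c = 0; c < MAX_COUNTERS; c++) {
            int lost = (xorshift32(&state) % 100u) < drop_pct;
            if (c < counters && !lost)
                all_lost = 0;  /* at least one interrupt got through */
        }
        if (all_lost)
            missed++;
    }
    return missed;
}
```

With the same seed, missed_samples(1, 10, 100000, 12345), missed_samples(2, 10, 100000, 12345) and missed_samples(3, 10, 100000, 12345) should come out at roughly 10%, 1% and 0.1% of the periods under the independence assumption. The observation above that two counters were not enough in practice may indicate that real losses are not independent, so treat these numbers as a best case.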
Close to 0% is still not good enough if we really want to rely on the statistical properties of the collected profiling data. As a shameless plug, here is a link to my OProfile-related blog post from a bit less than a month ago: http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html
Note: this HW issue was fixed by ARM quite "late", so I think most of the devices on the market are likely affected.
This pretty much rules out the use of the PMU for OProfile on both the A8 and the A9. How soon can we expect Linaro kernels to switch to timer mode in OProfile, with a reasonably high sample collection rate, for all the Linaro-supported boards? Increasing the sample collection rate is very simple and can be done by replacing TICK_NSEC with something more reasonable here: https://github.com/torvalds/linux/blob/master/drivers/oprofile/timer_int.c#L...
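For a sense of scale: TICK_NSEC is roughly one jiffy expressed in nanoseconds (about 1e9 / HZ), so a timer that re-arms every TICK_NSEC samples only about HZ times per second per CPU. The sketch below just does that arithmetic in userspace; it is not a kernel patch, and the helper names and the 10 kHz example rate are hypothetical.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ull

/* Approximate TICK_NSEC for a given HZ. The kernel's own definition also
 * accounts for rounding; this uses the simple 1e9 / HZ approximation. */
static uint64_t approx_tick_nsec(unsigned hz)
{
    return NS_PER_SEC / hz;
}

/* Samples per second per CPU when the sampling timer fires every period_ns. */
static uint64_t samples_per_sec(uint64_t period_ns)
{
    return NS_PER_SEC / period_ns;
}

/* Hypothetical helper: timer period (ns) to use instead of TICK_NSEC for a
 * desired sample rate in samples per second. */
static uint64_t period_for_rate(uint64_t rate_hz)
{
    return NS_PER_SEC / rate_hz;
}
```

With HZ=100, approx_tick_nsec(100) is 10,000,000 ns, i.e. only 100 samples per second, while period_for_rate(10000) gives 100,000 ns for 10,000 samples per second.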
Most users never use any counter other than the cycle counter anyway, so not using the PMU is not a big loss. Simply using a high-resolution timer is a viable replacement, as long as the CPU clock frequency does not change during the test. The PMU would be much more useful for timing short sequences of code, if the performance counters were exposed to userspace in the mainline kernel.
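A minimal userspace sketch of the high-resolution timer approach, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC). workload() is a hypothetical stand-in for the code being measured, and converting nanoseconds to cycles assumes a fixed, known CPU clock (the cpu_mhz argument), which is exactly the caveat above.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical code under test; volatile keeps the loop from being
 * optimized away. */
static void workload(unsigned n)
{
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < n; i++)
        sink += i;
}

/* Best-of-`runs` elapsed time of workload(n) in ns (runs must be >= 1). */
static uint64_t time_workload_ns(unsigned n, unsigned runs)
{
    uint64_t best = UINT64_MAX;
    for (unsigned r = 0; r < runs; r++) {
        uint64_t t0 = now_ns();
        workload(n);
        uint64_t t1 = now_ns();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Convert ns to CPU cycles; only meaningful if the clock stays at cpu_mhz
 * for the whole measurement (ASSUMPTION: no frequency scaling). */
static uint64_t ns_to_cycles(uint64_t ns, unsigned cpu_mhz)
{
    return ns * cpu_mhz / 1000;
}
```

Taking the minimum over several runs is one common way to filter out interrupt and scheduling noise; averaging is an alternative if you care about typical rather than best-case time.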