Sorry, who is working on this and when is it going back in? perf is a significant feature and Panda is a very common board...
-- Michael
I have been unable to reproduce the boot hang problem after unreverting the interrupt patch. Can those experiencing it please verify they are using the latest released lmc tools to create their SD card? And can you please send me your config file and Pandaboard revision?
It has been suggested that the limited sample count in the given example is consistent with the fact that the ARM oprofile code uses HZ as the sampling frequency (even though it uses a separate timer). I'm not yet familiar with the x86 code for setting the profiling interval on the fly, but Frederic Turgis's suggestion of doing the same thing on ARM makes a whole lot of sense. We should make this a new requirement.
Once the interrupt issue is resolved I might suggest sampling cpu-cycles as a workaround to real-time sampling granularity, except that there apparently is an issue with reliably getting interrupts from the PMU. Does anyone know if this is still a problem in the A9 (I've only seen it discussed regarding the A8)? If it's still an issue I think it simply kills using PMU event counters with oprofile.
We need to do a little work to make configuring hardware event counters into the kernel easier. A recent change means that you need to set at least a couple of independent config options for this. This should be simple to fix.
-dl
Once the interrupt issue is resolved I might suggest sampling cpu-cycles as a workaround to real-time sampling granularity, except that there apparently is an issue with reliably getting interrupts from the PMU. Does anyone know if this is still a problem in the A9 (I've only seen it discussed regarding the A8)? If it's still an issue I think it simply kills using PMU event counters with oprofile.
We just had an intern who experienced the same on the A9: it required a minimum of 3 counters counting the same event to get at least 1 interrupt on counter overflow :-( I have put Cyril, who was mentoring him, in the loop to confirm, but it is probably still in the A9 errata list.
Regards Fred
Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920
I don't think that the A9 issue is the same as the A8. However, effects are the same i.e. it's hard to use PMU.
I cannot share the A9 errata document as-is for legal reasons, but I believe that I can explain the issue. The issue happens when counters overflow (so I am not sure that this impacts OProfile). Theoretically, an interrupt should fire in this case. In reality, this interrupt is lost randomly. The workaround proposed by ARM is to use 2 counters: counter0, and counter1 initialized at counter0+1. If one interrupt is lost, the other one should fire just after. We have noticed that this may not be sufficient and that a third counter should be used to get close to 0% of interrupts lost.
Note: This HW issue has been fixed by ARM quite "late", so I think that most of the devices on the market should be impacted.
Best regards, Cyril
On Fri, 2011-09-16 at 11:30 +0200, Bianconi, Cyril wrote:
I don't think that the A9 issue is the same as the A8. However, effects are the same i.e. it's hard to use PMU.
I cannot share the A9 errata document as-is for legal reasons, but I believe that I can explain the issue. The issue happens when counters overflow (so I am not sure that this impacts OProfile).
Overflow is the only way of getting a counter interrupt, right? Then it's a fundamental problem for oprofile.
Theoretically, an interrupt should fire in this case. In reality, this interrupt is lost randomly. The workaround proposed by ARM is to use 2 counters: counter0, and counter1 initialized at counter0+1. If one interrupt is lost, the other one should fire just after. We have noticed that this may not be sufficient and that a third counter should be used to get close to 0% of interrupts lost.
So, even with three counters there's still a statistical chance of failure?
Note: This HW issue has been fixed by ARM quite "late", so I think that most of the devices on the market should be impacted.
Are there part numbers that we can be reasonably sure do work, say perhaps the 4460?
Thanks, -dl
David,
Please find my replies below (marked CB>). I hope that they help. Sorry for all the "I cannot say" answers, which are due to legal constraints, but you should get in touch with the ARM people at Linaro to obtain the errata list. It does not provide much more information, but at least you will have their official communication. They could also provide more details from a HW point of view. The issue is referenced by ARM as 751469.
Best regards, Cyril
On Fri, Sep 16, 2011 at 3:04 PM, David Long dave.long@linaro.org wrote:
On Fri, 2011-09-16 at 11:30 +0200, Bianconi, Cyril wrote:
I don't think that the A9 issue is the same as the A8. However, effects are the same i.e. it's hard to use PMU.
I cannot share the A9 errata document as-is for legal reasons, but I believe that I can explain the issue. The issue happens when counters overflow (so I am not sure that this impacts OProfile).
Overflow is the only way of getting a counter interrupt, right? Then it's a fundamental problem for oprofile.
CB> Yes, to my understanding, this is the only way. I'm not an OProfile expert and do not know how it behaves internally. Here are my assumptions behind the "not sure that this impacts OProfile":
CB> As I remember, the counter is 32 bits, so the interrupt should fire only after about 2 billion cycles, meaning after about 2 s for a device running at 1 GHz.
CB> OProfile monitors the duration of processes or functions. My high-level view is that OProfile looks at this profiling counter at "system transitions" such as interrupts, context switches, ...
CB> This means that the monitored activity would have to run for longer than 2 s without being preempted by the system in order to hit the issue. Is such a use case realistic? Or maybe I missed something.
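As a rough check of the overflow arithmetic above, here is a minimal sketch. It assumes a 32-bit counter that increments once per CPU cycle and raises its interrupt when it wraps; the function name and the preload value are hypothetical and chosen only for illustration. Counting from zero, a full 32-bit wrap takes about 4.3 billion cycles, so the "about 2 billion" figure is the right order of magnitude; a counter that is preloaded close to its maximum wraps much sooner.

```python
# Sketch only: time until a free-running counter wraps around.
# Assumption: 32-bit counter, one increment per CPU cycle.
COUNTER_BITS = 32


def seconds_to_overflow(clock_hz, start_value=0, bits=COUNTER_BITS):
    """Seconds until a counter starting at start_value wraps past 2**bits - 1."""
    remaining_cycles = (1 << bits) - start_value
    return remaining_cycles / clock_hz


if __name__ == "__main__":
    one_ghz = 1_000_000_000
    # Starting from zero: roughly 4.3 seconds at 1 GHz.
    print(seconds_to_overflow(one_ghz))
    # Preloaded 100,000 cycles short of the wrap (hypothetical value):
    # the overflow arrives after a tenth of a millisecond.
    print(seconds_to_overflow(one_ghz, start_value=(1 << 32) - 100_000))
```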
Theoretically, an interrupt should fire in this case. In reality, this interrupt is lost randomly. The workaround proposed by ARM is to use 2 counters: counter0, and counter1 initialized at counter0+1. If one interrupt is lost, the other one should fire just after. We have noticed that this may not be sufficient and that a third counter should be used to get close to 0% of interrupts lost.
So, even with three counters there's still a statistical chance of failure?
CB> ARM did not explain the root cause of their issue but only proposed a workaround, so it's quite difficult to know the probability of the issue.
CB> However, you are right that there is always a statistical chance of failure. You can only reduce the probability.
CB> I saw the following percentages of missed interrupts in my tests (a few tens of seconds each):
CB> 1 counter: about 28%
CB> 2 counters: about 5.5%
CB> 3 counters: about 0%
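To put these figures in perspective, the sketch below compares them with a naive model in which each counter loses its interrupt independently with the single-counter rate of 28%. The independence assumption is purely illustrative (the root cause is not disclosed), and the function name is hypothetical. The measured rates fall off faster than this model predicts, which suggests the losses are not independent, but the model still shows why adding counters can only reduce the miss rate and never eliminate it.

```python
# Sketch only: measured miss rates versus a naive independent-loss model.
# The measured figures are the ones quoted in the message above.
measured_miss_rate = {1: 0.28, 2: 0.055, 3: 0.0}


def independent_miss_rate(single_miss, counters):
    """Chance that every one of `counters` redundant interrupts is lost,
    assuming (hypothetically) that each loss is an independent event."""
    return single_miss ** counters


if __name__ == "__main__":
    for n, measured in measured_miss_rate.items():
        model = independent_miss_rate(0.28, n)
        print(f"{n} counter(s): measured {measured:.1%}, independent model {model:.1%}")
```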
Note: This HW issue has been fixed by ARM quite "late", so I think that most of the devices on the market should be impacted.
Are there part numbers that we can be reasonably sure do work, say perhaps the 4460?
CB> For legal reasons, I don't think that I can provide the revision of the A9 in the 4460. However, ARM fixed it late, i.e. in A9 r3p0, so the r0, r1 and r2 "series" are impacted. I don't think that the 4460 uses r3. Maybe the official TI representative at Linaro can provide you with this information.
Thanks, -dl
Hi,
- To my understanding, oprofile is only a statistical tool based on regular sampling, like "top" (well, I should say /proc/stat). So it runs without much impact on your use case. I don't think it is triggered on system transitions. For that, I would use kernel traces or kprobes. Still, it is a very useful (and widely used) tool.
You choose whether oprofile is triggered every X ms by a timer or on every "overflow" of 1 PMU counter. Of course, the "overflow interrupt" issue kills the use of the PMU for triggering (but I have found only 1 article in the past that really leveraged that). When the tool wakes up, it reads the ARM registers plus any info that tells which function, thread, and kernel/userspace context we are in (this also requires debug symbols). By the way, ARM has an "oprofile"-like tool in which they tune the wake-up timer and capture everything they can (a kind of combination of top, oprofile, PMU counter reading... down to every ms). Not open source, I think.
- PMU counter values are writable, no? So if you want an interrupt every N events, you write "Overflow value - N" into the counter and let the counter run. On the overflow interrupt, you reset the counter to "Overflow value - N" again. With this "overflow interrupt" issue, this clearly kills its use as an oprofile trigger.
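The "Overflow value - N" preload described above can be sketched as follows. This assumes a 32-bit counter that wraps from 0xFFFFFFFF to 0; the helper names are hypothetical and do not correspond to any kernel or oprofile API.

```python
# Sketch only: preload value so that a wrapping counter overflows after N events.
# Assumption: 32-bit counter that wraps from 0xFFFFFFFF back to 0.
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def preload_for(n_events):
    """Value to write into the counter so that it overflows after n_events."""
    if not 0 < n_events <= COUNTER_MASK + 1:
        raise ValueError("n_events must fit in the counter range")
    return (COUNTER_MASK + 1 - n_events) & COUNTER_MASK


def events_until_overflow(counter_value):
    """Number of increments left before the counter wraps to zero."""
    return COUNTER_MASK + 1 - counter_value


if __name__ == "__main__":
    value = preload_for(100_000)
    print(hex(value), events_until_overflow(value))
```

On each overflow interrupt the handler would write the same preload value again, which is why a lost interrupt is so damaging: the counter then keeps running for a full wrap before the next chance to sample.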
Regards Fred
Andy made an interesting suggestion to me. What if the profile event code allocated all the counters to the requested event. It could reallocate half of them if a second event was also requested, and so on, till we're down to one (unreliably interrupting) counter. Or we could set a limit requiring at least three counters per event.
The problem would still exist, but this should throw the maximum amount of ammunition we have towards minimizing it. Would this be worth the effort to implement?
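A minimal sketch of the allocation policy suggested above: split the available counters as evenly as possible across the requested events, optionally refusing a request once the per-event share drops below a floor. The even split is one interpretation of "reallocate half of them", the figure of 6 counters is an assumption for illustration, and the function name is hypothetical.

```python
# Sketch only: share redundant PMU counters between the requested events.
def counters_per_event(total_counters, n_events, min_per_event=1):
    """Return how many counters each event gets, splitting as evenly as possible.

    Raises ValueError when the split would give any event fewer than
    min_per_event counters.
    """
    if n_events < 1:
        raise ValueError("at least one event is required")
    share, extra = divmod(total_counters, n_events)
    if share < min_per_event:
        raise ValueError("not enough counters for the requested events")
    # The first `extra` events receive one additional counter each.
    return [share + (1 if i < extra else 0) for i in range(n_events)]


if __name__ == "__main__":
    total = 6  # assumed number of event counters, for illustration only
    for events in (1, 2, 3, 4):
        print(events, counters_per_event(total, events))
```

With a floor of three counters per event (`min_per_event=3`), the same six counters would support at most two simultaneous events.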
-dl
Hi,
I may not be the best one to answer: I use oprofile with regular wake-ups and I test on use cases long enough that a wake-up every scheduler tick is sufficient (like "top", it converges to the right values within a few seconds). I don't really have a use for performance counter wake-ups and I don't really know what the CPU_CYCLES perf counter would bring except more granularity. Maybe something related to idle time, where timer-based oprofile would not execute because we can be tickless. I would be more interested in finer granularity for the timer-based solution, but the official guideline may be to use CPU_CYCLES ;-)
So guys, please speak up if you need perf counters in oprofile or want more granularity in wake-ups, or even if you simply require this interrupt from perf counters for other purposes.
Regards Fred
Frederic Turgis OMAP Platform Business Unit - OMAP System Engineering - Platform Enablement
________________________________
From: David Long [mailto:dave.long@linaro.org] Sent: Monday, September 26, 2011 7:27 AM To: Turgis, Frederic Cc: Bianconi, Cyril; linaro-dev@lists.linaro.org; Michael Hope; Avik Sil; Dave Martin; Christian Robottom Reis; andy.green@linaro.org Subject: RE: 11.07 oprofile on panda busted?
On Mon, Sep 26, 2011 at 8:27 AM, David Long dave.long@linaro.org wrote:
Andy made an interesting suggestion to me. What if the profile event code allocated all the counters to the requested event. It could reallocate half of them if a second event was also requested, and so on, till we're down to one (unreliably interrupting) counter. Or we could set a limit requiring at least three counters per event.
The problem would still exist, but this should throw the maximum amount of ammunition we have towards minimizing it. Would this be worth the effort to implement?
Please just do something :) I don't see why you can't supply the kernels with oprofile working in timer mode with a usable sampling frequency *right now*. That should be enough for the helpless users who can't or don't want to tweak and build their own kernels. And some kind of workaround for the A8/A9 PMU can always be applied later once/if you get it working reliably enough.
Also, I don't know any details about the A9 PMU problems, but the chance of encountering the missed PMU interrupt bug on the A8 really depends on the use case that you are profiling. Code that uses lots of syscalls and spends a lot of time in the parts of the kernel accessing the problematic CP14/CP15 registers naturally has a *much* higher chance of triggering the bug. So if somebody says that the failure rate is very close to 0% and can be ignored, this may not always be the case.
The whole situation reminds me of one part of some stupid Seagal movie:
* 'mad scientist' villain : babbling something about how complex his system is and how it can't be stopped
* the hero : shoots at the laptop
* 'mad scientist' villain : I didn't think of that
Come on, making oprofile practically usable on all ARM boards and devices is not that difficult.
On Tue, 2011-09-27 at 16:29 +0300, Siarhei Siamashka wrote:
Please just do something :) I don't see why you can't supply the kernels with oprofile working in timer mode with a usable sampling frequency *right now*. That should be enough for the helpless users who can't or don't want to tweak and build their own kernels. And some kind of workaround for the A8/A9 PMU can always be applied later once/if you get it working reliably enough.
Be assured we consider timer mode granularity the highest priority aspect of this problem. My questions about using counter mode are to see if there is anything that could also be done there. Addressing issues with counter mode operation is lower priority but I have to ask the question while I have the attention of those that have the answers.
-dl
On Tue, Sep 27, 2011 at 5:16 PM, David Long dave.long@linaro.org wrote:
On Tue, 2011-09-27 at 16:29 +0300, Siarhei Siamashka wrote:
Please just do something :) I don't see why you can't supply the kernels with oprofile working in timer mode with a usable sampling frequency *right now*. That should be enough for the helpless users who can't or don't want to tweak and build their own kernels. And some kind of workaround for the A8/A9 PMU can always be applied later once/if you get it working reliably enough.
Be assured we consider timer mode granularity the highest priority aspect of this problem. My questions about using counter mode are to see if there is anything that could also be done there. Addressing issues with counter mode operation is lower priority but I have to ask the question while I have the attention of those that have the answers.
OK, thanks.
On 16 September 2011 14:04, David Long dave.long@linaro.org wrote:
Are there part numbers that we can be reasonably sure do work, say perhaps the 4460?
The public TRM for the 4460 says it uses A9 r2p10.
On Fri, Sep 16, 2011 at 12:30 PM, Bianconi, Cyril c-bianconi@ti.com wrote:
I don't think that the A9 issue is the same as the A8. However, effects are the same i.e. it's hard to use PMU.
BTW, if anybody is interested in the details about the Cortex-A8 PMU issue, this information can be found in i.MX51 errata list: http://www.freescale.com/files/dsp/doc/errata/MCIMX51CE.pdf
Just search for ENGcm10700 there or for 628216, which is the ARM erratum ID. If Freescale keeps up this nice tradition, eventually we may also enjoy having the Cortex-A9 errata information freely and publicly accessible :)
I cannot share the A9 errata document as-is for legal reasons, but I believe that I can explain the issue. The issue happens when counters overflow (so I am not sure that this impacts OProfile). Theoretically, an interrupt should fire in this case. In reality, this interrupt is lost randomly. The workaround proposed by ARM is to use 2 counters: counter0, and counter1 initialized at counter0+1. If one interrupt is lost, the other one should fire just after. We have noticed that this may not be sufficient and that a third counter should be used to get close to 0% of interrupts lost.
Close to 0% is still not good enough if we really want to rely on the statistical properties of the collected profiling data. As a shameless plug, here is a link to my oprofile related blog post from a bit less than a month ago: http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html
Note: This HW issue has been fixed by ARM quite "late", so I think that most of the devices on the market should be impacted.
This pretty much rules out the use of the PMU for oprofile on both the A8 and the A9. How soon can we expect linaro kernels to switch to using timer mode in oprofile with a reasonably high sample collection rate for all the linaro supported boards? Increasing the sample collection rate is very simple and can be done by replacing TICK_NSEC with something more reasonable here: https://github.com/torvalds/linux/blob/master/drivers/oprofile/timer_int.c#L...
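To illustrate why the timer period matters, here is a rough sketch of how many samples a timer-mode run collects. It approximates the tick period as 1e9 / HZ nanoseconds (the kernel's own TICK_NSEC is derived from HZ with slightly different rounding), and both the HZ value of 100 and the 100 microsecond replacement period are assumptions chosen only for illustration; the function names are hypothetical.

```python
# Sketch only: samples collected by a periodic timer over a profiling run.
NSEC_PER_SEC = 1_000_000_000


def tick_nsec(hz):
    """Approximate scheduler tick period in nanoseconds for a given HZ."""
    return NSEC_PER_SEC // hz


def expected_samples(run_seconds, period_ns):
    """Number of timer expirations (one sample each, per CPU) during the run."""
    return run_seconds * NSEC_PER_SEC // period_ns


if __name__ == "__main__":
    run = 10  # seconds of profiled workload
    print("tick period:", expected_samples(run, tick_nsec(100)), "samples")
    print("100 us period:", expected_samples(run, 100_000), "samples")
```

Under these assumptions a ten-second run yields a thousand samples at the tick rate and a hundred times more with the shorter period, which is the difference between a coarse profile and a statistically useful one.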
The majority of users never even use any counters other than the cycle counter, so not using the PMU is not a big loss. Just using a high resolution timer is a viable replacement if the CPU clock frequency does not change during the test. The PMU could be put to much better use for timing short sequences of code if the performance counters were exposed to userspace in the mainline kernel.