On Feb 7, 2012, at 7:30 PM, John Stultz wrote:
On Wed, 2012-02-08 at 00:16 +0100, Zygmunt Krynicki wrote:
On 02/07/2012 11:43 PM, Andrew Richardson wrote:
Which clockid_t were you using? I think CLOCK_MONOTONIC makes sense for what you're trying to do, and perhaps it has different resolution/accuracy.
Hrm. No, that shouldn't be the case. CLOCK_MONOTONIC and CLOCK_REALTIME are driven by the same accumulation and differ only by an offset.
That said, in the test case you're using CLOCK_MONOTONIC_RAW, which I don't think you really want, as it's not NTP frequency-corrected. It's also driven by slightly different accumulation logic. But you said CLOCK_REALTIME showed the same issue, so it's probably not some CLOCK_MONOTONIC_RAW-specific bug.
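As a quick sanity check, clock_getres() will at least tell you what granularity the kernel claims for each clock. Something along these lines (just a sketch, not part of your timetest.c) prints it for the three clocks in question:

#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>

static void print_res(const char *name, clockid_t id)
{
    struct timespec ts;

    if (clock_getres(id, &ts) == 0)
        printf("%-22s %ld.%09ld s\n", name, (long)ts.tv_sec, ts.tv_nsec);
    else
        perror(name);
}

int main(void)
{
    print_res("CLOCK_REALTIME", CLOCK_REALTIME);
    print_res("CLOCK_MONOTONIC", CLOCK_MONOTONIC);
    print_res("CLOCK_MONOTONIC_RAW", CLOCK_MONOTONIC_RAW);
    return 0;
}

Keep in mind clock_getres() reports the advertised resolution, not how fine or how cheap the actual reads are, but a multi-millisecond value there would be telling. (On older glibc you may need to link with -lrt.)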
In general, I don't want the time value moving around on me (in case something weird is going on and it's changing too much). This seems to be what most people advise when it comes to profiling something with sub-second execution, but I might be misunderstanding you slightly.
If I crank up the amount of work done between the time calls (timetest.c:18: inneriters = 1e7;) such that the timed loop takes around 72ms, the timing results seem accurate and none of the intermediate calculations result in a 0us elapsed time. If I reduce it to around 10-25ms (inneriters=1e6), I get occasional 0us elapsed times. Around 2ms (inneriters=1e5), most results measure an elapsed time of 0us.
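For anyone reading along without the attachment, the timed section looks roughly like this (a paraphrase of timetest.c from memory, so treat it as a sketch rather than the exact code):

#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>

static long long elapsed_ns(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
    volatile double sink = 0.0;       /* keeps the loop from being optimized away */
    long inneriters = 10000000;       /* 1e7 here; 1e5/1e6 in the other runs mentioned */
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC_RAW, &start);
    for (long i = 0; i < inneriters; i++)
        sink += i * 0.5;
    clock_gettime(CLOCK_MONOTONIC_RAW, &end);

    printf("elapsed: %lld ns\n", elapsed_ns(start, end));
    return 0;
}

(Built with -O0; on older glibc you also need -lrt for clock_gettime.)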
Hrm. I'm not familiar with the clocksource on the panda. It may be coarse-grained enough to not allow better intervals, but a 2.5ms granularity would be a little outrageous. In the results above you do have a 30us interval, so clearly smaller intervals are possible, and I doubt that's the real issue.
2.5ms is much closer to a tick length with HZ=300 (a tick is 1/HZ, so ~3.3ms), or to being scheduled out and back in with HZ=1000.
Seems a bit too high, right? I did get some low values once, such as a 500ns difference. I was expecting a hard lower bound (e.g. a few ms), but a measurement of 500ns elapsed makes that theory unlikely.
I'm trying to optimize image processing functions, which take on the order of 2-15ms to process. Am I stuck with this timing resolution? I want to be careful not to mask issues like cache performance when timing, which could happen if I repeatedly process an image to average the results. Currently, though, that seems like the best option.
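In other words, something along these lines (a sketch; process_image() here is just a stand-in for the real routine), with the caveat that the warm-up pass plus averaging ends up measuring mostly warm-cache behavior, which is exactly the effect I'm worried about hiding:

#include <stdio.h>
#include <time.h>

/* Stand-in for the real image-processing call; replace with the function
 * under test. */
static void process_image(void)
{
    static volatile double sink;
    for (long i = 0; i < 1000000; i++)
        sink += i * 0.5;
}

int main(void)
{
    enum { ITERS = 100 };
    struct timespec start, end;

    process_image();                /* warm-up pass: primes caches, spins up the CPU */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ITERS; i++)
        process_image();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double total_ms = (end.tv_sec - start.tv_sec) * 1e3 +
                      (end.tv_nsec - start.tv_nsec) / 1e6;
    printf("avg per call: %.3f ms\n", total_ms / ITERS);
    return 0;
}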
Might the compiler be outsmarting you, so you end up with basically two calls to clock_gettime() right next to each other? Then it would be more normal to see 0ns time intervals (if the clocksource is somewhat coarse-grained), with the occasional scheduling blip landing in between the timings? (One way to rule that out is a compiler barrier around the work; see the sketch at the end of this mail.)
This explanation doesn't match your image timing results though, as I assume you're doing actual work in that case.
Hmmm. I'm a little stumped. Can anyone closer to the OMAP hardware comment?
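Purely as an illustration (this isn't from your test program): even with optimizations enabled, wrapping the timed region in compiler barriers and keeping the result live is one way to make sure the work can't be dropped or pulled out from between the two clock_gettime() calls:

#include <time.h>

/* Empty asm with a "memory" clobber: GCC can't reorder memory accesses
 * across it. The volatile store below keeps the computation from being
 * discarded as dead code. Illustration only. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static double work(long iters)
{
    double acc = 0.0;
    for (long i = 0; i < iters; i++)
        acc += i * 0.5;
    return acc;
}

void timed_region(long iters, struct timespec *start, struct timespec *end)
{
    static volatile double result;

    clock_gettime(CLOCK_MONOTONIC, start);
    barrier();
    result = work(iters);
    barrier();
    clock_gettime(CLOCK_MONOTONIC, end);
    (void)result;
}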
I don't think that's the case. I used -O0, which shouldn't do such things, AFAIK. And, the assembly shows a call instruction (bl) to the function being timed between two call instructions to the timestamp function.
Additionally, the times stabilize to something reasonable when cranking up the loop parameter to 1e7. My x86 machine takes around 50ms, whereas the PandaBoard takes 75ms (once what appears to be power management turns off). That seems very reasonable. I also went back to the program I'm actually interested in timing, and the times are reasonable when averaging 100 iterations (at ~18ms each).
Andrew