On Wed, 2012-02-08 at 00:16 +0100, Zygmunt Krynicki wrote:
On 02/07/2012 11:43 PM, Andrew Richardson wrote:
Greetings,
I'm experiencing what appears to be a minimum clock resolution issue when using clock_gettime() on a PandaBoard ES running Ubuntu.
> uname -r
3.1.1-8-linaro-lt-omap

> cat /proc/version
Linux version 3.1.1-8-linaro-lt-omap (buildd@diphda) (gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) ) #8~lt~ci~20120118001257+025756-Ubuntu SMP PREEMPT Thu Jan 19 09:
I'm using clock_gettime() (and have also tried gettimeofday()) to compute the elapsed time of roughly 15ms of computation (image processing).
Which clockid_t were you using? I think CLOCK_MONOTONIC makes sense for what you are trying to do, and perhaps it has different resolution/accuracy.
Hrm. No, that shouldn't be the case. CLOCK_MONOTONIC and CLOCK_REALTIME are driven by the same accumulation logic and differ only by an offset.
That said, in the test case you're using CLOCK_MONOTONIC_RAW, which I don't think you really want, as it's not NTP frequency corrected. In addition, it is driven by slightly different accumulation logic. But you said CLOCK_REALTIME showed the same issue, so it's probably not some CLOCK_MONOTONIC_RAW-specific bug.
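For reference, the basic pattern I'd expect for this kind of interval timing is roughly the sketch below: read CLOCK_MONOTONIC before and after the work and subtract the two timespecs. do_work() here is just a placeholder for the image processing, not code from the original timetest.c, and on older glibc you may need to link with -lrt:

#include <stdio.h>
#include <time.h>

/* Placeholder standing in for the ~15ms image processing step. */
static void do_work(void)
{
    volatile unsigned long sink = 0;
    unsigned long i;

    for (i = 0; i < 1000000UL; i++)
        sink += i;
}

int main(void)
{
    struct timespec t0, t1;
    long long ns;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    do_work();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Convert the two timespecs into a single elapsed nanosecond count. */
    ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
    printf("elapsed: %lld ns\n", ns);
    return 0;
}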
While the computed time is stable on my x86_64 machine, it is not on my PandaBoard ES. I have tried various clocks (e.g. CLOCK_REALTIME), but the issue remains. No error codes are returned by clock_gettime().
The result on my x86_64 machine looks like this:
elapsed (s)  elapsed (ns)  elapsed (us)  time (after)               time (before)
0s           532260ns      532us         (t1: 73741s 92573265ns)    (t0: 73741s 92041005ns)
0s           544413ns      544us         (t1: 73741s 109390136ns)   (t0: 73741s 108845723ns)
0s           529328ns      529us         (t1: 73741s 126024860ns)   (t0: 73741s 125495532ns)

A: 1.7s in total, 0.536ms on average.
If I move over to my PandaBoard ES, I calculate elapsed times of 0us on some iterations.
elapsed (s)  elapsed (ns)  elapsed (us)  time (after)                time (before)
0s           0ns           0us           (t1: 269529s 192626951ns)   (t0: 269529s 192626951ns)
0s           0ns           0us           (t1: 269529s 215606688ns)   (t0: 269529s 215606688ns)
0s           2655030ns     2655us        (t1: 269529s 252349852ns)   (t0: 269529s 249694822ns)
0s           2593994ns     2593us        (t1: 269529s 286163328ns)   (t0: 269529s 283569334ns)
0s           30518ns       30us          (t1: 269529s 317657469ns)   (t0: 269529s 317626951ns)
If I crank up the amount of work done between the time calls (timetest.c:18: inneriters = 1e7;) such that the timed loop takes around 72ms, the timing results seem accurate and none of the intermediate calculations result in a 0us elapsed time. If I reduce it to around 10-25ms (inneriters=1e6), I get occasional 0us elapsed times. Around 2ms (inneriters=1e5), most results measure an elapsed time of 0us.
Hrm. So I'm not familiar with the clocksource on panda. It may be so coarse-grained as to not allow for better intervals, but 2.5ms intervals are a little outrageous. In the above you do have a 30us interval, so clearly smaller intervals are possible, which makes me doubt coarse granularity is the real issue.
2.5ms is much closer to a tick length with HZ=300, or a schedule-out and back in with HZ=1000.
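One thing worth checking on the board is which clocksource is actually in use (cat /sys/devices/system/clocksource/clocksource0/current_clocksource) and what resolution the kernel advertises for these clocks. A quick sketch along those lines, with the caveat that clock_getres() typically just reports 1ns for the high-resolution clocks and so may not reflect the real clocksource granularity:

#define _GNU_SOURCE   /* for CLOCK_MONOTONIC_RAW on older toolchains */
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec res;

    /* Print the advertised resolution of the clocks discussed above. */
    if (clock_getres(CLOCK_REALTIME, &res) == 0)
        printf("CLOCK_REALTIME:      %lds %ldns\n", (long)res.tv_sec, res.tv_nsec);
    if (clock_getres(CLOCK_MONOTONIC, &res) == 0)
        printf("CLOCK_MONOTONIC:     %lds %ldns\n", (long)res.tv_sec, res.tv_nsec);
    if (clock_getres(CLOCK_MONOTONIC_RAW, &res) == 0)
        printf("CLOCK_MONOTONIC_RAW: %lds %ldns\n", (long)res.tv_sec, res.tv_nsec);
    return 0;
}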
I'm trying to optimize image processing functions, which take on the order of 2-15ms each. Am I stuck with this timing resolution? I want to be careful not to mask effects like cache behavior when timing, as I might if I repeatedly process the same image and average the results. Currently, that still seems like the best option.
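One way to work around coarse per-measurement resolution, if repeated processing turns out to be acceptable for what you're measuring, is to time a batch of N runs with a single pair of clock_gettime() calls and divide by N, rather than timing each run separately. A rough sketch, where process_image() and ITERS are illustrative stand-ins rather than anything from the original code:

#include <stdio.h>
#include <time.h>

#define ITERS 100

/* Stand-in for the 2-15ms image processing function being measured. */
static void process_image(void)
{
    volatile unsigned long sink = 0;
    unsigned long i;

    for (i = 0; i < 100000UL; i++)
        sink += i;
}

int main(void)
{
    struct timespec t0, t1;
    long long total_ns;
    int n;

    /* Time the whole batch once, then report the per-iteration average. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (n = 0; n < ITERS; n++)
        process_image();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    total_ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
               + (t1.tv_nsec - t0.tv_nsec);
    printf("average per iteration: %lld ns\n", total_ns / ITERS);
    return 0;
}

The caveat about cache warm-up still applies: batching keeps the image resident in cache after the first pass, so it measures the warm-cache case.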
Might the compiler be outsmarting you, so that you end up with basically two calls to clock_gettime() next to each other? Then it would be more normal to see 0ns intervals (if the clocksource is somewhat coarse-grained), with the occasional scheduling blip landing in between the timings.
This explanation doesn't match your image timing results though, as I assume you're doing actual work in that case.
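An easy way to rule that out is to look at the generated code (objdump -d on the test binary), or to give the timed loop an observable side effect so the compiler can't discard it. A minimal sketch of the latter; 'sink' and 'timed_work' are illustrative names, not from the original timetest.c:

/* Storing to a volatile object forces the loop's result to be kept,
 * so the work between the two clock_gettime() calls can't be elided. */
volatile double sink;

static void timed_work(long iters)
{
    double acc = 0.0;
    long i;

    for (i = 0; i < iters; i++)
        acc += (double)i * 0.5;
    sink = acc;
}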
Hmmm. I'm a little stumped. Can anyone closer to the OMAP hardware comment?
thanks -john