Greetings,
I'm experiencing what appears to be a clock-resolution issue when
using clock_gettime() on a PandaBoard ES running Ubuntu.
> uname -r
3.1.1-8-linaro-lt-omap
> cat /proc/version
Linux version 3.1.1-8-linaro-lt-omap (buildd@diphda) (gcc
version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) )
#8~lt~ci~20120118001257+025756-Ubuntu SMP PREEMPT Thu Jan 19 09:
I'm using clock_gettime() (and have also tried gettimeofday()) to
measure the elapsed time around roughly 15ms of computation (image
processing). While the measured time is stable on my x86_64 machine,
it is not on my PandaBoard ES. I have tried various clocks (e.g.
CLOCK_REALTIME), but the issue remains, and clock_gettime() never
returns an error.
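For reference, the measurement itself follows the usual pattern; here
is a minimal sketch (the real code is in the attached timetest.c, and
the loop below is only a stand-in for the image processing):

#include <stdio.h>
#include <time.h>

/* Minimal sketch of the measurement; the real code is in the attached
 * timetest.c.  Link with -lrt on older glibc. */
int main(void)
{
    struct timespec t0, t1;
    volatile double x = 0.0;
    long i;
    long long ns;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < 1000000; i++)   /* stand-in for the image processing */
        x += i * 0.5;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
       + (long long)(t1.tv_nsec - t0.tv_nsec);
    printf("elapsed: %lld ns (%lld us)\n", ns, ns / 1000);
    return 0;
}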
The results on my x86_64 machine look like this:

elapsed (s)  elapsed (ns)  elapsed (us)  time (after)                time (before)
0s           532260ns      532us         (t1: 73741s  92573265ns)    (t0: 73741s  92041005ns)
0s           544413ns      544us         (t1: 73741s 109390136ns)    (t0: 73741s 108845723ns)
0s           529328ns      529us         (t1: 73741s 126024860ns)    (t0: 73741s 125495532ns)

Roughly 1.7s in total, 0.536ms per iteration on average.
If I move over to my PandaBoard ES, however, some iterations measure
an elapsed time of 0us:
elapsed (s)  elapsed (ns)  elapsed (us)  time (after)                 time (before)
0s           0ns           0us           (t1: 269529s 192626951ns)    (t0: 269529s 192626951ns)
0s           0ns           0us           (t1: 269529s 215606688ns)    (t0: 269529s 215606688ns)
0s           2655030ns     2655us        (t1: 269529s 252349852ns)    (t0: 269529s 249694822ns)
0s           2593994ns     2593us        (t1: 269529s 286163328ns)    (t0: 269529s 283569334ns)
0s           30518ns       30us          (t1: 269529s 317657469ns)    (t0: 269529s 317626951ns)
If I crank up the amount of work done between the time calls
(timetest.c:18: inneriters = 1e7;) so that the timed loop takes
around 72ms, the timing results look accurate and none of the
intermediate measurements come out as 0us. If I reduce it to around
10-25ms (inneriters = 1e6), I get occasional 0us elapsed times. At
around 2ms (inneriters = 1e5), most results measure an elapsed time
of 0us.
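A quick sketch of how the advertised resolution could be checked on
both machines (as far as I understand, clock_getres() only reports
what the kernel advertises, and the effective granularity of the
underlying clocksource, visible in the attached /proc/timer_list, can
be coarser):

#include <stdio.h>
#include <time.h>

/* Query the resolution each clock advertises.  This is only what the
 * kernel reports; the underlying clocksource may be coarser (see the
 * attached /proc/timer_list). */
int main(void)
{
    struct timespec res;

    if (clock_getres(CLOCK_MONOTONIC, &res) == 0)
        printf("CLOCK_MONOTONIC resolution: %lds %ldns\n",
               (long)res.tv_sec, res.tv_nsec);
    if (clock_getres(CLOCK_REALTIME, &res) == 0)
        printf("CLOCK_REALTIME  resolution: %lds %ldns\n",
               (long)res.tv_sec, res.tv_nsec);
    return 0;
}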
I'm trying to optimize image processing functions that take on the
order of 2-15ms each. Am I stuck with this timing resolution? I also
want to be careful not to mask effects like cache behaviour when
timing, which I might do if I repeatedly process the same image and
average the results. Currently, though, that seems like the best
option.
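If averaging really is the best option, the obvious shape is to time
many repetitions in one measured interval and divide, something like
the sketch below (process_image() is a hypothetical stand-in for the
real function, and repeated runs over the same data will of course be
cache-warm, which is exactly the concern above):

#include <stdio.h>
#include <time.h>

/* Sketch of the averaging workaround: keep the measured interval well
 * above the timer granularity by timing many repetitions, then divide.
 * process_image() is a hypothetical stand-in for the real function. */
static void process_image(void)
{
    /* ~2-15 ms of image processing would go here */
}

int main(void)
{
    const int reps = 100;
    struct timespec t0, t1;
    long long total_ns;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < reps; i++)
        process_image();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    total_ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
             + (long long)(t1.tv_nsec - t0.tv_nsec);
    printf("average per call: %lld us\n", total_ns / reps / 1000);
    return 0;
}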
Source code and makefile are attached, as well as /proc/timer_list.
Is this a property of the hardware, or might it be a bug?
Thanks,
Andrew