Greetings,
    
    I'm experiencing what appears to be a minimum clock resolution issue
    when using clock_gettime() on a PandaBoard ES running Ubuntu.
    > uname -r
    3.1.1-8-linaro-lt-omap

    > cat /proc/version
    Linux version 3.1.1-8-linaro-lt-omap (buildd@diphda) (gcc version
    4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) )
    #8~lt~ci~20120118001257+025756-Ubuntu SMP PREEMPT Thu Jan 19 09:
      
    I'm using clock_gettime() (and have also tried gettimeofday()) to
    measure the elapsed time around roughly 15ms of computation (image
    processing). While the measured time is stable on my x86_64 machine,
    it is not on my PandaBoard ES. I have tried various clocks (e.g.
    CLOCK_REALTIME), but the issue remains, and clock_gettime() never
    returns an error.
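
    For reference, the measurement pattern I'm using is roughly the
    sketch below (simplified, not the attached timetest.c; do_work() is
    just a dummy stand-in for the image processing, and depending on the
    glibc version it may need -lrt to link):

    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <time.h>

    /* Difference between two timespecs, in nanoseconds. */
    static long long elapsed_ns(struct timespec t0, struct timespec t1)
    {
        return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
             + (long long)(t1.tv_nsec - t0.tv_nsec);
    }

    /* Dummy workload standing in for the real image processing. */
    static void do_work(void)
    {
        volatile double x = 0.0;
        long i;
        for (i = 0; i < 1000000; i++)
            x += i * 0.5;
    }

    int main(void)
    {
        struct timespec t0, t1;
        long long ns;
        int i;

        for (i = 0; i < 3; i++) {
            clock_gettime(CLOCK_MONOTONIC, &t0);
            do_work();
            clock_gettime(CLOCK_MONOTONIC, &t1);

            ns = elapsed_ns(t0, t1);
            printf("%10lldns %7lldus   (t1: %6lds %9ldns)   (t0: %6lds %9ldns)\n",
                   ns, ns / 1000,
                   (long)t1.tv_sec, (long)t1.tv_nsec,
                   (long)t0.tv_sec, (long)t0.tv_nsec);
        }
        return 0;
    }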
    
    The result on my x86_64 machine looks like this:

    elapsed (s)   elapsed (ns)   elapsed (us)   time (after)                      time (before)
             0s       532260ns          532us   (t1:  73741s     92573265ns)      (t0:  73741s     92041005ns)
             0s       544413ns          544us   (t1:  73741s    109390136ns)      (t0:  73741s    108845723ns)
             0s       529328ns          529us   (t1:  73741s    126024860ns)      (t0:  73741s    125495532ns)

    A: 1.7s in total.  0.536ms on average.
    
    
    If I move over to my PandaBoard ES, I calculate elapsed times of 0us
    on some iterations.

    elapsed (s)   elapsed (ns)   elapsed (us)   time (after)                      time (before)
             0s            0ns            0us   (t1: 269529s    192626951ns)      (t0: 269529s    192626951ns)
             0s            0ns            0us   (t1: 269529s    215606688ns)      (t0: 269529s    215606688ns)
             0s      2655030ns         2655us   (t1: 269529s    252349852ns)      (t0: 269529s    249694822ns)
             0s      2593994ns         2593us   (t1: 269529s    286163328ns)      (t0: 269529s    283569334ns)
             0s        30518ns           30us   (t1: 269529s    317657469ns)      (t0: 269529s    317626951ns)
    
    
    If I crank up the amount of work done between the time calls
    (timetest.c:18: inneriters = 1e7;) such that the timed loop takes
    around 72ms, the timing results seem accurate and none of the
    intermediate calculations result in a 0us elapsed time. If I reduce
    it to around 10-25ms (inneriters=1e6), I get occasional 0us elapsed
    times. Around 2ms (inneriters=1e5), most results measure an elapsed
    time of 0us.
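
    One thing worth checking is what resolution the kernel itself reports
    for each clock via clock_getres(). A minimal sketch (the clock IDs
    below are just common examples, and the reported resolution may not
    match what I actually observe in practice):

    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <time.h>

    /* Print the resolution the kernel reports for one clock. */
    static void print_res(const char *name, clockid_t id)
    {
        struct timespec res;
        if (clock_getres(id, &res) == 0)
            printf("%-26s %lds %9ldns\n", name,
                   (long)res.tv_sec, (long)res.tv_nsec);
        else
            perror(name);
    }

    int main(void)
    {
        print_res("CLOCK_REALTIME", CLOCK_REALTIME);
        print_res("CLOCK_MONOTONIC", CLOCK_MONOTONIC);
        print_res("CLOCK_PROCESS_CPUTIME_ID", CLOCK_PROCESS_CPUTIME_ID);
        return 0;
    }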
    
    I'm trying to optimize image processing functions that take on the
    order of 2-15ms each. Am I stuck with this timing resolution? I want
    to be careful not to mask effects such as cache performance when
    timing, as I might if I repeatedly process the same image and average
    the results. Currently, though, that seems like the best option.
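
    For what it's worth, the repeat-and-average approach I mean is
    roughly the sketch below (process_image() is only a placeholder for
    one of the real 2-15ms functions; the obvious caveat, as above, is
    that after the first pass the same image is probably sitting warm in
    the cache):

    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <time.h>

    /* Placeholder for a real image processing function. */
    static void process_image(void)
    {
        volatile double sum = 0.0;
        long i;
        for (i = 0; i < 2000000; i++)
            sum += (double)i;
    }

    int main(void)
    {
        enum { REPS = 100 };
        struct timespec t0, t1;
        long long total_ns;
        int i;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < REPS; i++)
            process_image();        /* same image every pass -> cache-warm */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        total_ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                 + (long long)(t1.tv_nsec - t0.tv_nsec);
        printf("total: %lld ns, average per call: %lld ns\n",
               total_ns, total_ns / REPS);
        return 0;
    }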
    
    The source code and makefile are attached, as well as /proc/timer_list.
    
    Is this a property of the hardware, or might it be a bug?
    
    Thanks,
    Andrew