Being able to accurately and consistently measure the elapsed CPU time is important for toolchain benchmarking. I ran a few experiments today and wrote up the results at: https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/TimerAccuracy
The original is available at: http://bazaar.launchpad.net/~michaelh1/linaro-toolchain-benchmarks/trunk/view/head:/reports/timer-accuracy.rst
Short story:
- Use clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ...)
- The mean is unaffected by CPU load
- I/O load has a significant effect
- Use nice -n -20 to reduce the variance
For a CPU bound, non-VFP, L1 bound, 5 s long program:
- The variation coefficient is < 0.01 % so we can reliably measure 0.1 % changes
- The accuracy is around 50 us
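For reference, the recommended measurement can be sketched as follows. This is not code from the benchmark suites themselves, just a minimal illustration via Python's wrapper around the same POSIX clock (assumes Linux, Python 3.3+):

```python
import time

def busy_work(n):
    # CPU-bound loop so process time and wall time track each other
    total = 0
    for i in range(n):
        total += i * i
    return total

# CLOCK_PROCESS_CPUTIME_ID counts only CPU time consumed by this process,
# so background load on other cores does not inflate the measurement
start = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID)
busy_work(1_000_000)
end = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID)
print(f"elapsed CPU time: {end - start:.6f} s")
```

The equivalent C call is clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) with a struct timespec, linked with -lrt on older glibc.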
I've changed eembc-linaro and will change denbench-linaro next. I recommend that anyone else measuring time in core-based benchmarks do the same.
-- Michael
Hi,
What was the timer used previously for benchmarks?
Your page focuses on accuracy, while I think an equally important point is the availability/functionality of clock_gettime(CLOCK_PROCESS/THREAD_CPUTIME_ID, ...) on ARM, which measures process/thread execution time rather than the "wall time" reported by most other timer APIs. Good to know it looks accurate (and the Internet seems to say that API is available on Android).
Generally the program being benchmarked runs alone, so wall time converges to execution time, but as you suggest it would be more correct to use this clock.
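The distinction is easy to demonstrate: time spent sleeping or blocked shows up in the wall clock but not in the process clock. A small sketch, assuming a Linux Python runtime that exposes both clocks:

```python
import time

wall_start = time.clock_gettime(time.CLOCK_MONOTONIC)
cpu_start = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID)

time.sleep(0.2)  # blocked in the kernel, consuming essentially no CPU

wall_elapsed = time.clock_gettime(time.CLOCK_MONOTONIC) - wall_start
cpu_elapsed = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID) - cpu_start

# Wall time includes the sleep; process CPU time does not
print(f"wall: {wall_elapsed:.3f} s, cpu: {cpu_elapsed:.6f} s")
```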
On our side, the team was using a basic timer computing "wall time" plus a scheduler trace from perf or systemtap (to remove periods of non-execution in a post-processing tool), for example to profile/debug a specific sequence in an application. We could get rid of the post-processing tool now.
Regards Fred
Frederic Turgis OMAP Platform Business Unit - OMAP System Engineering - Platform Enablement
Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920
-----Original Message-----
From: linaro-dev-bounces@lists.linaro.org [mailto:linaro-dev-bounces@lists.linaro.org] On Behalf Of Michael Hope
Sent: Tuesday, August 30, 2011 3:47 AM
To: Linaro Dev
Subject: Choosing the timer to use in benchmarking
On Wed, Aug 31, 2011 at 12:38 AM, Turgis, Frederic f-turgis@ti.com wrote:
Hi,
What was the timer used previously for benchmarks ?
It was a bit random across the different suites. CoreMark uses clock_gettime(CLOCK_REALTIME, ...) which is a wall clock with NTP adjustments. EEMBC uses clock() which is a lower resolution wall clock.
I was quite impressed with how steady the process clock was under different CPU loads. It doesn't seem to round up or down to a scheduler tick either which is nice.
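One quick cross-check is to ask the kernel what resolution it claims for each clock via clock_getres. A sketch through the Python wrapper (assumes Linux); note the reported figure is only the kernel's advertised granularity, which is why measuring the actual behaviour, as above, still matters:

```python
import time

# Compare the advertised resolution of the clocks discussed in this thread
for name in ("CLOCK_REALTIME", "CLOCK_MONOTONIC", "CLOCK_PROCESS_CPUTIME_ID"):
    clk = getattr(time, name)
    res = time.clock_getres(clk)
    print(f"{name}: reported resolution {res:.9f} s")
```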
-- Michael
It was a bit random across the different suites. CoreMark uses clock_gettime(CLOCK_REALTIME, ...) which is a wall clock with NTP adjustments. EEMBC uses clock() which is a lower resolution wall clock.
So everything was mostly wall clock based. We had not tested CLOCK_PROCESS_CPUTIME_ID internally and people were pleasantly surprised by the functionality/accuracy of your results. I assume this clock just requires a hrtimer plus per-process/thread stats accumulation in the scheduler, but I clearly prefer it to be there rather than post-processing a scheduler trace!
On Wed, 2011-08-31 at 07:43 +1200, Michael Hope wrote:
I was quite impressed with how steady the process clock was under different CPU loads. It doesn't seem to round up or down to a scheduler tick either which is nice.
Hmm. Very interesting results! It's been a little while since I've looked into the CPU time accounting, but I'd be hesitant to trust that it has finer than HZ granularity everywhere. I know that has been an issue in the past. I'll have to look into it.
thanks -john
On Fri, Sep 2, 2011 at 10:56 AM, John Stultz johnstul@us.ibm.com wrote:
On Wed, 2011-08-31 at 07:43 +1200, Michael Hope wrote:
I was quite impressed with how steady the process clock was under different CPU loads. It doesn't seem to round up or down to a scheduler tick either which is nice.
Hmm. Very interesting results! It's been a little while since I've looked into the CPU time accounting, but I'd be hesitant to trust that it has finer than HZ granularity everywhere. I know that has been an issue in the past. I'll have to look into it.
I was wondering the same and had a play with it as part of the exercise. I plotted the clock time for differing numbers of loops and didn't see the staircase effect you'd expect if the time were rounded to the nearest scheduler tick.
I also had a look at the delta between each run and the standard deviation was nice and low - around 50 us.
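The check described above can be sketched roughly as follows: repeat one fixed workload and look at the run-to-run spread of the process clock. This is a hypothetical reconstruction via the Python wrapper (assumes Linux), not the script used for the wiki results:

```python
import statistics
import time

def measure(n):
    # Time one CPU-bound workload with the per-process clock
    start = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID)
    total = 0
    for i in range(n):
        total += i
    return time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID) - start

# Repeat the same workload size and inspect the spread;
# a tick-rounded clock would cluster samples at multiples of 1/HZ
samples = [measure(200_000) for _ in range(10)]
mean = statistics.mean(samples)
stdev = statistics.stdev(samples)
print(f"mean {mean:.6f} s, stdev {stdev:.6f} s, CV {stdev / mean:.4%}")
```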
-- Michael