Ramin,
Thanks for the email. I've added linaro-dev to my response.
The demo consisted of two identical PandaBoards with identical SD cards running the 3D benchmark of 0xbench using software 3D to amplify compiler and kernel improvements. 0xbench is a benchmarking program we ship with our Android images from 0xlab. Each build ran the same Android userspace, 2.3.4, but one was using the 2.6.36 Linux kernel and GCC 4.4 from the stock AOSP distribution and one was using an upgraded Linaro 3.0 Linux kernel with Linaro GCC 4.5. We ran the board in 640x480 mode so that we wouldn't be memory bound.
You can use and recreate the builds easily. To program a board with the prebuilt images, visit:
2.6.36 https://android-build.linaro.org/builds/~linaro-android/panda-11.05-release/
and
3.0 https://android-build.linaro.org/builds/~linaro-android/panda-11.07-release/
To recreate the builds from scratch visit:
https://wiki.linaro.org/Platform/Android/GetSource
https://wiki.linaro.org/Platform/Android/BuildSource
Here's a video of the demo running:
https://plus.google.com/104422661029399872488/posts/Rjmo5HCHQxZ (this is running in 720p mode, not 640x480)
I'm happy to help you reproduce the demos. Feel free to drop by #linaro-android on freenode.
-Zach
On 10 August 2011 06:24, Ramin Zaghi Ramin.Zaghi@arm.com wrote:
Hi Zach
I didn't get a chance to see you yesterday, so I got your card from the table. I was one of the first employees of our Multimedia Division to come from a game-dev background.
Your demos were interesting, so I was wondering if I could ask for a bit of explanation: what were they?
I'd also like to know what engine they were running, or was it a custom bit of code that you wrote?
Thanks
Ramin
Ramin Zaghi
Software Engineer
Processor Division / PDSW
ARM Ltd.
110 Fulbourn Road
Cambridge
CB1 9NJ, UK
Tel: +44 1223 406347
[Extn 22347]
hi Ramin and Zach,
2011/8/11 Zach Pfeffer zach.pfeffer@linaro.org:
The demo consisted of two identical PandaBoards with identical SD cards running the 3D benchmark of 0xbench using software 3D to amplify compiler and kernel improvements. 0xbench is a benchmarking program we ship with our Android images from 0xlab.
[...]
For more information, please check the wiki: http://code.google.com/p/0xbench/wiki/Benchmarks
Recently, we even added a JavaScript benchmark. And every piece of 0xbench is open source: http://gitorious.org/0xbench
We are considering integrating existing OpenGL|ES benchmark tests such as glmark2: https://launchpad.net/glmark2
Sincerely, -jserv
On 13 August 2011 06:07, Dechesne, Nicolas n-dechesne@ti.com wrote:
Zach,
On Thu, Aug 11, 2011 at 12:56 AM, Zach Pfeffer zach.pfeffer@linaro.org wrote:
The demo consisted of two identical PandaBoards with identical SD cards running the 3D benchmark of 0xbench using software 3D to amplify compiler and kernel improvements. 0xbench is a benchmarking program we ship with our Android images from 0xlab. Each build ran the same Android userspace, 2.3.4, but one was using the 2.6.36 Linux kernel and GCC 4.4 from the stock AOSP distribution and one was using an upgraded Linaro 3.0 Linux kernel with Linaro GCC 4.5. We ran the board in 640x480 mode so that we wouldn't be memory bound.
Have you checked all the clock configurations and made sure they are the same? 2.6.36 seems quite old (in the PandaBoard's lifetime) and I would suspect the CPU and memory clocks could be wrong compared to the Linaro 3.0 kernel (which I tried recently and which seems to have the right config). There are a whole bunch of kernel settings that can largely impact your demo, cache settings for example...
Since DVFS is not enabled in either kernel, I believe the clock settings might very well come from the bootloaders. Which x-loader and U-Boot are you using in each case?
Have you tried running the same demo with the exact same bootloaders and kernel, just a different userspace built with the two different compilers? I don't expect performance improvements to come from the kernel anyway (at least for such a benchmark); that way you are sure you are really looking at GCC improvements. Similarly, you can run the same userspace with both kernels.
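Nicolas's sanity check could be scripted along these lines. This is a rough sketch, not anything from the thread: the sysfs paths and helper names are illustrative, and real paths depend on the kernel version. The reader is injected so the comparison logic stays board-agnostic (in practice it might wrap `adb shell cat <path>`):

```python
# Rough sketch: diff clock-related settings collected from two boards.
# The sysfs paths below are illustrative; real paths depend on the kernel.

CLOCK_PATHS = [
    "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq",
    "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
]

def read_clocks(read_file):
    """Collect settings via an injected reader (e.g. one wrapping
    'adb shell cat <path>'), so the logic itself is board-agnostic."""
    return {path: read_file(path).strip() for path in CLOCK_PATHS}

def clock_mismatches(board_a, board_b):
    """Return {path: (a_value, b_value)} for every setting that differs."""
    return {
        path: (board_a.get(path), board_b.get(path))
        for path in sorted(set(board_a) | set(board_b))
        if board_a.get(path) != board_b.get(path)
    }
```

An empty result from clock_mismatches would at least rule out the CPU-clock explanation before attributing the difference to the toolchain.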
Nicolas,
Thanks for the notes. As you say there are many, many things that can affect this demo. What notes like this really underscore is the importance of staying up-to-date. This demo is more about the macroscopic effects from tip support than anything else. We do have some more specific benchmark numbers at:
https://wiki.linaro.org/Platform/Android/AndroidToolchainBenchmarking
-Zach
On Tue, Aug 16, 2011 at 7:14 PM, Zach Pfeffer zach.pfeffer@linaro.org wrote:
Nicolas,
Thanks for the notes. As you say there are many, many things that can affect this demo. What notes like this really underscore is the importance of staying up-to-date. This demo is more about the macroscopic effects from tip support than anything else. We do have some more specific benchmark numbers at:
https://wiki.linaro.org/Platform/Android/AndroidToolchainBenchmarking
If we're confident that the benchmark produces results of a trustworthy quality, then that's fine. I don't know this benchmark in detail, so I can't really judge, other than that the results look a bit odd.
But a performance comparison where the "fast" board occasionally produces worse numbers than the "slow" board does rather undermine the argument -- when someone comes to look at the demo and watches a single run, that result may be the only thing they see, and they may take away a negative impression. Explanations can be made, of course, but the point of a demo is that seeing is believing.
There might be ways to modify the demo to show the comparison a bit better though. Someone (kiko?) suggested running the rendering continuously throughout the day, with a total frame count displayed for each board or something. This could show more effectively the long-term average performance, and would smooth out the impact of short-term OS housekeeping tasks and other junk which may execute randomly during the demo.
Cheers ---Dave
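The continuous free-run idea above can be sketched in a few lines. This is illustrative only (the class and method names are invented here, not demo code): keep a cumulative frame count so the displayed figure is a long-term average that short housekeeping blips barely move.

```python
class FrameCounter:
    """Accumulate frames over a long free run and report average FPS.

    A long-run average smooths out the short-term jitter (GC, OS
    housekeeping) that can make any single benchmark run misleading.
    """

    def __init__(self):
        self.frames = 0
        self.elapsed = 0.0

    def tick(self, frame_time):
        """Record one rendered frame and the seconds it took to draw."""
        self.frames += 1
        self.elapsed += frame_time

    def average_fps(self):
        return self.frames / self.elapsed if self.elapsed else 0.0
```

A dashboard would simply poll average_fps() on each board; after hours of running, a single slow frame shifts the displayed number by a negligible amount.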
linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
On 17 August 2011 04:12, Dave Martin dave.martin@linaro.org wrote:
If we're confident that the benchmark produces results of a trustworthy quality, then that's fine. I don't know this benchmark in detail, so I can't really judge, other than that the results look a bit odd.
But a performance comparison where the "fast" board occasionally produces worse numbers than the "slow" board does rather undermine the argument -- when someone comes to look at the demo and watches a single run, that result may be the only thing they see, and they may take away a negative impression. Explanations can be made, of course, but the point of a demo is that seeing is believing.
Sure. I have seen it be slower in a few instances.
There might be ways to modify the demo to show the comparison a bit better though. Someone (kiko?) suggested running the rendering continuously throughout the day, with a total frame count displayed for each board or something. This could show more effectively the long-term average performance, and would smooth out the impact of short-term OS housekeeping tasks and other junk which may execute randomly during the demo.
Yeah, that sounds good. Most of our improvements are against the code running on the main core so anything compute bound should work. Perhaps we could do a fractal demo and throw up a realtime, slightly transparent, dashboard that showed the results as the demo free ran.
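A fractal free-run is attractive precisely because the inner loop is pure arithmetic. As a sketch of the kind of compute-bound workload meant here (not the actual demo code, and the function names are made up):

```python
def mandel_iters(c, max_iter=255):
    """Escape-time iteration count for one point of the Mandelbrot set.
    Pure CPU work, so throughput tracks compiler and kernel improvements
    rather than memory or I/O bandwidth."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter

def render_counts(width, height, max_iter=255):
    """Iteration counts for a grid mapped onto [-2, 1] x [-1.5, 1.5]."""
    return [
        mandel_iters(complex(-2 + 3 * x / width, -1.5 + 3 * y / height),
                     max_iter)
        for y in range(height)
        for x in range(width)
    ]
```

Frames rendered per second of render_counts on each board would be the number the dashboard free-runs on.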
-- Dave Martin dave.martin@linaro.org Linaro Kernel Working Group
-- http://www.linaro.org/ -- Open source software for ARM SoCs
http://www.facebook.com/pages/Linaro http://twitter.com/#!/linaroorg http://www.linaro.org/linaro-blog/
On Wed, Aug 17, 2011 at 11:12 PM, Dave Martin dave.martin@linaro.org wrote:
If we're confident that the benchmark produces results of a trustworthy quality, then that's fine. I don't know this benchmark in detail, so I can't really judge, other than that the results look a bit odd.
Ditto on that. Have these benchmarks been qualified? Do they represent real workloads? Where do they come from? What aspects of the system (CPU, memory, I/O, kernel, SMP) do they exercise? How sensitive are they to minor changes?
gnugo in particular is a problem - the results don't change across a range of toolchains which suggests it's got a silly hot loop or isn't core bound.
-- Michael
On 08/17/2011 04:59 PM, Michael Hope wrote:
Ditto on that. Have these benchmarks been qualified? Do they represent real workloads? Where do they come from? What aspects of the system (CPU, memory, I/O, kernel, SMP) do they exercise? How sensitive are they to minor changes?
The benchmark code comes from Android: http://android.git.kernel.org/?p=toolchain/benchmark.git
I'm not an expert on benchmarking. I've just tried to focus on running these in a way that's as fair and repeatable as possible.
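For what it's worth, "fair and repeatable" usually comes down to repeated runs plus robust statistics. A minimal sketch (the helper names are invented here, not from the wiki's harness): take the median across runs, since it resists one-off outliers like cron jobs or cache warmup, and report the spread as a noise indicator.

```python
import statistics
import time

def summarize(times):
    """Median run time plus spread; a large spread flags noisy results."""
    return statistics.median(times), max(times) - min(times)

def benchmark(workload, runs=10):
    """Time a workload several times and summarize the results.

    The median resists one-off outliers, while the spread shows how
    repeatable the measurement actually is on this system.
    """
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return summarize(times)
```

Comparing medians between two toolchains, and only trusting differences larger than the observed spread, is one simple way to keep the comparison honest.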
On Fri, Aug 19, 2011 at 2:21 AM, Andy Doan andy.doan@linaro.org wrote:
The benchmark code comes from Android: http://android.git.kernel.org/?p=toolchain/benchmark.git
I'm not an expert on benchmarking. I've just tried to focus on running these in a way that's as fair and repeatable as possible.
OK. Just keep an eye out then. If the benchmarks are dominated by things that Linaro isn't working on (such as I/O performance or memory bandwidth) then the results won't change. If they're dominated by certain inner functions that are very sensitive to environment changes, then you may see a regression. Benchmarks need to represent the workloads of a real system.
-- Michael