Hi, Recently we have been looking at how to squeeze more performance out of our toolchain for building Firefox on Android. Mike Hommey integrated GCC 4.6 into the android NDK and has been testing performance (with mixed results http://gcc.gnu.org/ml/gcc/2011-08/msg00096.html). I like how Linaro is doing regular arm benchmarking, ie https://wiki.linaro.org/Platform/Android/AndroidToolchainBenchmarking/2011-0... . Would you be interested in adding a Firefox-based benchmark? As a large application it is a good testbed for LTO, FDO and other aggressive optimizations.
We are also looking at setting up a developer-friendly Android ROM with oprofile, perf, systemtap, gdb, debug symbols, etc. It might even be beneficial for us to use newer kernels as we explore options like kernel-assisted ld.so relocations, etc. That seems similar to what Linaro provides in the evaluation ROMs. Is there any chance of Linaro providing developer-friendly "evaluation" ROMs for retail phones like the Nexus S?
Thanks, Taras
Hello Taras,
We've recently had a Linaro Connect conference, and many developers are still on the way from it, so more knowledgeable folks may add more info later, and I'd like to highlight some points in the meantime.
On Thu, 04 Aug 2011 12:03:00 -0700 Taras Glek tglek@mozilla.com wrote:
Hi, Recently we have been looking at how to squeeze more performance out of our toolchain for building Firefox on Android. Mike Hommey integrated GCC 4.6 into the android NDK and has been testing performance (with mixed results http://gcc.gnu.org/ml/gcc/2011-08/msg00096.html). I like how Linaro is doing regular arm benchmarking, ie https://wiki.linaro.org/Platform/Android/AndroidToolchainBenchmarking/2011-0... . Would you be interested in adding a Firefox-based benchmark? As a large application it is a good testbed for LTO, FDO and other aggressive optimizations.
That sounds like a great idea. Our validation team is currently compiling a list of tests that would be useful to run as part of our validation and benchmarking effort. We're still bootstrapping our continuous integration system and working on "low-hanging fruit" (simple tests), but we are certainly looking forward to adding more elaborate test suites, especially for Android (which doesn't have much variety). So, please consider us among your users.
We are also looking at setting up a developer-friendly Android ROM with oprofile, perf, systemtap, gdb, debug symbols, etc. It might even be beneficial for us to use newer kernels as we explore options like kernel-assisted ld.so relocations, etc. That seems similar to what Linaro provides in the evaluation ROMs.
We recently added busybox to our builds, as the default Android environment is indeed pretty limited. And I agree that it would be nice to have more tools available. However, our aim at Linaro is to stay sufficiently close to upstream, and we don't currently have the resources to maintain both "normal" and "developer" images.
One solution I personally see for this is breaking away from the "ROM" outlook and treating Android more like a normal Linux distribution. Then anyone interested could easily rebuild Android with more packages installed or, even better, have a native package manager to install more binary packages (something like opkg). That's forward-looking thinking, though; we are unlikely to get to it soon, but we would be interested to see whether the wider Android community has interest in that, and to help if we can.
Is there any chance of Linaro providing developer-friendly "evaluation" ROMs for retail phones like the Nexus S?
Did you consider using a development board for development and testing? That is certainly a more comfortable and affordable solution for sustainable development than a specific retail product (1). At Linaro we work precisely on making high-quality base software (including Android) available on low-cost development boards, so different parties can develop ARM software more easily and with higher quality. That's also consistent with Linaro's aim of promoting ARM SoCs, not specific products. So, at this time we don't have plans to support retail devices, though that might change in the future.
That's also one area where the community may kick in. On our side, we're considering what support we can provide for such efforts, which may include providing code hosting, use of our build system, validation farm, etc.
(1) Though I certainly agree that it would be nice to have a Linaro-supported and optimized build for some retail device. Except that I happen not to have a Nexus S, so would vote for supporting my tablet instead ;-)
Thanks, Taras
On Thu, Aug 04, 2011 at 12:03:00PM -0700, Taras Glek wrote:
Recently we have been looking at how to squeeze more performance out of our toolchain for building Firefox on Android. Mike Hommey integrated GCC 4.6 into the android NDK and has been testing performance (with mixed results http://gcc.gnu.org/ml/gcc/2011-08/msg00096.html).
You should definitely be trying to build using the Linaro 4.5 and 4.6 compiler branches; they are pretty much guaranteed to give you better performance, and if they don't, we're on the hook to fix it quickly! All the patches go upstream, so there is no risk of you being stuck on a fork -- it just makes everything you need available right now.
I'm copying the linaro-toolchain list to make sure that you get the right people's attention (though if they weren't all coming back from Connect in Cambridge this week they would have picked the email up already).
I like how Linaro is doing regular arm benchmarking, ie https://wiki.linaro.org/Platform/Android/AndroidToolchainBenchmarking/2011-0...
We do much more than that, but it's not as easy to find right now; for instance http://ex.seabright.co.nz/helpers/benchcompare is Michael's regular release benchmark.
. Would you be interested in adding a Firefox-based benchmark? As a large application it is a good testbed for LTO, FDO and other aggressive optimizations.
Totally. Let's do it. Can you give me an idea of what boards you are testing the build on today? Do you have a test suite that we could run in a reasonable timeframe (hours, not days)?
We are also looking at setting up a developer-friendly Android ROM with oprofile, perf, systemtap, gdb, debug symbols, etc. It might even be beneficial for us to use newer kernels as we explore options like kernel-assisted ld.so relocations, etc. That seems similar to what Linaro provides in the evaluation ROMs. Is there any chance of Linaro providing developer-friendly "evaluation" ROMs for retail phones like the Nexus S?
It's indeed pretty similar (we just call them LEBs), and Zach will be really interested in working with you on this.
As for supporting actual released phones, it lies somewhat outside of our optimal operating model, and we don't have any hardware available. I guess we could do a spin for a specific model if we had enough of them to use by a set of engineers in the different teams. They are so expensive, though. Do you guys have lots of them?
On 08/09/2011 02:47 PM, Christian Robottom Reis wrote:
On Thu, Aug 04, 2011 at 12:03:00PM -0700, Taras Glek wrote:
Recently we have been looking at how to squeeze more performance out of our toolchain for building Firefox on Android. Mike Hommey integrated GCC 4.6 into the android NDK and has been testing performance (with mixed results http://gcc.gnu.org/ml/gcc/2011-08/msg00096.html).
You should definitely be trying to build using the Linaro 4.5 and 4.6 compiler branches; they are pretty much guaranteed to give you better performance, and if they don't, we're on the hook to fix it quickly! All the patches go upstream, so there is no risk of you being stuck on a fork -- it just makes everything you need available right now.
Mike, can you give these a try?
I'm copying the linaro-toolchain list to make sure that you get the right people's attention (though if they weren't all coming back from Connect in Cambridge this week they would have picked the email up already).
I like how Linaro is doing regular arm benchmarking, ie https://wiki.linaro.org/Platform/Android/AndroidToolchainBenchmarking/2011-0...
We do much more than that, but it's not as easy to find right now; for instance http://ex.seabright.co.nz/helpers/benchcompare is Michael's regular release benchmark.
Link gives me an error page
. Would you be interested in adding a Firefox-based benchmark? As a large application it is a good testbed for LTO, FDO and other aggressive optimizations.
Totally. Let's do it. Can you give me an idea of what boards you are testing the build on today? Do you have a test suite that we could run in a reasonable timeframe (hours, not days)?
We have various benchmarks, all of which run within 30-60min.
We are also looking at setting up a developer-friendly Android ROM with oprofile, perf, systemtap, gdb, debug symbols, etc. It might even be beneficial for us to use newer kernels as we explore options like kernel-assisted ld.so relocations, etc. That seems similar to what Linaro provides in the evaluation ROMs. Is there any chance of Linaro providing developer-friendly "evaluation" ROMs for retail phones like the Nexus S?
It's indeed pretty similar (we just call them LEBs), and Zach will be really interested in working with you on this.
As for supporting actual released phones, it lies somewhat outside of our optimal operating model, and we don't have any hardware available. I guess we could do a spin for a specific model if we had enough of them to use by a set of engineers in the different teams. They are so expensive, though. Do you guys have lots of them?
How many devices would you need?
We actually have two separate needs. We use developer boards for continuous integration and phones for development. We need a good-quality ROM for both. Perhaps we can switch our boards over to pandaboards.
Is there someone I can call (and at what time) to discuss this further?
Taras
On Tue, Aug 09, 2011 at 03:08:53PM -0700, Taras Glek wrote:
You should definitely be trying to build using the Linaro 4.5 and 4.6 compiler branches; they are pretty much guaranteed to give you better performance, and if they don't, we're on the hook to fix it quickly! All the patches go upstream, so there is no risk of you being stuck on a fork -- it just makes everything you need available right now.
Mike, can you give these a try?
Also, the 4.5 branch is likely to perform better for a while; once 4.6 has caught up, 4.5 goes into stable mode.
If you are on a platform with NEON (i.e. not a Tegra2), experimenting with -O3 and -mfpu=neon might get some interesting results, as that enables the NEON auto-vectorization Ira has been working on; given it's Mozilla, you might also get an ICE though ;-) Tell us how it goes.
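For readers unfamiliar with auto-vectorization, the following is a minimal sketch of the kind of loop the -O3 vectorizer looks for. The function name, file name and compiler name are assumptions made for illustration and do not come from the Firefox tree. The example uses integers on purpose: with -mfpu=neon, GCC does not auto-vectorize floating-point loops unless -funsafe-math-optimizations is also given, because NEON does not fully implement IEEE 754.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical kernel: y[i] += a * x[i] over unsigned 32-bit integers.
 * `restrict` promises the compiler that x and y do not overlap, which
 * removes one common obstacle to vectorization.
 *
 * A possible build line (compiler and file names are assumptions):
 *   arm-linux-androideabi-gcc -std=c99 -O3 -march=armv7-a \
 *       -mfloat-abi=softfp -mfpu=neon -c scale_add.c
 */
void scale_add_u32(uint32_t *restrict y, const uint32_t *restrict x,
                   uint32_t a, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++) {
        /* Independent iterations: each lane can be computed in parallel. */
        y[i] += a * x[i];
    }
}
```

Whether the loop was actually vectorized can be checked by inspecting the generated assembly for NEON instructions; the code behaves the same either way.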
I'm copying the linaro-toolchain list to make sure that you get the right people's attention (though if they weren't all coming back from Connect in Cambridge this week they would have picked the email up already).
I like how Linaro is doing regular arm benchmarking, ie https://wiki.linaro.org/Platform/Android/AndroidToolchainBenchmarking/2011-0...
We do much more than that, but it's not as easy to find right now; for instance http://ex.seabright.co.nz/helpers/benchcompare is Michael's regular release benchmark.
Link gives me an error page
It so happens the owner of the page is on vacation this week. I'll make sure he fixes it and points you to a running URL when it's done.
Totally. Let's do it. Can you give me an idea of what boards you are testing the build on today? Do you have a test suite that we could run in a reasonable timeframe (hours, not days)?
We have various benchmarks, all of which run within 30-60min.
Cool. Can you share a subset you'd like to see run as a starting point?
As for supporting actual released phones, it lies somewhat outside of our optimal operating model, and we don't have any hardware available. I guess we could do a spin for a specific model if we had enough of them to use by a set of engineers in the different teams. They are so expensive, though. Do you guys have lots of them?
How many devices would you need?
5 phones would give me confidence that we're not going to be stuck throughout the process of getting a build going. So that's the general order of magnitude.
We actually have two separate needs. We use developer boards for continuous integration and phones for development. We need a good-quality ROM for both. Perhaps we can switch our boards over to pandaboards.
Is there someone I can call (and at what time) to discuss this further?
You can call me at +44 7595 200905 any time this week; now isn't a bad time, but if I do miss your call I'll be sure to ring you back. You can also call Zach on Thursday or Friday; I'll send his number separately.
On Tue, Aug 09, 2011 at 07:32:51PM -0300, Christian Robottom Reis wrote:
On Tue, Aug 09, 2011 at 03:08:53PM -0700, Taras Glek wrote:
You should definitely be trying to build using the Linaro 4.5 and 4.6 compiler branches; they are pretty much guaranteed to give you better performance, and if they don't, we're on the hook to fix it quickly! All the patches go upstream, so there is no risk of you being stuck on a fork -- it just makes everything you need available right now.
Mike, can you give these a try?
Also, the 4.5 branch is likely to perform better for a while; once 4.6 has caught up, 4.5 goes into stable mode.
If you are on a platform with NEON (i.e. not a Tegra2), experimenting with -O3 and -mfpu=neon might get some interesting results, as that enables the NEON auto-vectorization Ira has been working on; given it's Mozilla, you might also get an ICE though ;-) Tell us how it goes.
That's unfortunately not something that can work for us, except if we start distributing a different android build for NEON and non-NEON.
Mike
Mike Hommey wrote:
On Tue, Aug 09, 2011 at 07:32:51PM -0300, Christian Robottom Reis wrote:
On Tue, Aug 09, 2011 at 03:08:53PM -0700, Taras Glek wrote:
You should definitely be trying to build using the Linaro 4.5 and 4.6 compiler branches; they are pretty much guaranteed to give you better performance, and if they don't, we're on the hook to fix it quickly! All the patches go upstream, so there is no risk of you being stuck on a fork -- it just makes everything you need available right now.
Mike, can you give these a try?
Also, the 4.5 branch is likely to perform better for a while; once 4.6 has caught up, 4.5 goes into stable mode.
If you are on a platform with NEON (i.e. not a Tegra2), experimenting with -O3 and -mfpu=neon might get some interesting results, as that enables the NEON auto-vectorization Ira has been working on; given it's Mozilla, you might also get an ICE though ;-) Tell us how it goes.
That's unfortunately not something that can work for us, except if we start distributing a different android build for NEON and non-NEON.
You should consider that. Otherwise you will penalize future devices, which will all have NEON, by forcing Tegra2 "legacy" support onto them.
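Besides shipping two separate builds, one option sometimes used in this situation is to compile only selected hot paths with -mfpu=neon and choose between them at run time. The following is a rough sketch of the detection half, assuming the Linux kernel exposes a "Features" line in /proc/cpuinfo that lists "neon" on capable CPUs. The function names are hypothetical, and nothing in this thread says Firefox does this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `line` is a /proc/cpuinfo "Features" line that lists
 * "neon" as a whole word, 0 otherwise. */
int features_line_has_neon(const char *line)
{
    assert(line != NULL);
    if (strncmp(line, "Features", 8) != 0)
        return 0;
    const char *p = strchr(line, ':');
    if (p == NULL)
        return 0;
    p++;
    while (*p != '\0') {
        p += strspn(p, " \t\n");              /* skip separators */
        size_t len = strcspn(p, " \t\n");     /* length of next flag */
        if (len == 4 && strncmp(p, "neon", 4) == 0)
            return 1;
        p += len;
    }
    return 0;
}

/* Scans /proc/cpuinfo; reports 0 if the file cannot be read. */
int cpu_has_neon(void)
{
    char line[1024];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = features_line_has_neon(line);
    fclose(f);
    return found;
}
```

A caller would check cpu_has_neon() once at startup and select a NEON or a generic implementation through a function pointer. On Android, the NDK's cpufeatures library offers a ready-made query for the same information.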
. Would you be interested in adding a Firefox-based benchmark? As a large application it is a good testbed for LTO, FDO and other aggressive optimizations.
Sorry about the delayed response. I did notice your mail last week but I was busy with our conference and then the first couple of days this week have just disappeared with some internal training.
I would be interested in hearing how you get on with LTO and FDO on ARM. Listening to Honza talking at the GCC unconference in London about the memory usage for full LTO with trunk I did wonder what would happen if we tried it on the ARM target to see what we got, but I never managed to get around to trying anything there :) . We did look at getting FDO working with Linaro GCC last cycle but there are still a couple of issues with PGO in Linaro GCC 4.5.
With respect to LTO, the one problem we have currently is that the NEON intrinsics aren't streamed out and streamed back in. So you might have a few issues if your code uses arm_neon.h. https://bugs.launchpad.net/gcc-linaro/+bug/823548 is an example of this problem. This was fixed upstream and we probably just need to backport that into our 4.6 tree. I've tried a backport this morning and I think I have this right finally.
If you could do a build and a firefox benchmark run in about 30-60 minutes by all means please do let us know how you get on and what you find. We've been steadily trying to improve the performance of the ARM toolchain and the biggest improvements you'll notice will be with the vectorizer but there will be other small improvements that you'll notice in other general areas of code generation. We would be interested in feedback about what can be done and to add to our queue of things to look at and improve for the ARM port of GCC.
With respect to the images, Kiko's probably answered that bit.
cheers Ramana
2011/8/10 Ramana Radhakrishnan ramana.radhakrishnan@linaro.org:
. Would you be interested in adding a Firefox-based benchmark? As a large application it is a good testbed for LTO, FDO and other aggressive optimizations.
I would be interested in hearing how you get on with LTO and FDO on ARM. Listening to Honza talking at the GCC unconference in London about the memory usage for full LTO with trunk I did wonder what would happen if we tried it on the ARM target to see what we got, but I never managed to get around to trying anything there :) . We did look at getting FDO working with Linaro GCC last cycle but there are still a couple of issues with PGO in Linaro GCC 4.5.
FYI. The toolchain benchmark suite derived from Google already includes the FDO mode, and I would suggest enabling it for comparisons.
The Android build system has had (incomplete) FDO integration since Android 2.2[*]. In my experience, it sometimes improves performance slightly in special cases.
Sincerely, -jserv
[*] The build system performs a "build-run-build" scheme with the help of ADB, which deploys the profiler on the target. Option: BUILD_FDO_INSTRUMENT
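As an illustration of what a "build-run-build" cycle looks like with plain GCC, here is a minimal sketch. The function and file names are hypothetical. How BUILD_FDO_INSTRUMENT maps onto compiler flags is not stated in this thread, so the commands in the comment show only the generic GCC workflow, not the Android build system's exact behaviour.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical hot function with a heavily biased branch, the kind of
 * code where profile feedback can inform block layout and inlining.
 *
 * Generic GCC build-run-build cycle (file names are assumptions):
 *   1. gcc -O2 -fprofile-generate -o app_instr app.c   (instrumented build)
 *   2. run app_instr on a representative workload; it writes *.gcda files
 *   3. gcc -O2 -fprofile-use -o app app.c              (optimized rebuild)
 * When the instrumented binary runs on a device, the *.gcda files have
 * to be copied back to the build host (for example with `adb pull`)
 * before step 3.
 */
size_t count_marker_bytes(const unsigned char *buf, size_t n)
{
    size_t hits = 0;
    assert(n == 0 || buf != NULL);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == 0xFF)   /* assumed to be rare in the training input */
            hits++;
    }
    return hits;
}
```

The quality of the result depends on how representative the training run in step 2 is, which matches the observation elsewhere in this thread that a more correct profile changed the outcome.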
On Wed, Aug 10, 2011 at 04:29:46PM +0100, Ramana Radhakrishnan wrote:
Sorry about the delayed response. I did notice your mail last week but I was busy with our conference and then the first couple of days this week have just disappeared with some internal training.
I would be interested in hearing how you get on with LTO and FDO on ARM. Listening to Honza talking at the GCC unconference in London about the memory usage for full LTO with trunk I did wonder what would happen if we tried it on the ARM target to see what we got, but I never managed to get around to trying anything there :) . We did look at getting FDO working with Linaro GCC last cycle but there are still a couple of issues with PGO in Linaro GCC 4.5.
I gave Linaro GCC 4.5 a try, and at first glance, it looks pretty much on par, performance-wise, with upstream GCC 4.6.1.
With respect to LTO, the one problem we have currently is that the NEON intrinsics aren't streamed out and streamed back in. So you might have a few issues if your code uses arm_neon.h. https://bugs.launchpad.net/gcc-linaro/+bug/823548 is an example of this problem. This was fixed upstream and we probably just need to backport that into our 4.6 tree. I've tried a backport this morning and I think I have this right finally.
I haven't tried LTO with Linaro GCC, but with upstream GCC 4.6.1, I hit something like the following: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41159
I'm currently running some more FDO tests, so I'll have more on that later. From my attempts so far with a possibly more correct profile than my very first attempts, it looks like it's not so bad (it used to regress in my first attempts), but it doesn't look better either. The resulting binary is however much bigger.
Cheers,
Mike