Hi Zach,
The BP (https://blueprints.launchpad.net/linaro-android/+spec/linaro-android-use-gcc...) and Bug #822113 (https://bugs.launchpad.net/linaro-android/+bug/822113) aim at improving Android performance. I think we also need to balance size against the performance improvement. I used the gcc benchmark tool to measure performance with different configurations: -O3 for ARM files only, -O3 for both ARM and Thumb files, and -fstrict-aliasing. The results can be found at https://wiki.linaro.org/ChaoYang/Sandbox/gccoptimization. Please note that the results are based on linaro_android_2.3.4 for panda and toolchain-4.6-1107. I will benchmark linaro_android_2.3.5 and toolchain-4.6-1108 if necessary once they are stable enough.
The image size increases significantly when -O3 is enabled for Thumb files, but performance does not appear to improve as much as expected. Could you please let me know whether you think it is worth building Thumb files with -O3 regardless of size? Thanks.
Regards
Hi Chao. I'm a bit confused by your numbers. There is no significant difference between the performance or size numbers across the different options you tested, except the Thumb results which grew unexpectedly.
My experience is that Thumb-2 is typically 75 % of the size of ARM and 95 % of the speed, and that -O3 is significantly faster than -O2. I just ran a popular deeply embedded benchmark and found:
* In Thumb-2 mode, -O3 is 4.3 % faster than -O2 and 122 % bigger (!)
* At -O3, ARM mode is 12.4 % faster than Thumb-2 and 12.2 % bigger
This benchmark is a bit small, which is why the code size blew out so much and the -O3 improvement is so small. I used the size of the .text section. bz2 compressing and taking the on-disk size, to more closely match your method, gives:
* -O3 is 86 % bigger than -O2
* ARM is 4.4 % bigger than Thumb-2
Is there something strange going on with your benchmarks or options?
-- Michael
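For anyone comparing their own numbers against the "N % bigger" figures in this message, here is a minimal sketch of the arithmetic, assuming sizes are measured in bytes of the .text section. The helper name percent_bigger and the sample sizes are hypothetical and are not measurements from this thread.

```python
def percent_bigger(new: float, base: float) -> float:
    """Return how much larger `new` is than `base`, in percent."""
    if base <= 0:
        raise ValueError("base must be positive")
    return (new - base) / base * 100.0


# "Thumb-2 is 75 % of the size of ARM" restated the other way round:
print(f"ARM is {percent_bigger(100, 75):.1f} % bigger than Thumb-2")

# Hypothetical .text sizes in bytes, for illustration only.
text_o2, text_o3 = 1000, 1500
print(f"-O3 is {percent_bigger(text_o3, text_o2):.1f} % bigger than -O2")
```

Note that "75 % of the size" and "33 % bigger" describe the same ratio from opposite directions, so it is worth stating which build is the baseline when quoting results.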
On Fri, Aug 19, 2011 at 03:40:26AM +0100, Chao Yang wrote:
The image size increases significantly when -O3 is enabled for thumb files,
Size goes /up/ when enabling thumb? That's definitely unexpected.
On 19 August 2011 13:40, Christian Robottom Reis kiko@linaro.org wrote:
Size goes /up/ when enabling thumb? That's definitely unexpected.
The size increase is probably driven more by -O3 than by Thumb, I suspect. If there are cases where ARM state is smaller than Thumb state we should look at those, but then you have to compare -O3 with -marm against -O3 with -mthumb to be comparing apples with apples.
cheers Ramana
Hi,
On 18 August 2011 19:40, Chao Yang chao.yang@linaro.org wrote:
The results can be found at https://wiki.linaro.org/ChaoYang/Sandbox/gccoptimization.
Interesting... I'd have expected getting rid of -fno-strict-aliasing to give the biggest performance boost, but I'd also have expected -O3 to be a bit more effective than it is... Are you sure the relevant parts actually use -O3? Keep in mind that e.g. -O3 + -Os = -Os: whatever is specified later (LOCAL_CFLAGS, ...) is effective.
Could you add another combination to the benchmark? I'm curious about -O3 -fno-inline-functions (function inlining is always a bit of a 2-edged sword because of the code size increases...)
ttyl bero
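The "last one wins" rule for -O options mentioned above can be sketched as a small check. This is only an illustration: the function name effective_opt_level and the sample command lines are hypothetical, and the pattern covers only the -O, -O0 to -O3, -Os and -Ofast spellings.

```python
import re
import shlex
from typing import Optional

# Matches -O, -O0..-O3, -Os and -Ofast; lowercase -o (output file) is not matched.
_OPT_FLAG = re.compile(r"^-O(?:[0-3]|s|fast)?$")


def effective_opt_level(command: str) -> Optional[str]:
    """Return the last -O option on a GCC command line, or None if absent."""
    level = None
    for token in shlex.split(command):
        if _OPT_FLAG.match(token):
            level = token  # a later -O option overrides an earlier one
    return level


# Hypothetical command lines, not taken from the build log.
print(effective_opt_level("arm-eabi-gcc -O3 -c foo.c -Os -o foo.o"))
print(effective_opt_level("arm-eabi-gcc -c foo.c -o foo.o"))
```

So a global -O3 followed by a module-level -Os in LOCAL_CFLAGS would still build that module at -Os.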
Hi Bero,
What I did was change both -Os and -O2 to -O3 in TARGET_linux-arm.mk. I did not change the -O2/-Os settings that individual modules specify internally, as there may be a reason for a module to choose its own optimisation level, and I think it is risky to change those. I don't think that should be a big problem, though.
Thanks and regards Chao
On Sat, Aug 20, 2011 at 2:50 AM, Chao Yang chao.yang@linaro.org wrote:
Hi Bero, What I did was change both -Os and -O2 to -O3 in TARGET_linux-arm.mk. I did not change the -O2/-Os settings that individual modules specify internally, as there may be a reason for a module to choose its own optimisation level, and I think it is risky to change those. I don't think that should be a big problem, though.
Do you have the build log? You could do a quick grep over it to see if the GCC command line is sane.
-- Michael
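As a rough alternative to grepping by hand, the following sketch tallies which -O option ends up effective on each compiler invocation in a build log. It assumes the log contains the full compiler command lines and that the compiler name contains "gcc"; the function name tally_opt_levels and the sample lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Whole-token match for -O, -O0..-O3, -Os and -Ofast.
OPT_FLAG = re.compile(r"(?<!\S)-O(?:[0-3]|s|fast)?(?!\S)")


def tally_opt_levels(log_lines: Iterable[str], compiler_hint: str = "gcc") -> Counter:
    """Count the effective (last) -O option per compiler line in a build log."""
    counts: Counter = Counter()
    for line in log_lines:
        if compiler_hint not in line:
            continue  # skip lines that are not compiler invocations
        flags = OPT_FLAG.findall(line)
        counts[flags[-1] if flags else "(none)"] += 1
    return counts


# Hypothetical log lines for illustration; a real run would iterate over the log file.
sample = [
    "arm-eabi-gcc -O2 -fno-strict-aliasing -c a.c -O3 -o a.o",
    "arm-eabi-gcc -O3 -c b.c -Os -o b.o",
    "echo Install: out/target/product/x",
]
print(tally_opt_levels(sample))
```

If most lines come out at -Os or -O2 after the change to TARGET_linux-arm.mk, module-level flags are probably overriding the global setting.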
Hi Michael,
The build log can be found at http://people.linaro.org/~chaoyang/shared_sources/build_2011-08-19_20-33.log
The benchmark wiki page https://wiki.linaro.org/ChaoYang/Sandbox/gccoptimization has been updated with:
1. A benchmark for the -fno-inline-functions option (the size is reduced a bit)
2. Results from replacing -O2 with -O3 in build/core/combo/select.mk (slightly better results)
Thanks and regards
On Mon, Aug 22, 2011 at 11:31 PM, Chao Yang chao.yang@linaro.org wrote:
Hi Michael, The build log can be found at http://people.linaro.org/~chaoyang/shared_sources/build_2011-08-19_20-33.log
The benchmark wiki page https://wiki.linaro.org/ChaoYang/Sandbox/gccoptimization has been updated with:
1. A benchmark for the -fno-inline-functions option (the size is reduced a bit)
2. Results from replacing -O2 with -O3 in build/core/combo/select.mk (slightly better results)
I picked a command line at random and had a poke through it. There are a few interesting things:
It includes -fgcse-after-reload and -finline-functions at all levels. These are normally in -O3 only, which may be why the -O2 results are so similar to -O3.
It includes both -msoft-float and -mfloat-abi=softfp. softfp occurs second but you might want to remove the potentially conflicting -msoft-float.
It uses -fno-strict-aliasing which reduces the number of optimisations that can be done especially at high optimisation levels.
It uses -Wstrict-aliasing=2, i.e. turned down from the default -Wstrict-aliasing. I suggest removing the =2.
It uses -fno-inline-functions-called-once which turns off a common optimisation.
It uses -frename-registers and -frerun-cse-after-loop, which are normally part of -funroll-loops.
I recommend pulling out most of the -f flags and re-running at -O2 and -O3.
-- Michael
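The observations above could be turned into a quick check to run against any command line from the log. This is a sketch only: the function name audit_flags and the sample command are hypothetical, and the notes simply restate the points made in this message rather than anything authoritative about GCC.

```python
import shlex
from typing import List

# Flags described above as normally enabled only at -O3.
O3_ONLY = {"-fgcse-after-reload", "-finline-functions"}
# Flags described above as normally part of -funroll-loops.
UNROLL_RELATED = {"-frename-registers", "-frerun-cse-after-loop"}


def audit_flags(command: str) -> List[str]:
    """Return notes about flags that may blur -O2 vs -O3 comparisons."""
    tokens = shlex.split(command)
    seen = set(tokens)
    notes = []
    for flag in sorted(O3_ONLY & seen):
        notes.append(f"{flag}: normally enabled only at -O3")
    if "-msoft-float" in seen and any(t.startswith("-mfloat-abi=") for t in tokens):
        notes.append("-msoft-float together with -mfloat-abi=...: potentially conflicting")
    if "-fno-strict-aliasing" in seen:
        notes.append("-fno-strict-aliasing: limits optimisations at high -O levels")
    if "-Wstrict-aliasing=2" in seen:
        notes.append("-Wstrict-aliasing=2: consider the default -Wstrict-aliasing")
    if "-fno-inline-functions-called-once" in seen:
        notes.append("-fno-inline-functions-called-once: disables a common optimisation")
    for flag in sorted(UNROLL_RELATED & seen):
        notes.append(f"{flag}: normally part of -funroll-loops")
    return notes


# Hypothetical command line, not copied from the build log.
cmd = ("arm-eabi-gcc -O2 -msoft-float -mfloat-abi=softfp "
       "-fno-strict-aliasing -finline-functions -c foo.c -o foo.o")
for note in audit_flags(cmd):
    print(note)
```

An empty result for a given command line would suggest that the -O2 and -O3 builds differ only in what the optimisation level itself enables.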