Android gcc 4.6 1107 optimization benchmark


Chao Yang

19 Aug 2011

2:40 a.m.

Hi Zach,

The BP (https://blueprints.launchpad.net/linaro-android/+spec/linaro-android-use-gcc...) and Bug #822113 (https://bugs.launchpad.net/linaro-android/+bug/822113) aim at improving Android performance. I think we also need to balance size against the performance improvement. I used the gcc benchmark tool to measure performance with different configurations: -O3 for arm files only, -O3 for both arm files and thumb files, and -fstrict-aliasing. The results can be found at https://wiki.linaro.org/ChaoYang/Sandbox/gccoptimization. Please note that the results are based on linaro_android_2.3.4 for panda and toolchain-4.6-1107. I will benchmark linaro_android_2.3.5 and toolchain-4.6-1108 if necessary once they are stable enough.

The image size increases significantly when -O3 is enabled for thumb files, but performance does not appear to have improved as much as expected. Could you please let me know whether you think it is worth building thumb files with -O3 regardless of size? Thanks.

Regards

-- Chao Yang Android Platform Team Linaro.org │ Open source software for ARM SoCs Follow Linaro: http://www.facebook.com/pages/Linaro/155974581091106 http://twitter.com/#%21/linaroorg http://www.linaro.org/linaro-blog/


Michael Hope

19 Aug

4:31 a.m.

On Fri, Aug 19, 2011 at 2:40 PM, Chao Yang chao.yang@linaro.org wrote:

> HI Zach,
>
> The BP (https://blueprints.launchpad.net/linaro-android/+spec/linaro-android-use-gcc...) and Bug #822113 aim at improving android performance. I think we also need to balance the size and the performance improvement. I used the gcc benchmark tool to benchmark the performance with different configuration: -O3 for arm files only, -O3 for both arm files and thumb files and -fstrict-aliasing. The results can be found at https://wiki.linaro.org/ChaoYang/Sandbox/gccoptimization. Please note, the results are based on linaro_android_2.3.4 for panda and toolchain-4.6-1107. I will benchmark linaro_android_2.3.5 and toolchain-4.6-1108 if necessary when they are stable enough.
>
> The image size increases significantly when -O3 is enabled for thumb files, however it does not look like performance has been improved as much as expected. Could you please let me know if you think it is worth building thumb files with -O3 regardless of size? Thanks.
>
> Regards

Hi Chao. I'm a bit confused by your numbers. There is no significant difference between the performance or size numbers across the different options you tested, except the Thumb results which grew unexpectedly.

My experience is that Thumb-2 is typically 75 % of the size of ARM and 95 % of the speed, and that -O3 is significantly faster than -O2. I just ran a popular deeply embedded benchmark and found:
* In Thumb-2 mode, -O3 is 4.3 % faster than -O2 and 122 % bigger (!)
* At -O3, ARM mode is 12.4 % faster than Thumb-2 and 12.2 % bigger

This benchmark is a bit small which is why the code size blew out so much and the -O3 improvement is so small. I used the size of the .text section. bz2 compressing and taking the on disk size to more closely match your method gives:
* -O3 is 86 % bigger than -O2
* ARM is 4.4 % bigger than Thumb-2
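For readers who want to reproduce this kind of "X % bigger" comparison from their own measurements, the arithmetic can be sketched as below. This is only an illustration: the function name `percent_change` and the byte counts are hypothetical and are not taken from the benchmark discussed in this thread.

```python
def percent_change(baseline: float, candidate: float) -> float:
    """Return how much bigger (positive) or smaller (negative) candidate is
    than baseline, expressed as a percentage of baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


# Hypothetical .text sizes in bytes, for illustration only.
text_size_o2 = 1000
text_size_o3 = 1500

growth = percent_change(text_size_o2, text_size_o3)
print(f"-O3 is {growth:.1f} % bigger than -O2")
```

The same helper works for run times, as long as it is clear which build is the baseline.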

Is there something strange going on with your benchmarks or options?

-- Michael

Christian Robottom Reis

12:40 p.m.

On Fri, Aug 19, 2011 at 03:40:26AM +0100, Chao Yang wrote:

> The image size increases significantly when -O3 is enabled for thumb files,

Size goes /up/ when enabling thumb? That's definitely unexpected.

-- Christian Robottom Reis, Engineering VP Brazil (GMT-3) | [+55] 16 9112 6430 | [+1] 612 216 4935 Linaro.org: Open Source Software for ARM SoCs

Ramana Radhakrishnan

1:37 p.m.

On 19 August 2011 13:40, Christian Robottom Reis kiko@linaro.org wrote:

> On Fri, Aug 19, 2011 at 03:40:26AM +0100, Chao Yang wrote:
>> The image size increases significantly when -O3 is enabled for thumb files,
>
> Size goes /up/ when enabling thumb? That's definitely unexpected.

I suspect the size increase is driven more by -O3 than by Thumb. If there are cases where ARM state is smaller than Thumb state, we should look at those, but you have to compare -O3 with -marm against -O3 with -mthumb to be comparing apples with apples.

cheers Ramana

Bernhard Rosenkranzer

2:38 p.m.

Hi,

On 18 August 2011 19:40, Chao Yang chao.yang@linaro.org wrote:

> The results can be found at https://wiki.linaro.org/ChaoYang/Sandbox/gccoptimization.

Interesting... I'd have expected getting rid of -fno-strict-aliasing to give the biggest performance boost, but I'd also have expected -O3 to be a bit more effective than it is... Are you sure the relevant parts actually use -O3? Keep in mind that e.g. -O3 + -Os = -Os: whatever is specified later (LOCAL_CFLAGS, ...) is effective.
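The "last one wins" point can be checked mechanically. GCC uses the last -O option on the command line when several are given, so a minimal sketch, assuming the arguments are already split into a list, only needs to remember the last -O option it sees. The helper name `effective_opt_level` and the sample argument list are hypothetical.

```python
import re

# Matches -O, -O0..-O3, -Os and -Ofast.
_OPT_FLAG = re.compile(r"^-O(\d+|s|fast)?$")


def effective_opt_level(args):
    """Return the last -O option in args, or None if there is none."""
    last = None
    for arg in args:
        if _OPT_FLAG.match(arg):
            last = arg  # a later -O option overrides any earlier one
    return last


# Hypothetical command line: a global -O3 followed by a per-module -Os.
sample = ["-O3", "-fno-strict-aliasing", "-Os", "-c", "foo.c"]
print(effective_opt_level(sample))
```

For the sample above the helper reports -Os, which is the situation described: a module-level flag silently overriding the global -O3.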

Could you add another combination to the benchmark? I'm curious about -O3 -fno-inline-functions (function inlining is always a bit of a 2-edged sword because of the code size increases...)

ttyl bero

Chao Yang

2:50 p.m.

Hi Bero,

What I did was change both -Os and -O2 to -O3 in TARGET_linux-arm.mk. I did not change the -O2/-Os flags that individual modules specify internally, as there may be a reason for a module to set its own optimisation level, and I think it is risky to change those. But I don't think it should be a big problem.

Thanks and regards Chao


Michael Hope

21 Aug

10:14 p.m.

On Sat, Aug 20, 2011 at 2:50 AM, Chao Yang chao.yang@linaro.org wrote:

> HI Bero, What I did was changing both Os and O2 to O3 in TARGET_linux-arm.mk. I did not change those O2/Os specified in each module internally. As there may be a reason for the module itself to specify the optimisation level. I think it is risky to change those. But I don't think it should be a big problem.

Do you have the build log? You could do a quick grep over it to see if the GCC command line is sane.
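As a rough sketch of such a check, the snippet below tallies the effective -O level of each compiler invocation in a log. It assumes, without having verified it against the actual log, that each compiler command sits on one line and that the compiler's name ends in "gcc" or "g++". The function name `opt_levels_in_log` and the sample lines are hypothetical.

```python
import shlex
from collections import Counter

_OPT_SUFFIXES = ("", "0", "1", "2", "3", "s", "fast")


def opt_levels_in_log(lines):
    """Count the effective -O level of each gcc/g++ command in lines."""
    counts = Counter()
    for line in lines:
        try:
            args = shlex.split(line)
        except ValueError:
            continue  # skip lines with unbalanced quotes
        if not args or not args[0].endswith(("gcc", "g++")):
            continue  # not a compiler invocation
        level = None
        for arg in args[1:]:
            if arg.startswith("-O") and arg[2:] in _OPT_SUFFIXES:
                level = arg  # the last -O option is the effective one
        counts[level or "(none)"] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample_log = [
    "arm-eabi-gcc -O3 -fno-strict-aliasing -c a.c -Os",
    "arm-eabi-g++ -O2 -c b.cpp",
    "echo Install: out/target/product/example",
]
print(opt_levels_in_log(sample_log))

# On a real log one might call:
#   with open("build.log") as f:   # hypothetical file name
#       print(opt_levels_in_log(f))
```

A tally dominated by -Os or -O2 would suggest that the global -O3 is being overridden further along the command line.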

-- Michael

Chao Yang

22 Aug

11:31 a.m.

Hi Michael,

The build log can be found at http://people.linaro.org/~chaoyang/shared_sources/build_2011-08-19_20-33.log

The benchmark wiki page https://wiki.linaro.org/ChaoYang/Sandbox/gccoptimization has been updated with:
1. A benchmark for the -fno-inline-functions option (the size is reduced a bit)
2. O2 replaced with O3 in build/core/combo/select.mk (slightly better results)

Thanks and regards


Michael Hope

23 Aug

4:01 a.m.

On Mon, Aug 22, 2011 at 11:31 PM, Chao Yang chao.yang@linaro.org wrote:

> HI Michael,
> The build log can be found at http://people.linaro.org/~chaoyang/shared_sources/build_2011-08-19_20-33.log
> The benchmark wiki page https://wiki.linaro.org/ChaoYang/Sandbox/gccoptimization is updated on
> 1. Adding benchmark for -fno-inline-function option (the size is reduced a bit)
> 2. Replacing O2 with O3 in build/core/combo/select.mk (a bit better results)

I picked a command line at random and had a poke through it. There are a few interesting things:

It includes -fgcse-after-reload and -finline-functions at all levels. These are normally in -O3 only, which may be why the -O2 results are so similar to -O3.

It includes both -msoft-float and -mfloat-abi=softfp. softfp occurs second but you might want to remove the potentially conflicting -msoft-float.

It uses -fno-strict-aliasing which reduces the number of optimisations that can be done especially at high optimisation levels.

It uses -Wstrict-aliasing=2, i.e. turned down from the default -Wstrict-aliasing. I suggest removing the =2.

It uses -fno-inline-functions-called-once which turns off a common optimisation.

It uses -frename-registers and -frerun-cse-after-loop, which are normally part of -funroll-loops.

I recommend pulling out most of the -f flags and re-running at -O2 and -O3.
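The observations above could be turned into a small checker before re-running the benchmarks. The following is a minimal sketch, assuming the command line is already split into arguments. The helper name `audit_flags` and the sample arguments are hypothetical, and the notes simply restate the points made in this message.

```python
# Flags called out in this message, with the reason each one is notable.
NOTES = {
    "-fgcse-after-reload": "normally enabled at -O3 only",
    "-finline-functions": "normally enabled at -O3 only",
    "-fno-strict-aliasing": "reduces optimisations, especially at high levels",
    "-Wstrict-aliasing=2": "turned down from the default -Wstrict-aliasing",
    "-fno-inline-functions-called-once": "turns off a common optimisation",
    "-frename-registers": "normally part of -funroll-loops",
    "-frerun-cse-after-loop": "normally part of -funroll-loops",
}


def audit_flags(args):
    """Return (flag, note) pairs for flags in args that are worth a second look."""
    seen = set(args)
    findings = [(flag, note) for flag, note in NOTES.items() if flag in seen]
    # -msoft-float is only a concern when combined with -mfloat-abi=softfp.
    if {"-msoft-float", "-mfloat-abi=softfp"} <= seen:
        findings.append(("-msoft-float",
                         "potentially conflicts with -mfloat-abi=softfp"))
    return findings


# Hypothetical command line, for illustration only.
sample = ["-O2", "-fno-strict-aliasing", "-msoft-float",
          "-mfloat-abi=softfp", "-c", "foo.c"]
for flag, note in audit_flags(sample):
    print(f"{flag}: {note}")
```

An empty result would mean none of the flags listed above are present, which is roughly the state to aim for before comparing -O2 and -O3.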

-- Michael


linaro-dev@lists.linaro.org


participants (5)

Bernhard Rosenkranzer
Chao Yang
Christian Robottom Reis
Michael Hope
Ramana Radhakrishnan