Input for an "optimized" slide
Michael Hope
michael.hope at linaro.org
Sun Aug 21 22:00:31 UTC 2011
On Sat, Aug 20, 2011 at 7:13 AM, Zach Pfeffer <zach.pfeffer at linaro.org> wrote:
> Thanks Bero. Sending this extremely useful information out to a wider audience.
>
> Alex,
>
> I think you'll probably be very interested in this for your Mozilla work.
>
>>> -O3
>>> * What it is, does, available on
>>
>> -O3 enables several additional compiler optimizations such as tree
>> vectorizing and loop unswitching, and optimizes for speed over code
>> size somewhat more aggressively than -O2, e.g. by inlining all calls
>> to small static functions.
>> It is available on any platform supported by gcc.
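As a rough illustration of the kind of loop the tree vectorizer mentioned above is aimed at, here is a minimal sketch. The function name add_scaled and the compile lines in the comment are made up for this example and are not taken from the Android build; whether gcc actually vectorizes the loop depends on the gcc version and the target flags (on ARM, NEON has to be enabled).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i] for n elements.
 * "restrict" promises that the two arrays do not overlap, which makes
 * it easier for the vectorizer to prove the iterations independent.
 *
 * Assumed compile lines for comparing the generated code:
 *   gcc -std=c99 -O2 -S add_scaled.c
 *   gcc -std=c99 -O3 -S add_scaled.c
 */
void add_scaled(int *restrict y, const int *restrict x, int a, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Comparing the assembly from the two compile lines is the simplest way to see whether -O3 made a difference for a given target.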
>>
>>> OpenMP
>>> * What it is, does, available on
>>
>> OpenMP is a simple API that makes it easier for a programmer to make
>> use of multi-core or multi-processor systems, e.g. by automatically
>> splitting marked loops into several threads.
>> Example:
>>
>> #pragma omp parallel for
>> for (int i = 0; i < 100; i++)
>>     do_something(i);
>>
>> would split the 100 iterations across several threads (by default
>> typically one per CPU core) to do its job.
>>
>>
>> It is available on platforms supported by gcc that can use libgomp,
>> gcc's OpenMP library. This includes most platforms that support POSIX
>> threads - but -- initially -- not Android.
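For anyone who wants to try this, a self-contained variant of the example might look like the sketch below. The function sum_of_squares is hypothetical and only stands in for real work. The number of threads is chosen by the OpenMP runtime and can be overridden with the OMP_NUM_THREADS environment variable.

```c
#include <assert.h>

/* Hypothetical helper: sums the squares of 0..n-1.
 *
 * Assumed build line: gcc -std=c99 -fopenmp sum.c
 * Without -fopenmp the pragma is ignored and the loop simply runs
 * serially, producing the same result.
 */
long sum_of_squares(int n)
{
    assert(n >= 0);
    long total = 0;

    /* reduction(+:total) gives every thread a private partial sum and
     * adds the partial sums together at the end, so there is no data
     * race on total. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++)
        total += (long)i * i;

    return total;
}
```

The reduction clause matters here: a plain "parallel for" around a loop that updates a shared variable would be a data race.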
>>
>>
>>> Loop parallelization
>>> * What it is, does, available on
>>
>> Loop parallelization takes OpenMP a step further by automatically
>> determining which loops are suitable for "#pragma omp parallel for"
>> and similar constructs. This allows code that was written without
>> multiprocessing in mind (such as most code written specifically for
>> ARM platforms - multicore/SMP ARM systems are quite new) to take
>> advantage of multicore/SMP systems (to some extent) without having to
>> modify the code.
>>
>> Compiler flag: -ftree-parallelize-loops=X (where X is the number of
>> threads to be optimized for - typically the number of CPU cores in the
>> target system)
>>
>> Available on anything supported by gcc that has both libgomp and
>> graphite (incl. CLooG, PPL or ISL) - the original Android toolchain
>> has neither of those.
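A sketch of the sort of loop this is meant for: no pragma in the source, every iteration touches only its own element, so the compiler can in principle split the iteration space itself. The function name brighten and the build line are assumptions for illustration; gcc only parallelizes a loop when it can prove independence and considers it profitable, so a short loop may well be left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: add delta to every pixel, saturating at 255.
 * There is no OpenMP pragma here on purpose.
 *
 * Assumed build line (4 = number of threads to optimize for):
 *   gcc -std=c99 -O2 -ftree-parallelize-loops=4 -c brighten.c
 */
void brighten(unsigned char *pixels, size_t n, unsigned char delta)
{
    assert(n == 0 || pixels != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Each iteration reads and writes only pixels[i]. */
        unsigned int v = (unsigned int)pixels[i] + delta;
        pixels[i] = (unsigned char)(v > 255 ? 255 : v);
    }
}
```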
>>
>>> ...and any other optimizations that you've done.
>>
>> None of the following is enabled yet (but the support in the toolchain
>> is there now), but I'm planning to enable them step by step once we
>> have systems built w/ the new toolchain that actually boot:
>>
>> binutils: --hash-style=gnu
>> By default, ld creates SysV style hash tables for function tables
>> in shared libraries. With --hash-style=gnu, we switch to GNU style
>> hashes, making symbol lookup a lot faster. (details:
>> http://sourceware.org/ml/binutils/2006-10/msg00377.html)
Sorry, silly question, but does Android use the glibc dynamic linker?
If not, does its linker support other hash styles?
>> binutils: -Bsymbolic-functions
>> Speed up the dynamic linker by binding references to global
>> functions in shared libraries where it is known that this doesn't
>> break things (it's safe for libraries that don't have any users trying
>> to override their symbols - it's probably safe to assume e.g. skia and
>> opengl could benefit).
>> (details: http://www.fkf.mpg.de/edv/docs/intel_composer/Documentation/en_US/compiler_f/main_for/copts/common_options/option_bsymbolic_functions.htm)
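A minimal sketch of what this affects, assuming a small shared library with two exported functions (the names clamp_u8 and scale_u8 and the build line are made up for illustration). By default the call from scale_u8 to clamp_u8 can be preempted by another definition of clamp_u8 at load time; with -Bsymbolic-functions the reference is bound to the library's own definition at link time.

```c
#include <assert.h>

/* Exported helper. By default the executable or another library could
 * provide its own clamp_u8 and override this one at load time. */
int clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Exported entry point that calls the helper.
 *
 * Assumed build line:
 *   gcc -shared -fPIC -Wl,-Bsymbolic-functions -o libdemo.so demo.c
 * With that flag the call below always reaches the clamp_u8 above,
 * which is only safe if nobody relies on overriding it. */
int scale_u8(int v, int percent)
{
    assert(percent >= 0);
    return clamp_u8(v * percent / 100);
}
```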
>>
>> binutils/gcc: -flto, -fwhole-program
>> Link-Time Optimization - causes code to be optimized again at link
>> time, when the compiler knows what functions are called from what
>> parts of the code, what functions are only called with constant
>> parameters, etc.
>>
>> gcc: -mtune=cortex-a9 (or whatever the actual target CPU is)
>> The Android build system uses -march=armv7-a, which is good -- but
>> it doesn't do any tuning for the specific CPU type (e.g. cortex-a8 vs.
>> cortex-a9).
Good. Using -march=armv7-a -mtune=cortex-a9 enables the Cortex-A8
fixups. Using -mcpu=cortex-a9 disables them, which means your build
may not run on an A8.
>> gcc: -fvisibility-inlines-hidden
>> Don't export C++ inline methods in shared libraries. Makes the
>> symbol table smaller, improving startup time and disk space efficiency.
>>
>> gcc: -fstrict-aliasing -Werror=strict-aliasing
>> Currently, Android uses -fno-strict-aliasing unconditionally for
>> thumb code, to work around some pieces of code that violate strict
>> aliasing rules. Using -Werror=strict-aliasing, we can determine what
>> pieces of code are affected, and fix them, or limit the use of
>> -fno-strict-aliasing to the specific files that need it - enabling the
>> rather useful strict-aliasing optimization for the rest of the build
>>
>> gcc: Investigate Graphite optimizations that aren't even enabled at -O3:
>> -fgraphite-identity -floop-block -floop-interchange
>> -floop-strip-mine -ftree-loop-distribution -ftree-loop-linear
Looks good. I'd add SMS to the list as well: first -fmodulo-sched,
then -fmodulo-sched -fmodulo-sched-allow-regmoves.
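To give an idea of what a flag such as -floop-interchange is after, here is a sketch of a loop nest written in the cache-unfriendly order. The names and the build line are assumptions for illustration, the flag needs a gcc built with Graphite support, and whether the transformation actually fires for this nest is something to check in the generated code rather than take on faith.

```c
#include <assert.h>
#include <stddef.h>

#define MAT_N 8   /* hypothetical matrix size, kept small for the example */

/* C arrays are row-major, so with the column index j in the outer loop
 * the inner loop walks memory with a stride of MAT_N ints. Swapping the
 * two loops gives unit-stride accesses without changing the result,
 * which is the rewrite loop interchange performs.
 *
 * Assumed build line:
 *   gcc -std=c99 -O2 -floop-interchange -c scale_matrix.c
 */
void scale_matrix(int dst[MAT_N][MAT_N], int src[MAT_N][MAT_N], int k)
{
    assert(dst != NULL && src != NULL);

    for (int j = 0; j < MAT_N; j++)        /* column index outermost */
        for (int i = 0; i < MAT_N; i++)    /* row index innermost */
            dst[i][j] = k * src[i][j];
}
```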