On Sat, Aug 20, 2011 at 7:13 AM, Zach Pfeffer zach.pfeffer@linaro.org wrote:
Thanks Bero. Sending this extremely useful information out to a wider audience.
Alex,
I think you'll probably be very interested in this for your Mozilla work.
-O3 * What it is, does, available on
-O3 enables several additional compiler optimizations such as tree vectorizing and loop unswitching, and optimizes for speed over code size somewhat more aggressively than -O2, e.g. by inlining all calls to small static functions. It is available on any platform supported by gcc.
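As a rough illustration of the kind of code those two optimizations target, here is a minimal sketch. The function names (clamp01, clamp_all) are made up for this example, and whether gcc actually inlines the helper or vectorizes the loop depends on the gcc version and the target, so treat it as a candidate rather than a guarantee. Newer gcc releases can report vectorizer decisions with -fopt-info-vec.

```c
#include <stddef.h>

/* Hypothetical small static helper: a typical candidate for inlining. */
static float clamp01(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Simple counted loop over contiguous, non-overlapping arrays: the shape of
 * loop the tree vectorizer looks for once the helper has been inlined. */
void clamp_all(float *restrict out, const float *restrict in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = clamp01(in[i]);
}
```

Comparing the assembly produced at -O2 and -O3 (gcc -S) is the simplest way to see what a given toolchain really does with it.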
OpenMP * What it is, does, available on
OpenMP is a simple API that makes it easier for a programmer to make use of multi-core or multi-processor systems, e.g. by automatically splitting marked loops into several threads. Example:
    #pragma omp parallel for
    for(int i=0; i<100; i++)
        do_something(i);
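This distributes the 100 iterations across a team of threads (by default typically one thread per CPU core), so at most 100 threads could do useful work.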
It is available on platforms supported by gcc that can use libgomp, gcc's OpenMP library. This includes most platforms that support POSIX threads, but initially not Android.
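A self-contained sketch of the pragma example above, assuming the file is built with gcc -fopenmp. The body of do_something() and the results array are invented for illustration. Without -fopenmp the pragma is ignored and the loop runs serially with the same result.

```c
#define N 100

static int results[N];

/* Hypothetical stand-in for do_something(): each iteration writes only its
 * own array slot, so the iterations are independent and need no locking. */
static void do_something(int i)
{
    results[i] = i * i;
}

/* With -fopenmp, the iterations of the marked loop are divided among the
 * threads of the team; the summation afterwards runs on a single thread. */
int run_loop(void)
{
    int sum = 0;

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        do_something(i);

    for (int i = 0; i < N; i++)
        sum += results[i];
    return sum;
}
```

Calling run_loop() from main() returns the sum of the squares of 0..99, i.e. 328350, regardless of how many threads were used.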
Loop parallelization * What it is, does, available on
Loop parallelization takes OpenMP a step further by automatically determining which loops are suitable for "#pragma omp parallel for" and similar constructs. This allows code that was written without multiprocessing in mind (such as most code written specifically for ARM platforms - multicore/SMP ARM systems are quite new) to take advantage of multicore/SMP systems (to some extent) without having to modify the code.
Compiler flag: -ftree-parallelize-loops=X (where X is the number of threads to be optimized for - typically the number of CPU cores in the target system)
Available on anything supported by gcc that has both libgomp and graphite (incl. CLooG, PPL or ISL) - the original Android toolchain has neither of those.
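To make the idea concrete, here is a sketch of one loop the auto-parallelizer could in principle split across threads and one it cannot. Both functions are hypothetical examples, not code from Android. Whether gcc really parallelizes the first loop depends on the gcc version, its cost model and the trip count it can see; a build line might look like gcc -O2 -ftree-parallelize-loops=4 file.c.

```c
#include <stddef.h>

/* Independent iterations: no iteration reads a value written by another
 * one, which is what the compiler must prove before splitting the loop. */
void scale_add(float *restrict dst, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] * k + b[i];
}

/* Counter-example: every iteration depends on the previous one, so the
 * loop cannot simply be cut into independent chunks. */
void prefix_sum(int *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```

Note that neither function contains a pragma; the source stays as it was written for a single core.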
...and any other optimizations that you've done.
None of the following is enabled yet (but the support in the toolchain is there now), but I'm planning to enable them step by step once we have systems built w/ the new toolchain that actually boot:
binutils: --hash-style=gnu By default, ld creates SysV style hash tables for function tables in shared libraries. With --hash-style=gnu, we switch to GNU style hashes, making symbol lookup a lot faster. (details: http://sourceware.org/ml/binutils/2006-10/msg00377.html)
Sorry, silly question, but does Android use the glibc dynamic linker? If not, does its linker support other hash styles?
binutils: -Bsymbolic-functions Speed up the dynamic linker by binding references to global functions in shared libraries where it is known that this doesn't break things (it's safe for libraries that don't have any users trying to override their symbols - it's probably safe to assume e.g. skia and opengl could benefit). (details: http://www.fkf.mpg.de/edv/docs/intel_composer/Documentation/en_US/compiler_f...)
binutils/gcc: -flto, -fwhole-program Link-Time Optimization - causes code to be optimized again at link time, when the compiler knows what functions are called from what parts of the code, what functions are only called with constant parameters, etc.
gcc: -mtune=cortex-a9 (or whatever the actual target CPU is) The Android build system uses -march=armv7-a, which is good -- but it doesn't do any tuning for the specific CPU type (e.g. cortex-a8 vs. cortex-a9).
Good. Using -march=armv7-a -mtune=cortex-a9 enables the Cortex-A8 fixups. Using -mcpu=cortex-a9 disables them, which means your build may not run on an A8.
gcc: -fvisibility-inlines-hidden Don't export C++ inline methods in shared libraries. Makes the symbol table smaller, improving startup time and disk space efficiency.
gcc: -fstrict-aliasing -Werror=strict-aliasing Currently, Android uses -fno-strict-aliasing unconditionally for thumb code, to work around some pieces of code that violate strict aliasing rules. Using -Werror=strict-aliasing, we can determine what pieces of code are affected, and fix them, or limit the use of -fno-strict-aliasing to the specific files that need it - enabling the rather useful strict-aliasing optimization for the rest of the build
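For readers unfamiliar with the rule, here is a sketch of a typical violation and one common fix. The function names are invented for illustration, and the example assumes float is a 32-bit IEEE 754 type. The violating version is kept behind a hypothetical SHOW_ALIASING_VIOLATION macro; this is the pattern gcc typically flags as "dereferencing type-punned pointer", which -Werror=strict-aliasing turns into a build failure.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "example assumes 32-bit float");

#ifdef SHOW_ALIASING_VIOLATION
/* Reads a float object through a uint32_t lvalue. Under -fstrict-aliasing
 * the compiler may assume the two types never alias, so this is undefined
 * behaviour and can be miscompiled. */
uint32_t float_bits_bad(float f)
{
    return *(uint32_t *)&f;
}
#endif

/* Fix: copy the bytes instead of reinterpreting the pointer. gcc usually
 * compiles the fixed-size memcpy down to a plain register move. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Files that can be fixed this way no longer need -fno-strict-aliasing; the rest can keep the flag on a per-file basis.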
gcc: Investigate Graphite optimizations that aren't even enabled at -O3: -fgraphite-identity -floop-block -floop-interchange -floop-strip-mine -ftree-loop-distribution -ftree-loop-linear
Looks good. I'd add SMS to the list as well: first -fmodulo-sched, then -fmodulo-sched -fmodulo-sched-allow-regmoves.