Re: Input for an "optimized" slide

21 Aug 2011


      On Sat, Aug 20, 2011 at 7:13 AM, Zach Pfeffer zach.pfeffer@linaro.org wrote:
...
Thanks Bero. Sending this extremely useful information out to a wider audience.
Alex,
I think you'll probably be very interested in this for your Mozilla work.
...
...
-O3
     * What it is, does, available on
-O3 enables several additional compiler optimizations such as tree
vectorizing and loop unswitching, and optimizes for speed over code
size somewhat more aggressively than -O2, e.g. by inlining all calls
to small static functions.
It is available on any platform supported by gcc.
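To make "loop unswitching" concrete, here is a minimal sketch of the transformation done by hand. The function names are made up for illustration, and what gcc actually emits at -O3 depends on the compiler version and target; this only shows the shape of the rewrite.

```c
#include <stddef.h>

/* Original form: the test on `scale` never changes inside the loop. */
void scale_or_copy(float *dst, const float *src, size_t n, int scale, float k)
{
    for (size_t i = 0; i < n; i++) {
        if (scale)                 /* loop-invariant condition */
            dst[i] = src[i] * k;
        else
            dst[i] = src[i];
    }
}

/* Hand-unswitched form: the test is hoisted out and each branch gets
 * its own branch-free loop, which is easier to vectorize. */
void scale_or_copy_unswitched(float *dst, const float *src, size_t n,
                              int scale, float k)
{
    if (scale) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }
}
```

Both functions compute the same result; the second trades code size for fewer branches in the hot loop, which is the speed-over-size trade-off described above.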
...
OpenMP
     * What it is, does, available on
OpenMP is a simple API that makes it easier for a programmer to make
use of multi-core or multi-processor systems, e.g. by automatically
splitting marked loops into several threads.
Example:
#pragma omp parallel for
for(int i=0; i<100; i++)
   do_something(i);
would split the 100 iterations across up to 100 threads; by default the
runtime typically starts one thread per available CPU core.
It is available on platforms supported by gcc that can use libgomp,
gcc's OpenMP library. This includes most platforms that support POSIX
threads - but -- initially -- not Android.
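A slightly fuller sketch of the same idea, for loops that accumulate a result. The function name sum_of_squares is invented for this example. The reduction clause gives each thread a private partial sum and combines them at the end; without it, the threads would race on `total`.

```c
#include <stddef.h>

/* Sum of squares of v[0..n-1]. Build with -fopenmp to run the loop in
 * parallel; without that flag the pragma is ignored and the loop runs
 * serially with the same result. */
double sum_of_squares(const double *v, size_t n)
{
    double total = 0.0;

    /* Each thread accumulates into its own copy of `total`;
     * OpenMP adds the copies together after the loop. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++)
        total += v[i] * v[i];

    return total;
}
```

Something like `gcc -O2 -fopenmp file.c` should build it on a toolchain that ships libgomp.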
...
Loop parallelization
     * What it is, does, available on
Loop parallelization takes OpenMP a step further by automatically
determining which loops are suitable for "#pragma omp parallel for"
and similar constructs. This allows code that was written without
multiprocessing in mind (such as most code written specifically for
ARM platforms - multicore/SMP ARM systems are quite new) to take
advantage of multicore/SMP systems (to some extent) without having to
modify the code.
Compiler flag: -ftree-parallelize-loops=X (where X is the number of
threads to be optimized for - typically the number of CPU cores in the
target system)
Available on anything supported by gcc that has both libgomp and
graphite (incl. CLooG, PPL or ISL) - the original Android toolchain
has neither of those.
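As an illustration, this is the kind of loop the flag is aimed at: no pragmas, independent iterations, and pointers marked restrict so the compiler can tell that they do not overlap. The function name saxpy is just a conventional label picked for this sketch. Whether gcc really parallelizes it depends on its dependence analysis and cost model.

```c
/* y[i] = a * x[i] + y[i] for i in [0, n).
 * Each iteration touches only its own element, so the iterations can
 * run in any order or on different threads. */
void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

A possible invocation for a four-core target would be `gcc -std=c99 -O2 -ftree-parallelize-loops=4 file.c`.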
...
...and any other optimizations that you've done.
None of the following is enabled yet (but the support in the toolchain
is there now), but I'm planning to enable them step by step once we
have systems built w/ the new toolchain that actually boot:
binutils: --hash-style=gnu
   By default, ld creates SysV style hash tables for function tables
in shared libraries. With --hash-style=gnu, we switch to GNU style
hashes, making symbol lookup a lot faster. (details:
http://sourceware.org/ml/binutils/2006-10/msg00377.html)
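For reference, a sketch of the two symbol-name hash functions involved (the function names here are mine). The hash function is only part of the picture: the GNU format also adds a Bloom filter so that most failed lookups are rejected without touching the hash chains. That part is not shown.

```c
#include <stdint.h>

/* Classic SysV ELF hash, used by the default .hash section. */
uint32_t sysv_hash(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 0, g;

    while (*p) {
        h = (h << 4) + *p++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;               /* keep the result within 28 bits */
    }
    return h;
}

/* GNU hash (h = h * 33 + c, starting from 5381), used by .gnu.hash. */
uint32_t gnu_hash(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 5381;

    while (*p)
        h = h * 33 + *p++;
    return h;
}
```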
Sorry, silly question, but does Android use the glibc dynamic linker?
If not, does its linker support other hash styles?
...
...
binutils: -Bsymbolic-functions
   Speed up the dynamic linker by binding references to global
functions in shared libraries where it is known that this doesn't
break things (it's safe for libraries that don't have any users trying
to override their symbols - it's probably safe to assume e.g. skia and
opengl could benefit).
(details: http://www.fkf.mpg.de/edv/docs/intel_composer/Documentation/en_US/compiler_f...)
binutils/gcc: -flto, -fwhole-program
   Link-Time Optimization - causes code to be optimized again at link
time, when the compiler knows what functions are called from what
parts of the code, what functions are only called with constant
parameters, etc.
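A tiny sketch of the "only called with constant parameters" case. Both functions are shown in one file for brevity; imagine them in two separate source files (util.c and app.c are hypothetical names). Without LTO, the compiler building app.c cannot see the body of scale().

```c
/* Imagine this lives in util.c. */
int scale(int x, int factor)
{
    return x * factor;
}

/* Imagine this lives in app.c. With LTO the optimizer can see that
 * scale() is only ever called with factor == 2, so it is free to
 * inline the call and fold the multiplication. */
int doubled(int x)
{
    return scale(x, 2);
}
```

With separate files, the build would look something like `gcc -O2 -flto util.c app.c`.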
gcc: -mtune=cortex-a9 (or whatever the actual target CPU is)
   The Android build system uses -march=armv7-a, which is good -- but
it doesn't do any tuning for the specific CPU type (e.g. cortex-a8 vs.
cortex-a9).
Good.  Using -march=armv7-a -mtune=cortex-a9 enables the Cortex-A8
fixups.  Using -mcpu=cortex-a9 disables them, which means your build
may not run on an A8.
...
...
gcc: -fvisibility-inlines-hidden
   Don't export C++ inline methods in shared libraries. Makes the
symbol table smaller, improving startup time and disk space efficiency.
gcc: -fstrict-aliasing -Werror=strict-aliasing
   Currently, Android uses -fno-strict-aliasing unconditionally for
thumb code, to work around some pieces of code that violate strict
aliasing rules. Using -Werror=strict-aliasing, we can determine what
pieces of code are affected, and fix them, or limit the use of
-fno-strict-aliasing to the specific files that need it - enabling the
rather useful strict-aliasing optimization for the rest of the build.
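For anyone unfamiliar with what such a violation looks like, here is a minimal sketch. The function name float_bits is invented, and the example assumes float is a 32-bit IEEE 754 type, which holds on ARM but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* The typical offending pattern reads a float through an unrelated
 * pointer type:
 *
 *     uint32_t bits = *(uint32_t *)&f;
 *
 * With -fstrict-aliasing the compiler may assume a uint32_t access
 * never touches a float object, so this can be miscompiled.
 *
 * The well-defined fix is to copy the bytes. Assumes
 * sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* byte-wise copy, no aliasing violation */
    return u;
}
```

Compilers generally turn the small fixed-size memcpy into a plain register move, so the fix usually costs nothing at runtime.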
gcc: Investigate Graphite optimizations that aren't even enabled at -O3:
  -fgraphite-identity -floop-block -floop-interchange
  -floop-strip-mine -ftree-loop-distribution -ftree-loop-linear
Looks good.  I'd add SMS to the list as well:  first -fmodulo-sched,
then -fmodulo-sched -fmodulo-sched-allow-regmoves.
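To give an idea of what one of the Graphite flags listed above is after, here is loop interchange done by hand. The names and the array size N are made up for the sketch, and whether -floop-interchange performs this rewrite on a given loop nest depends on the compiler.

```c
#define N 64

/* Inner loop walks down a column: consecutive accesses are N ints
 * apart in memory, which is unfriendly to the cache. */
long sum_cols_first(int a[N][N])
{
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

/* Interchanged: inner loop walks along a row, so consecutive accesses
 * are adjacent in memory. Same result, better locality. */
long sum_rows_first(int a[N][N])
{
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}
```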
