Hi everyone,
When building a kernel I have two seemingly simple questions, but I
have been getting very different advice depending on who I talk to and
what I read online, in git repositories, etc. Hardware-specific
optimizations are confusing and hard to test in a kernel, since it is
such a multi-purpose conglomeration of code. I just want to make sure I
am using the correct general approach before moving forward with trying
things and testing. Our project is all about testing and researching
ways to increase kernel/Android performance, so please don't reply with
"just use an -O2 compilation and forget about it" unless you have data
showing that this gives better performance than adding hardware-specific
compilation flags.
Hopefully this is the right crowd to ask; I wasn't sure whether I should
try the kernel or Android lists instead. We'll see how it goes here
first, since it seems applicable.
Background:
* 3.4.x Android kernel
* Qualcomm APQ8064 quad-core CPU (Cortex-A15-like SoC with per-core
NEON/VFPv4 support).
* We are using the Linaro ARM toolchain 4.7.3 release 2012.11 on Linux
(arm-linux-gnueabihf).
Part 1) Which hardware and floating-point compiler flags are
recommended/applicable for the above-mentioned SoC when building the
kernel itself?
-mtune=cortex-a15 (is this really doing anything for us in the
tool-chain's current state?)
Which -mfpu flag and other associated flags should we use in the Linaro
12.11 toolchain?
-mfpu=neon-vfpv4
-mfpu=vfpv4
-mfpu=neon
-mvectorize-with-neon-quad
-funsafe-math-optimizations (is this required for neon-vfpv4 and
vfpv4 like we would use it for plain old neon?)
Part 2) Next, which kernel Makefiles should be optimized using the
hardware-specific flags from Part 1? From my research thus far, this is
our current setup, and we are currently doing an -O2 build.
/Makefile:
KBUILD_CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
-fno-strict-aliasing -fno-common \
-Werror-implicit-function-declaration \
-Wno-format-security \
-fno-delete-null-pointer-checks -mno-unaligned-access \
-march=armv7-a -mtune=cortex-a15 \
-fpredictive-commoning -fgcse-after-reload -ftree-vectorize \
-fipa-cp-clone -fsingle-precision-constant -pipe \
-funswitch-loops -floop-interchange \
-floop-strip-mine -floop-block
CFLAGS_MODULE = (BLANK, but some say we should have flags here)
AFLAGS_MODULE = (BLANK, but some say we should have flags here)
LDFLAGS_MODULE =
CFLAGS_KERNEL = (BLANK, but some say we should have flags here)
AFLAGS_KERNEL = (BLANK, but some say we should have flags here)
/arch/arm/Makefile:
arch-$(CONFIG_CPU_32v7) :=-D__LINUX_ARM_ARCH__=7 \
$(call cc-option,-mtune=cortex-a15 -march=armv7-a -mfpu=neon-vfpv4 \
-ftree-vectorize -funsafe-math-optimizations,-march=armv7-a \
-Wa$(comma)-march=armv7-a)
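To make the question about this line more concrete: as far as we
understand it, cc-option tries its whole first argument in a single test
compile and, if the compiler rejects any one flag, falls back to the
second argument, which would silently drop every optimization flag in
the group. A minimal sketch of probing each flag separately is below. It
assumes the stock cc-option helper from scripts/Kbuild.include and reuses
only the flags already listed in our line; whether each flag belongs
here at all is exactly what we are asking.

```make
# Sketch only, not a tested configuration.
# Each optional flag gets its own cc-option probe, so an unsupported flag
# expands to nothing instead of discarding the rest of the group.
# The -march fallback is kept exactly as in our current line.
arch-$(CONFIG_CPU_32v7) := -D__LINUX_ARM_ARCH__=7 \
        $(call cc-option,-march=armv7-a,-march=armv7-a -Wa$(comma)-march=armv7-a) \
        $(call cc-option,-mtune=cortex-a15) \
        $(call cc-option,-mfpu=neon-vfpv4) \
        $(call cc-option,-ftree-vectorize) \
        $(call cc-option,-funsafe-math-optimizations)
```

With this layout a verbose build (make V=1) should show which of the
flags actually reach the compiler command line.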
/arch/arm/vfp/Makefile:
KBUILD_AFLAGS :=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=neon-vfpv4 \
-ftree-vectorize -funsafe-math-optimizations)
If you can give any advice, it would be greatly appreciated.
Thanks and have a Happy New Year!