Hi everyone,
I have two seemingly simple questions about building a kernel.
However, I have been getting very different advice depending on
who I talk to and what I read online, see in git repositories, etc.
Hardware-specific optimizations are confusing and hard to test in a
kernel, since it is such a multi-purpose conglomeration of code. I
just want
to make sure I am using the correct general approach before moving
forward with trying things and testing. Our project is all about
testing and researching ways to increase kernel/Android performance,
so please don't reply with "just use an -O2 compilation and forget
about it" unless you can provide data suggesting that this gives
better performance than adding hardware-specific compilation
flags.
Hopefully this is the right crowd to ask; I wasn't sure whether I
should try the kernel or Android lists instead. We'll see how it
goes here first, since it seems applicable.
Background:
- 3.4.x Android kernel
- Qualcomm APQ8064 quad-core CPU (Cortex-A15-like SoC with
per-core NEON/VFPv4 support).
- We are using the Linaro ARM toolchain 4.7.3 release 2012.11
on Linux (arm-linux-gnueabihf).
Part 1) Which hardware and floating-point compiler flags are
recommended/applicable for the above-mentioned SoC when building
the kernel itself?
-mtune=cortex-a15 (is this really doing anything for us in the
tool-chain's current state?)
Which -mfpu flag and other associated flags should we use in the
Linaro 12.11 toolchain?
-mfpu=neon-vfpv4
-mfpu=vfpv4
-mfpu=neon
-mvectorize-with-neon-quad
-funsafe-math-optimizations (is this required for -mfpu=neon-vfpv4
and -mfpu=vfpv4, as we would use it for plain old -mfpu=neon?)
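For the flag list above, one way to at least confirm which options
the toolchain accepts is to compile an empty translation unit with
each one, which is roughly what the kernel's cc-option macro does.
A minimal sketch, assuming a POSIX shell and the
arm-linux-gnueabihf- prefix; probe_flag is a hypothetical helper
name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper (not part of kbuild): report whether a compiler
# accepts a set of flags by compiling an empty C translation unit.
# $1 = compiler command, remaining arguments = flags tested together.
probe_flag() {
    probe_cc=$1
    shift
    probe_tmp=$(mktemp) || return 2
    if "$probe_cc" "$@" -c -x c /dev/null -o "$probe_tmp" 2>/dev/null; then
        echo "accepted: $*"
    else
        echo "rejected: $*"
    fi
    rm -f "$probe_tmp"
}

# Assumed toolchain prefix; override CROSS_COMPILE to match your setup.
# If the compiler is not on PATH, every flag is reported as rejected.
cross_cc="${CROSS_COMPILE:-arm-linux-gnueabihf-}gcc"

for flag in -mtune=cortex-a15 -mfpu=neon-vfpv4 -mfpu=vfpv4 -mfpu=neon \
            -mvectorize-with-neon-quad -funsafe-math-optimizations; do
    probe_flag "$cross_cc" -march=armv7-a "$flag"
done
```

Acceptance only shows that the compiler recognizes a flag, not that
it changes the generated code; comparing -S output with and without
the flag would be a separate check.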
Part 2) Next, which kernel Makefiles should be optimized using the
hardware-specific flags from Part 1? Based on my research thus far,
this is our current setup, and we are currently doing an -O2 build.
/Makefile:
KBUILD_CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
-fno-strict-aliasing -fno-common \
-Werror-implicit-function-declaration \
-Wno-format-security \
-fno-delete-null-pointer-checks -mno-unaligned-access \
-march=armv7-a -mtune=cortex-a15 \
-fpredictive-commoning -fgcse-after-reload -ftree-vectorize \
-fipa-cp-clone -fsingle-precision-constant -pipe \
-funswitch-loops -floop-interchange \
-floop-strip-mine -floop-block
CFLAGS_MODULE = (BLANK, but some say we should have flags here)
AFLAGS_MODULE = (BLANK, but some say we should have flags here)
LDFLAGS_MODULE =
CFLAGS_KERNEL = (BLANK, but some say we should have flags here)
AFLAGS_KERNEL = (BLANK, but some say we should have flags here)
/arch/arm/Makefile:
arch-$(CONFIG_CPU_32v7) :=-D__LINUX_ARM_ARCH__=7 \
$(call cc-option,-mtune=cortex-a15 -march=armv7-a -mfpu=neon-vfpv4 \
-ftree-vectorize -funsafe-math-optimizations,-march=armv7-a \
-Wa$(comma)-march=armv7-a)
/arch/arm/vfp/Makefile:
KBUILD_AFLAGS :=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=neon-vfpv4 \
-ftree-vectorize -funsafe-math-optimizations)
If you can give any advice, it would be greatly appreciated.
Thanks and have a Happy New Year!