Hi everyone,
I have two seemingly simple questions about building a kernel with
the Linaro ARM toolchain, but I have been getting very different
advice depending on who I talk to and what I read online or find in
various git trees. Hardware-specific optimizations are confusing and
hard to test in a kernel, since it is such a multi-purpose
conglomeration of code. I just want to make sure I am using the
correct general approach before moving forward with trying things
and testing. Our project is all about testing and researching ways
to increase kernel/Android performance, so please don't reply with
"just use an -O2 compilation and forget about it" unless you have
data you can provide that suggests this will give better
performance than adding specific hardware compilation flags.
Hopefully this is the right crowd to ask; I wasn't sure whether I
should try the kernel or Android list. I posted on the NEON list, but my
message is the only one from December! That leaves me with little
hope there, so we'll see how it goes here. If anyone can help me
with part 1 or part 2, I would be delighted!
Background:
- 3.4.x Android kernel
- Qualcomm APQ8064 quad-core CPU (Cortex-A15-like SoC with
NEON/VFPv4 support on each core).
- We are using the Linaro ARM toolchain 4.7.3 release 2012.11
on Linux (arm-linux-gnueabihf).
Part 1) Which hardware and floating point compiler flags are
recommended/applicable for the above mentioned SoC when building
the kernel itself?
-mtune=cortex-a15 (is this really doing anything for us in the
tool-chain's current state?)
Which -mfpu flag and other associated flags should we use in the
Linaro 12.11 toolchain?
-mfpu=neon-vfpv4
-mfpu=vfpv4
-mfpu=neon
-mvectorize-with-neon-quad
-funsafe-math-optimizations (is this required for -mfpu=neon-vfpv4
and -mfpu=vfpv4, as it is for plain old -mfpu=neon?)
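To narrow down Part 1 on my side, here is a minimal sketch of how I would check which of these flags the toolchain accepts at all. It only tests acceptance, not whether a flag helps performance. The function names (cc_accepts, probe_flags) are my own, and I am assuming the compiler is on PATH as arm-linux-gnueabihf-gcc.

```shell
#!/bin/sh
# Assumption: the cross compiler is on PATH under this name; override CC if not.
CC="${CC:-arm-linux-gnueabihf-gcc}"

# cc_accepts FLAG...: succeed if $CC compiles an empty C file with the given flags.
cc_accepts() {
    tmp=$(mktemp) || return 1
    "$CC" "$@" -c -x c /dev/null -o "$tmp" >/dev/null 2>&1
    status=$?
    rm -f "$tmp"
    return $status
}

# probe_flags FLAG...: test each flag on its own (together with -march=armv7-a)
# and print "ok FLAG" or "rejected FLAG", one line per flag.
probe_flags() {
    for flag in "$@"; do
        if cc_accepts -march=armv7-a "$flag"; then
            printf 'ok %s\n' "$flag"
        else
            printf 'rejected %s\n' "$flag"
        fi
    done
}
```

For example: probe_flags -mfpu=neon-vfpv4 -mfpu=vfpv4 -mfpu=neon -mvectorize-with-neon-quad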
Part 2) Next, which kernel Makefiles should be optimized using the
hardware-specific flags from Part 1? From my research thus far, this is
our current setup, and we are currently doing an -O2 build.
/Makefile:
KBUILD_CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
-fno-strict-aliasing -fno-common \
-Werror-implicit-function-declaration \
-Wno-format-security \
-fno-delete-null-pointer-checks -mno-unaligned-access \
-march=armv7-a -mtune=cortex-a15 \
-fpredictive-commoning -fgcse-after-reload -ftree-vectorize \
-fipa-cp-clone -fsingle-precision-constant -pipe \
-funswitch-loops -floop-interchange \
-floop-strip-mine -floop-block
CFLAGS_MODULE = (BLANK, but some say we should have flags here)
AFLAGS_MODULE = (BLANK, but some say we should have flags here)
LDFLAGS_MODULE =
CFLAGS_KERNEL = (BLANK, but some say we should have flags here)
AFLAGS_KERNEL = (BLANK, but some say we should have flags here)
/arch/arm/Makefile:
arch-$(CONFIG_CPU_32v7) :=-D__LINUX_ARM_ARCH__=7 \
$(call cc-option,-mtune=cortex-a15 -march=armv7-a -mfpu=neon-vfpv4 \
-ftree-vectorize -funsafe-math-optimizations,-march=armv7-a \
-Wa$(comma)-march=armv7-a)
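As far as I understand it, cc-option test-compiles with the whole first argument at once and falls back to the second argument if that compile fails, so one rejected flag drops the entire group. Here is a rough shell sketch of that behavior. The cc_option function is my own stand-in, not the real Kbuild macro, which also passes KBUILD_CPPFLAGS and KBUILD_CFLAGS to the test compile.

```shell
#!/bin/sh
# Assumption: the cross compiler is on PATH under this name; override CC if not.
CC="${CC:-arm-linux-gnueabihf-gcc}"

# cc_option "FLAGS" "FALLBACK": print FLAGS if $CC accepts all of them together,
# otherwise print FALLBACK. This mirrors the all-or-nothing behavior only.
cc_option() {
    tmp=$(mktemp) || return 1
    # $1 is deliberately unquoted so that it splits into separate flags.
    if "$CC" $1 -c -x c /dev/null -o "$tmp" >/dev/null 2>&1; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
    rm -f "$tmp"
}
```

If that reading is right, a single flag the compiler does not know would silently leave us with only the fallback flags, which is one reason I want to confirm the exact set before testing further.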
/arch/arm/vfp/Makefile:
KBUILD_AFLAGS :=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=neon-vfpv4 \
-ftree-vectorize -funsafe-math-optimizations)
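To see which of these settings actually reach each compile, the build can be run with V=1 and the log filtered. This is a small sketch of the filter. list_arch_flags is a hypothetical helper name, and build.log in the usage line is just a placeholder file.

```shell
#!/bin/sh
# list_arch_flags: read a verbose build log on stdin and print every distinct
# -march=, -mtune= and -mfpu= setting that appears in it, one per line.
# Values are cut at the next space or comma, so "-Wa,-mfpu=x" yields "-mfpu=x".
list_arch_flags() {
    grep -E -o -e '-m(fpu|tune|arch)=[^ ,]+' | LC_ALL=C sort -u
}
```

Usage would be something like:
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- V=1 zImage > build.log 2>&1
list_arch_flags < build.log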
If you can give any advice, it would be greatly appreciated.
Thanks and have a Happy New Year!