Hi everyone,

I have two seemingly simple questions about building a kernel with the Linaro ARM toolchain, but I have been getting very different advice depending on who I talk to and what I read online or find in various git repositories. Hardware-specific optimizations are confusing and hard to test in a kernel, since it is such a multi-purpose conglomeration of code. I just want to make sure I am using the correct general approach before I start trying things and testing. Our project is all about testing and researching ways to increase kernel/Android performance, so please don't reply with "just use an -O2 compilation and forget about it" unless you can provide data suggesting that this gives better performance than adding hardware-specific compilation flags.

Hopefully this is the right crowd to ask; I wasn't sure whether I should try the kernel or Android list instead. I posted on the NEON list, but my message is the only one from December! That leaves me with little hope there, so we'll see how it goes here. If anyone can help me with Part 1 or Part 2, I would be delighted!

Background:
Part 1) Which hardware and floating-point compiler flags are recommended/applicable for the above-mentioned SoC when building the kernel itself?

-mtune=cortex-a15 (is this really doing anything for us in the toolchain's current state?)

Which -mfpu flag and other associated flags should we use in the Linaro 12.11 toolchain?
-mfpu=neon-vfpv4
-mfpu=vfpv4
-mfpu=neon
-mvectorize-with-neon-quad
-funsafe-math-optimizations (is this required for neon-vfpv4 and vfpv4 like we would use it for plain old neon?)
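
To make the question concrete, here is a minimal sketch of how one could check which of the -mfpu candidates a given toolchain accepts at all, by compiling an empty C file with each one. The compiler name arm-linux-gnueabihf-gcc and the helper name probe_flag are assumptions for illustration; substitute whatever your Linaro install provides. It only shows whether a flag is accepted, not whether it helps performance.

```shell
#!/bin/sh
# probe_flag CC FLAG...: report whether CC accepts the given flags when
# compiling an empty C file (hypothetical helper, not part of kbuild).
probe_flag() {
    cc=$1
    shift
    tmp=$(mktemp) || return 1
    if "$cc" "$@" -c -x c /dev/null -o "$tmp" >/dev/null 2>&1; then
        echo accepted
    else
        echo rejected
    fi
    rm -f "$tmp"
}

# Assumed cross-compiler name; override by setting CC in the environment.
CC=${CC:-arm-linux-gnueabihf-gcc}

if command -v "$CC" >/dev/null 2>&1; then
    for fpu in neon-vfpv4 vfpv4 neon; do
        printf '%s: %s\n' "-mfpu=$fpu" \
            "$(probe_flag "$CC" -march=armv7-a "-mfpu=$fpu")"
    done
else
    echo "compiler $CC not found; set CC to your cross compiler" >&2
fi
```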

Part 2) Next, which kernel Makefiles should be optimized using the hardware-specific flags from Part 1? Based on my research so far, this is our current setup, and we are currently doing an -O2 build.

/Makefile:
    KBUILD_CFLAGS   := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
           -fno-strict-aliasing -fno-common \
           -Werror-implicit-function-declaration \
           -Wno-format-security \
           -fno-delete-null-pointer-checks -mno-unaligned-access \
           -march=armv7-a -mtune=cortex-a15 \
           -fpredictive-commoning -fgcse-after-reload -ftree-vectorize \
           -fipa-cp-clone -fsingle-precision-constant -pipe \
           -funswitch-loops -floop-interchange \
           -floop-strip-mine -floop-block
    CFLAGS_MODULE   = (BLANK, but some say we should have flags here)
    AFLAGS_MODULE   = (BLANK, but some say we should have flags here)
    LDFLAGS_MODULE  =
    CFLAGS_KERNEL   = (BLANK, but some say we should have flags here)
    AFLAGS_KERNEL   = (BLANK, but some say we should have flags here)
 
/arch/arm/Makefile:
    arch-$(CONFIG_CPU_32v7)         :=-D__LINUX_ARM_ARCH__=7 $(call cc-option,-mtune=cortex-a15 -march=armv7-a -mfpu=neon-vfpv4 -ftree-vectorize -funsafe-math-optimizations,-march=armv7-a -Wa$(comma)-march=armv7-a)

/arch/arm/vfp/Makefile:
KBUILD_AFLAGS    :=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=neon-vfpv4 -ftree-vectorize -funsafe-math-optimizations)
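
One thing I am unsure about in the /arch/arm/Makefile line is the cc-option call. As I understand it, cc-option test-compiles its whole first argument in one go and falls back to the second argument if that compile fails, so a single rejected flag would silently drop the entire set. The sketch below mimics that behaviour in plain shell. The function name cc_option and the compiler name are assumptions for illustration, and the real kbuild macro also passes the existing KBUILD_CPPFLAGS and KBUILD_CFLAGS, which this sketch leaves out.

```shell
#!/bin/sh
# cc_option CC "FLAGS" "FALLBACK": print FLAGS if CC accepts all of them
# together in a single compile, otherwise print FALLBACK.
# (Hypothetical shell stand-in for the all-or-nothing test in kbuild.)
cc_option() {
    cc=$1
    flags=$2
    fallback=$3
    tmp=$(mktemp) || return 1
    # $flags is intentionally unquoted so it splits into separate arguments.
    if "$cc" $flags -c -x c /dev/null -o "$tmp" >/dev/null 2>&1; then
        echo "$flags"
    else
        echo "$fallback"
    fi
    rm -f "$tmp"
}

# Assumed cross-compiler name; override by setting CC in the environment.
CC=${CC:-arm-linux-gnueabihf-gcc}

# Same flag set and fallback as in the arch/arm/Makefile line above.
WANT="-mtune=cortex-a15 -march=armv7-a -mfpu=neon-vfpv4 -ftree-vectorize -funsafe-math-optimizations"
FALLBACK="-march=armv7-a -Wa,-march=armv7-a"

if command -v "$CC" >/dev/null 2>&1; then
    echo "flags in effect: $(cc_option "$CC" "$WANT" "$FALLBACK")"
else
    echo "compiler $CC not found; set CC to your cross compiler" >&2
fi
```

If this prints the fallback, none of the tuning flags in that line are reaching the build, which would be worth knowing before benchmarking anything.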


If you can give any advice, it would be greatly appreciated.

Thanks and have a Happy New Year!