Hi everyone,
    
    I have two seemingly simple questions about building a kernel with
    the Linaro ARM toolchain, but I have been getting very different
    advice depending on who I talk to and what I read online or find
    in various git repositories. Hardware-specific optimizations are
    confusing and hard to test in a kernel, since it is such a
    multi-purpose conglomeration of code. I just want to make sure I
    am using the correct general approach before I move on to trying
    things and testing. Our project is all about testing and
    researching ways to increase kernel/Android performance, so please
    don't reply with "just use an -O2 compilation and forget about it"
    unless you can provide data suggesting that this gives better
    performance than adding hardware-specific compilation flags.
    
    Hopefully this is the right crowd to ask; I wasn't sure whether I
    should try the kernel or Android list instead. I posted to the NEON
    list, but my message is the only one from December! That leaves me
    with little hope there, so we'll see how it goes here. If anyone
    can help me with Part 1 or Part 2, I would be delighted!
    
    Background:
    
      - 3.4.x Android kernel
 
      - Qualcomm APQ8064 quad-core CPU (Cortex-A15-like SoC with
        NEON/VFPv4 support per core)
 
      - Linaro ARM toolchain 4.7.3, release 2012.11, on Linux
        (arm-linux-gnueabihf)
 
    
    Part 1) Which hardware and floating-point compiler flags are
    recommended/applicable for the above-mentioned SoC when building
    the kernel itself?
    
    -mtune=cortex-a15 (is this really doing anything for us in the
    toolchain's current state?)
    
    Which -mfpu flag and other associated flags should we use in the
    Linaro 2012.11 toolchain?
    -mfpu=neon-vfpv4
    -mfpu=vfpv4
    -mfpu=neon
    -mvectorize-with-neon-quad
    -funsafe-math-optimizations (is this required for neon-vfpv4 and
    vfpv4, as it would be for plain old neon?)
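
    For reference, here is a minimal sketch of how one could check
    whether the compiler accepts each candidate -mfpu value at all. The
    helper name probe_flags and the CROSS_CC variable are made up for
    illustration, and the compiler name is an assumption based on the
    toolchain triplet above. Acceptance only means the flag parses; it
    says nothing about whether it helps performance.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if compiler "$1" accepts the remaining
# arguments when compiling a trivial C file read from stdin.
probe_flags() {
    cc=$1; shift
    echo 'int main(void) { return 0; }' |
        "$cc" "$@" -x c -c -o /dev/null - >/dev/null 2>&1
}

# Assumed compiler name; override with CROSS_CC=... if yours differs.
CROSS_CC=${CROSS_CC:-arm-linux-gnueabihf-gcc}

for fpu in neon-vfpv4 vfpv4 neon; do
    if probe_flags "$CROSS_CC" -march=armv7-a "-mfpu=$fpu"; then
        echo "-mfpu=$fpu: accepted"
    else
        echo "-mfpu=$fpu: rejected (or compiler not found)"
    fi
done
```

    This is roughly the same kind of check that the cc-option macro in
    the Makefiles below performs before choosing between its two
    arguments.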
    
    Part 2) Next, which kernel Makefiles should be optimized using the
    hardware-specific flags from Part 1? Based on my research so far,
    this is our current setup, and we are currently doing an -O2 build.
    
    /Makefile:
        KBUILD_CFLAGS   := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
               -fno-strict-aliasing -fno-common \
               -Werror-implicit-function-declaration \
               -Wno-format-security \
               -fno-delete-null-pointer-checks -mno-unaligned-access \
               -march=armv7-a -mtune=cortex-a15 \
               -fpredictive-commoning -fgcse-after-reload -ftree-vectorize \
               -fipa-cp-clone -fsingle-precision-constant -pipe \
               -funswitch-loops -floop-interchange \
               -floop-strip-mine -floop-block
        CFLAGS_MODULE   = (BLANK, but some say we should have flags here)
        AFLAGS_MODULE   = (BLANK, but some say we should have flags here)
        LDFLAGS_MODULE  =
        CFLAGS_KERNEL   = (BLANK, but some say we should have flags here)
        AFLAGS_KERNEL   = (BLANK, but some say we should have flags here)
     
    /arch/arm/Makefile 
        arch-$(CONFIG_CPU_32v7)         :=-D__LINUX_ARM_ARCH__=7 \
            $(call cc-option,-mtune=cortex-a15 -march=armv7-a \
            -mfpu=neon-vfpv4 -ftree-vectorize \
            -funsafe-math-optimizations,-march=armv7-a \
            -Wa$(comma)-march=armv7-a)
    
    /arch/arm/vfp/Makefile
        KBUILD_AFLAGS    :=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=neon-vfpv4 \
            -ftree-vectorize -funsafe-math-optimizations)
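
    One way to see which of these flags actually reach the compiler
    would be to filter kbuild's verbose (V=1) output. This is only a
    sketch: hw_flags is a hypothetical helper name, and the object file
    in the usage comment is just an example target.

```shell
#!/bin/sh
# Hypothetical helper: read compiler command lines on stdin and print
# the distinct architecture/FPU-related flags they contain, including
# ones passed through to the assembler with -Wa,.
hw_flags() {
    tr -s ' ' '\n' |
        grep -E '^(-Wa,)?-m(arch|tune|cpu|fpu|float-abi)=|^-msoft-float$' |
        LC_ALL=C sort -u
}

# Usage (run from the top of the kernel tree; not executed here):
#   make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- V=1 \
#       arch/arm/vfp/vfpmodule.o 2>&1 | hw_flags
```

    Comparing the output for a few different objects should show
    whether a flag set in one Makefile is being overridden or dropped
    elsewhere.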
    
    
    If you can give any advice, it would be greatly appreciated.
    
    Thanks and have a Happy New Year!