Hi Tom,

On 17 April 2013 21:13, Tom Gall <tom.gall@linaro.org> wrote:
When the clang .o is linked to the gcc/gcc+, I'm getting
/home/tgall/opencl/SNU/tmp2/cl_temp_1.tkl uses VFP register arguments,
/home/tgall/opencl/SNU/tmp2/cl_temp_1.o does not

This is pretty common. Clang assumes ARMv4 unless you're pretty specific about your core.


clang -mfloat-abi=hard -mfpu=neon -S -emit-llvm -x cl
-I/home/tgall/opencl/SNU/src/compiler/tools/clang/lib/Headers
-I/home/tgall/opencl/SNU/inc -include
/home/tgall/opencl/SNU/inc/comp/cl_kernel.h
/home/tgall/opencl/SNU/tmp2/cl_temp_1.cl -o
/home/tgall/opencl/SNU/tmp2/cl_temp_1.ll

What target triple do you see when you run:

$ head /home/tgall/opencl/SNU/tmp2/cl_temp_1.ll

If it's "arm-blah", then it'll default to ARMv4. It has to be "armv7*" to default to Cortex-A8, but it would be good to specify the CPU as well. Clang doesn't yet detect the CPU from the hardware you're running on.
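
If it helps, here is a minimal shell sketch for pulling the triple out of the IR and seeing which default it would fall into. The helper names triple_of and classify_triple are made up for illustration, and it assumes the IR carries the usual 'target triple = "..."' line:

```shell
# Print the triple recorded in an LLVM IR (.ll) file.
# Assumes a line of the form: target triple = "armv7-unknown-linux-gnueabihf"
triple_of() {
    sed -n 's/^target triple = "\(.*\)"$/\1/p' "$1" | head -n 1
}

# Hypothetical helper: report the default described in this message
# for a given triple string.
classify_triple() {
    case "$1" in
        armv7*) echo "armv7: defaults to Cortex-A8" ;;
        arm-*)  echo "generic arm: defaults to ARMv4" ;;
        arm*)   echo "other arm variant: check with clang -v" ;;
        *)      echo "not an ARM triple" ;;
    esac
}
```

You would call it as: classify_triple "$(triple_of /home/tgall/opencl/SNU/tmp2/cl_temp_1.ll)"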

 
so first obvious question is -mfloat-abi=hard -mfpu=neon correct for clang?

Neither required, nor sufficient. ;)

When you choose the triple "armv7l-*", it'll default to A8, Neon, hard-float. If you only specify hard-float and Neon, it won't default to A8, and those parameters will be ignored later in the pipeline. It doesn't make sense, I agree, and it's a problem not just for cross-compilation, but for native builds too.

The best bet is to specify the triple AND the CPU, so that you're sure you're getting what you want:

$ clang -target arm-linux-gnueabihf -mcpu=cortex-a9 -mfpu=neon -mthumb

As you noticed, Thumb2 is not the default for Cortex-A*, but hard-float is. You can always see what hidden options you got by adding -v to the command line. Also, the triple here is "arm-*", but Clang will notice the A9 option, change the IR accordingly and pass the correct options to the assembler. If you do it in two steps, you still have to pass the CPU option yourself in the second step, because "armv7-*" in the IR will turn out as Cortex-A8 by default.
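
For the two-step case, a rough sketch of what I mean is below. It assumes you use clang as the driver for both steps (your second step may well be a different tool), and two_step_cmds and kernel.cl are hypothetical names. The helper only prints the commands, so you can inspect them before running anything:

```shell
# Flags shared by both steps, taken from the command suggested above.
ARM_FLAGS="-target arm-linux-gnueabihf -mcpu=cortex-a9 -mfpu=neon -mthumb"

# Hypothetical helper: print the two commands for a given OpenCL source file.
# Pipe the output to sh to actually run them.
two_step_cmds() {
    src=$1
    base=${src%.cl}
    # Step 1: OpenCL C -> LLVM IR
    echo "clang $ARM_FLAGS -S -emit-llvm -x cl $src -o $base.ll"
    # Step 2: LLVM IR -> object file, repeating the CPU flags so the
    # backend does not fall back to the default for the triple in the IR.
    echo "clang $ARM_FLAGS -c $base.ll -o $base.o"
}
```

Running two_step_cmds kernel.cl prints both command lines. Add -v to either of them to check which options actually reach the backend.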

Two other options that I encourage you to try:

-integrated-as : the experimental (on ARM) integrated assembler. You won't be using GAS, so if your code depends on GAS' idiosyncrasies, don't use this option.

-O3 : Apart from the usual, this will turn on auto-vectorization (like GCC), which is also kind of experimental. Just be aware of that.

Hope that helps,
--renato

PS: If you're cross compiling, you'll have to manually specify the include paths.