[RFC] ARM: add support for kernel-mode NEON intrinsics

List overview All Threads
Download

newer

older

gator profiling

Patch merge request for STE...

Ard Biesheuvel

17 May 2013 17 May '13

9:17 a.m.

The GCC support header for NEON intrinsics <arm_neon.h> cannot be included directly due to its dependency on <stdint.h>.

Add a header <asm/neon.h> that checks/tweaks the environment so <arm_neon.h> can be included without problems.

Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org --- arch/arm/include/asm/neon.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) create mode 100644 arch/arm/include/asm/neon.h

diff --git a/arch/arm/include/asm/neon.h b/arch/arm/include/asm/neon.h new file mode 100644 index 0000000..0f76dc3 --- /dev/null +++ b/arch/arm/include/asm/neon.h @@ -0,0 +1,44 @@ +/* + * linux/arch/arm/include/asm/neon.h + * + * Copyright (C) 2013 Linaro Ltd ard.biesheuvel@linaro.org + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#ifndef _ASM_NEON_H +#define _ASM_NEON_H + +/* + * The GCC support header file for NEON intrinsics, <arm_neon.h>, does an + * unconditional #include of <stdint.h>, assuming it will never be used outside + * a C99 conformant environment. Sadly, this is not the case for the kernel. + * The only dependencies <arm_neon.h> has on <stdint.h> are the + * uint[8|16|32|64]_t types, which the kernel defines in <linux/types.h>. + */ +#include <linux/types.h> + +/* + * The GCC option -ffreestanding prevents GCC's internal <stdint.h> from + * including the <stdint.h> system header, it will #include "stdint-gcc.h" + * instead. + */ +#if __STDC_HOSTED__ != 0 +#error You must compile with -ffreestanding to use NEON intrinsics +#endif + +/* + * The type uintptr_t is typedef'ed to __UINTPTR_TYPE__ by "stdint-gcc.h". + * However, the bare metal and GLIBC versions of GCC don't agree on the + * definition of __UINTPTR_TYPE__. Bare metal agrees with the kernel + * (unsigned long), but GCC for GLIBC uses 'unsigned int' instead. + */ +#ifdef __linux__ +#undef __UINTPTR_TYPE__ +#endif + +#include <arm_neon.h> + +#endif

-- 1.8.1.2

Show replies by date

Arnd Bergmann

17 May 17 May

12:18 p.m.

On Friday 17 May 2013, Ard Biesheuvel wrote:

...

The GCC support header for NEON intrinsics <arm_neon.h> cannot be included directly due to its dependency on <stdint.h>.

Add a header <asm/neon.h> that checks/tweaks the environment so <arm_neon.h> can be included without problems.

Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org

I wonder if it would be easier to just use the gcc built-ins or inline assembly rather than the intrinsics, if including the header is such a pain.

Arnd

Ard Biesheuvel

21 May 21 May

6:45 a.m.

On 17 May 2013 14:18, Arnd Bergmann arnd@arndb.de wrote:

...

I wonder if it would be easier to just use the gcc built-ins or inline assembly rather than the intrinsics, if including the header is such a pain.

Using the __builtins directly is not the way to go imo. The various intrinsic types also go by completely different __builtin names and the mapping of functions and parameters is not 1 to 1, so it is very cumbersome to use in that way. Also, we will be coding against the implementation and not against the published interface.

Some alternatives: - #define the double #include guard used by GCC's internal stdint.h (the one that does #include_next on the actual stdint.h) before including arm_neon.h. This is more of a hack but less of a kludge. - add an empty stdint.h - convince the GCC guys to supply some way of including the arm_neon.h header without doing any other includes (isn't it bad form anyway to include standard headers automatically if the user hasn't asked for it?)

In the mean time, inline asm is feasible for the particular implementation I am looking at.

Regards, Ard.

Nicolas Pitre

7:20 p.m.

On Tue, 21 May 2013, Ard Biesheuvel wrote:

...

On 17 May 2013 14:18, Arnd Bergmann arnd@arndb.de wrote:

...
I wonder if it would be easier to just use the gcc built-ins or inline assembly rather than the intrinsics, if including the header is such a pain.

Using the __builtins directly is not the way to go imo. The various intrinsic types also go by completely different __builtin names and the mapping of functions and parameters is not 1 to 1, so it is very cumbersome to use in that way. Also, we will be coding against the implementation and not against the published interface.

Some alternatives:

#define the double #include guard used by GCC's internal stdint.h

(the one that does #include_next on the actual stdint.h) before including arm_neon.h. This is more of a hack but less of a kludge.

add an empty stdint.h

convince the GCC guys to supply some way of including the arm_neon.h

header without doing any other includes (isn't it bad form anyway to include standard headers automatically if the user hasn't asked for it?)

In the mean time, inline asm is feasible for the particular implementation I am looking at.

The kernel already tries hard to be self contained and not depend on any external include files or libraries, may them be "standard" or not.

So if inline asm is easier and always right then I'm all for it.

Nicolas

Vladimir Murzin

20 May 20 May

4:19 p.m.

On Fri, May 17, 2013 at 11:17:07AM +0200, Ard Biesheuvel wrote:

...

The GCC support header for NEON intrinsics <arm_neon.h> cannot be included directly due to its dependency on <stdint.h>.

Add a header <asm/neon.h> that checks/tweaks the environment so <arm_neon.h> can be included without problems.

Hi Ard

AFAIK NEON intrinsics requires -mfpu=neon GCC option. I wonder if you have any approach to prevent code outside kernel_fpu_begin/kernel_fpu_end guards being optimized for VFP/NEON?

Personally I end up with two options while doing NEON optimized syndrome generation for RAID: * consolidate all NEON specific code into separate compilation unit, i.e. S-file * inline bare opcodes into C source

Vladimir Murzin

...

Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org

arch/arm/include/asm/neon.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) create mode 100644 arch/arm/include/asm/neon.h

diff --git a/arch/arm/include/asm/neon.h b/arch/arm/include/asm/neon.h new file mode 100644 index 0000000..0f76dc3 --- /dev/null +++ b/arch/arm/include/asm/neon.h @@ -0,0 +1,44 @@ +/*

linux/arch/arm/include/asm/neon.h

Copyright (C) 2013 Linaro Ltd ard.biesheuvel@linaro.org

This program is free software; you can redistribute it and/or modify

it under the terms of the GNU General Public License version 2 as

published by the Free Software Foundation.

*/

+#ifndef _ASM_NEON_H +#define _ASM_NEON_H

+/*

The GCC support header file for NEON intrinsics, <arm_neon.h>, does an

unconditional #include of <stdint.h>, assuming it will never be used outside

a C99 conformant environment. Sadly, this is not the case for the kernel.

The only dependencies <arm_neon.h> has on <stdint.h> are the

uint[8|16|32|64]_t types, which the kernel defines in <linux/types.h>.

*/

+#include <linux/types.h>

+/*

The GCC option -ffreestanding prevents GCC's internal <stdint.h> from

including the <stdint.h> system header, it will #include "stdint-gcc.h"

instead.

*/

+#if __STDC_HOSTED__ != 0 +#error You must compile with -ffreestanding to use NEON intrinsics +#endif

+/*

The type uintptr_t is typedef'ed to __UINTPTR_TYPE__ by "stdint-gcc.h".

However, the bare metal and GLIBC versions of GCC don't agree on the

definition of __UINTPTR_TYPE__. Bare metal agrees with the kernel

(unsigned long), but GCC for GLIBC uses 'unsigned int' instead.

*/

+#ifdef __linux__ +#undef __UINTPTR_TYPE__ +#endif

+#include <arm_neon.h>

+#endif

1.8.1.2

linaro-kernel mailing list linaro-kernel@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-kernel

Ard Biesheuvel

6:34 p.m.

On 20 May 2013 18:19, Vladimir Murzin murzin.v@gmail.com wrote:

...

AFAIK NEON intrinsics requires -mfpu=neon GCC option. I wonder if you have any approach to prevent code outside kernel_fpu_begin/kernel_fpu_end guards being optimized for VFP/NEON?

That is a good point. First of all, I don't think there is a point to compiling the whole kernel with -mfpu=neon, so setting the flag per compilation unit is probably a better approach (if you use -mfloat-abi=softfp, the resulting code should link fine against the other soft-float objects) Inside a compilation unit, it is a bit more tricky, as you correctly pointed out.

I am preparing a couple of patches that use kernel-mode NEON in various ways: - RAID 6 syndrome calculations using NEON intrinsics -> same as the Intel code, i.e., use a noinline function that uses the intrinsics and call it from another function that does the kernel_vfp_begin/end pair; as long as you don't perform any floating point arithmetic in the same compilation unit, the compiler shouldn't emit any other NEON/VFP code; - XOR_BLOCKS NEON implementation using -ftree-vectorize -> a bit trickier, but I think with a bit of care, the above approach should work as well; if not, having a separate compilation unit containing the vectorized functions and calling those from another unit that does the kernel_vfp_begin/end pair should do the trick; - bit sliced AES using NEON assembler -> no need so set the -mpfu=neon flag, you can just set '.fpu neon' in the asm code, and the assembler emits the code exactly as you typed it.

Regards, Ard.

...

Personally I end up with two options while doing NEON optimized syndrome generation for RAID:

consolidate all NEON specific code into separate compilation unit, i.e. S-file

inline bare opcodes into C source

Vladimir Murzin

...
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org

arch/arm/include/asm/neon.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) create mode 100644 arch/arm/include/asm/neon.h

diff --git a/arch/arm/include/asm/neon.h b/arch/arm/include/asm/neon.h new file mode 100644 index 0000000..0f76dc3 --- /dev/null +++ b/arch/arm/include/asm/neon.h @@ -0,0 +1,44 @@ +/*

linux/arch/arm/include/asm/neon.h

Copyright (C) 2013 Linaro Ltd ard.biesheuvel@linaro.org

This program is free software; you can redistribute it and/or modify

it under the terms of the GNU General Public License version 2 as

published by the Free Software Foundation.

*/

+#ifndef _ASM_NEON_H +#define _ASM_NEON_H

+/*

The GCC support header file for NEON intrinsics, <arm_neon.h>, does an

unconditional #include of <stdint.h>, assuming it will never be used outside

a C99 conformant environment. Sadly, this is not the case for the kernel.

The only dependencies <arm_neon.h> has on <stdint.h> are the

uint[8|16|32|64]_t types, which the kernel defines in <linux/types.h>.

*/

+#include <linux/types.h>

+/*

The GCC option -ffreestanding prevents GCC's internal <stdint.h> from

including the <stdint.h> system header, it will #include "stdint-gcc.h"

instead.

*/

+#if __STDC_HOSTED__ != 0 +#error You must compile with -ffreestanding to use NEON intrinsics +#endif

+/*

The type uintptr_t is typedef'ed to __UINTPTR_TYPE__ by "stdint-gcc.h".

However, the bare metal and GLIBC versions of GCC don't agree on the

definition of __UINTPTR_TYPE__. Bare metal agrees with the kernel

(unsigned long), but GCC for GLIBC uses 'unsigned int' instead.

*/

+#ifdef __linux__ +#undef __UINTPTR_TYPE__ +#endif

+#include <arm_neon.h>

+#endif

1.8.1.2

linaro-kernel mailing list linaro-kernel@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-kernel

Vladimir Murzin

21 May 21 May

3:04 a.m.

On Mon, May 20, 2013 at 08:34:11PM +0200, Ard Biesheuvel wrote:

...

On 20 May 2013 18:19, Vladimir Murzin murzin.v@gmail.com wrote:

...
AFAIK NEON intrinsics requires -mfpu=neon GCC option. I wonder if you have any approach to prevent code outside kernel_fpu_begin/kernel_fpu_end guards being optimized for VFP/NEON?

That is a good point. First of all, I don't think there is a point to compiling the whole kernel with -mfpu=neon, so setting the flag per compilation unit is probably a better approach (if you use -mfloat-abi=softfp, the resulting code should link fine against the other soft-float objects) Inside a compilation unit, it is a bit more tricky, as you correctly pointed out.

I am preparing a couple of patches that use kernel-mode NEON in various ways:

RAID 6 syndrome calculations using NEON intrinsics -> same as the

Intel code, i.e., use a noinline function that uses the intrinsics and call it from another function that does the kernel_vfp_begin/end pair;

Hmmm.. Intel code uses inline asm for all MXX, SSE, SSE2 and AVX optimizations. PPC uses intrinsics for its ALTIVEC optimizations, may be you mean this?

...

as long as you don't perform any floating point arithmetic in the same compilation unit, the compiler shouldn't emit any other NEON/VFP code;

NEON optimization might be done on integer operations by GCC easily. It isn't a good practice to make assumptions, especially speaking about compiler - one day it might hurt you.

...

XOR_BLOCKS NEON implementation using -ftree-vectorize -> a bit

trickier, but I think with a bit of care, the above approach should work as well; if not, having a separate compilation unit containing the vectorized functions and calling those from another unit that does the kernel_vfp_begin/end pair should do the trick;

bit sliced AES using NEON assembler -> no need so set the -mpfu=neon

flag, you can just set '.fpu neon' in the asm code, and the assembler emits the code exactly as you typed it.

Intrinsics is a good technology which supposed to make life easier. From the kernel side perspective, I see we have to worry a lot while using them. I'll wait for your rest code.

Vladimir Murzin

...

Regards, Ard.

...
Personally I end up with two options while doing NEON optimized syndrome generation for RAID:

consolidate all NEON specific code into separate compilation unit, i.e. S-file

inline bare opcodes into C source

Vladimir Murzin

...
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org

arch/arm/include/asm/neon.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) create mode 100644 arch/arm/include/asm/neon.h

diff --git a/arch/arm/include/asm/neon.h b/arch/arm/include/asm/neon.h new file mode 100644 index 0000000..0f76dc3 --- /dev/null +++ b/arch/arm/include/asm/neon.h @@ -0,0 +1,44 @@ +/*

linux/arch/arm/include/asm/neon.h

Copyright (C) 2013 Linaro Ltd ard.biesheuvel@linaro.org

This program is free software; you can redistribute it and/or modify

it under the terms of the GNU General Public License version 2 as

published by the Free Software Foundation.

*/

+#ifndef _ASM_NEON_H +#define _ASM_NEON_H

+/*

The GCC support header file for NEON intrinsics, <arm_neon.h>, does an

unconditional #include of <stdint.h>, assuming it will never be used outside

a C99 conformant environment. Sadly, this is not the case for the kernel.

The only dependencies <arm_neon.h> has on <stdint.h> are the

uint[8|16|32|64]_t types, which the kernel defines in <linux/types.h>.

*/

+#include <linux/types.h>

+/*

The GCC option -ffreestanding prevents GCC's internal <stdint.h> from

including the <stdint.h> system header, it will #include "stdint-gcc.h"

instead.

*/

+#if __STDC_HOSTED__ != 0 +#error You must compile with -ffreestanding to use NEON intrinsics +#endif

+/*

The type uintptr_t is typedef'ed to __UINTPTR_TYPE__ by "stdint-gcc.h".

However, the bare metal and GLIBC versions of GCC don't agree on the

definition of __UINTPTR_TYPE__. Bare metal agrees with the kernel

(unsigned long), but GCC for GLIBC uses 'unsigned int' instead.

*/

+#ifdef __linux__ +#undef __UINTPTR_TYPE__ +#endif

+#include <arm_neon.h>

+#endif

1.8.1.2

linaro-kernel mailing list linaro-kernel@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-kernel

4676

days inactive

4680

days old

linaro-kernel@lists.linaro.org

6 comments

participants

tags (0)

participants (4)

Ard Biesheuvel
Arnd Bergmann
Nicolas Pitre
Vladimir Murzin