Auto-detection of vector size for NEON
julian at codesourcery.com
Wed Nov 3 09:55:59 UTC 2010
On Mon, 1 Nov 2010 15:57:11 +0200
Ira Rosen <IRAR at il.ibm.com> wrote:
> It looks like it's enough to implement targetm.vectorize.
> autovectorize_vector_sizes for NEON in order to enable initial
> auto-detection of vector size. With the attached patch and
> -mvectorize-with-neon-quad flag, the vectorizer first tries to
> vectorize for 128 bit, and if this fails, it tries to vectorize for
> 64 bit. For example, in the attached testcase number of iterations is
> too small for 128 bit (first 2 iterations have to be peeled in order
> to align the array accesses), but is sufficient for 64 bit (the
> accesses are aligned here).
Cool, I hadn't found out about that hook yet (nor that
UNITS_PER_SIMD_WORD had disappeared in favour of
> I'd appreciate your comments on the patch, and I also have a few
> questions: 1. Why the default vector size is 64?
Ah, well, that was a decision made several years ago, and may no longer
be the optimal choice. There are several possible reasons:
1. NEON hardware available at the time (Cortex-A8) only processed data
in 64-bit chunks, so Q-reg operations weren't necessarily any faster
than D-reg operations (that may still be true).
2. Initially misaligned operations were not supported, so larger
vectors increase the chances that loop versioning would be required.
3. Generally, setup/tear-down code is assumed to be larger for wider
vectors, and ARM hardware generally isn't very well-endowed with cache.
The choice wasn't supported by benchmarking, since IIRC we didn't
have real hardware when we implemented initial NEON support, and
nobody's revisited the decision since.
> 2. Where is the place of NEON vectorization tests? I found NEON tests
> with intrinsics at gcc.target/arm, is that the right place?
Looks like a good place to me. It might be an idea to avoid putting
stuff in gcc.target/arm/neon, since most of those tests are
> 3. According to gcc.dg/vect/vect.exp the only flag that is used for
> NEON (in addition to target independent flags) is -ffast-math. Is
> that enough?
I'm not sure what you're asking here: there aren't any other
target-specific flags to tweak NEON behaviour. Generally to enable NEON
you need to set the float ABI and FPU variant (if they're not
configured-in to the compiler), e.g.:
-mfloat-abi=softfp/-mfloat-abi=hard -mfpu=neon* [-march=armv7-a]
* there are several variants for this, e.g. neon, neon-fp16,
neon-vfpv4 ... generally -mfpu=neon should do though, for Cortex-A8
chips at least.
> * config/arm/arm.c (arm_autovectorize_vector_sizes): New
> (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Define.
Looks fine to me (though I'm not an ARM maintainer, so can't approve it
upstream). It's a bit of a pity the hook is defined in such a way that
it doesn't support non-power-of-two sized vectors, but never mind...
I wonder if this allows us to remove -mvectorize-with-neon-quad
already (or, perhaps, wire it on but make the option a no-op, for
possible backward-compatibility reasons)?
More information about the linaro-toolchain