NEON vectorization improvements - preliminary notes
IRAR at il.ibm.com
Wed Sep 15 13:23:12 BST 2010
I need to learn much more about ARM architecture, but I have some initial
Julian Brown <julian at codesourcery.com> wrote on 15/09/2010 11:37:21 AM:
> * automatic vector size selection (it's currently selected by command
> line switch)
> Generally (check assumption) I think that wider vectors may make inner
> loops more efficient,
> but may increase the size of setup/teardown code (e.g. setup: increased
> increased insns for reduction ops). More importantly, sometimes larger
> vectors may inhibit vectorization.
> We ideally want to calculate costs per vector-size per-loop (or per other
There is a patch http://gcc.gnu.org/ml/gcc-patches/2010-03/msg00167.html
that was not committed to mainline (and I think not to vect256, but I am
not sure about that). This patch tries to vectorize for the wider option
unless it is impossible because of data dependence constraints.
I agree with that cost model approach.
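
The "prefer the wider option unless data dependences forbid it" policy
described above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not code from the patch: the
struct, its fields and the function name are hypothetical, and the two
candidate sizes are meant to stand for NEON quad-word (16-byte) and
double-word (8-byte) registers.

```c
#include <stddef.h>

/* Hypothetical summary of one candidate loop; not a GCC data structure. */
struct loop_info {
    unsigned elem_bytes;   /* size of one scalar element, in bytes */
    unsigned max_safe_vf;  /* largest vectorization factor allowed by data
                              dependences; 0 means "no constraint" */
};

/* Return the vector size in bytes to use for this loop, trying the wider
   option first, or 0 if no candidate size is usable. */
unsigned choose_vector_bytes(const struct loop_info *loop)
{
    /* Assumed candidates: 16-byte (Q) registers, then 8-byte (D). */
    static const unsigned candidates[] = { 16, 8 };

    if (loop->elem_bytes == 0)
        return 0;

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        unsigned vf = candidates[i] / loop->elem_bytes;

        if (vf < 2)
            continue;  /* fewer than two lanes: nothing to vectorize */
        if (loop->max_safe_vf == 0 || vf <= loop->max_safe_vf)
            return candidates[i];
    }
    return 0;
}
```

A real cost model would go further and compare the estimated cost of
each usable size (including setup/teardown code) instead of simply
taking the first one that fits.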
> * ensure that all gcc vectorizer pattern names are implemented in the
> machine description (those that can be).
In my opinion we had better concentrate on:
> * Conversely, perhaps identify NEON capabilities not covered by GCC
> patterns, and add them to gcc (e.g. vld2/vld3/vld4 insns)
Most of the existing vectorizer patterns were inspired by Altivec's
capabilities. I think our approach should originate from the architecture
and not the other way around. For example, I don't think we should spend
time on implementation of vec_extract_even/odd and
vec_interleave_high/low (even though they seem to match VUZP and VZIP),
when we have those amazing VLD2/3/4 and VST2/3/4 instructions.
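
To make the contrast concrete, here is a scalar model of the two ways
of reading interleaved data (x0 y0 x1 y1 ...). It is a sketch in plain
C, not NEON intrinsics and not vectorizer output: the function names
and the lane count are hypothetical. The first function models two
contiguous vector loads followed by extract-even/odd permutes; the
second models a single de-interleaving structure load in the spirit of
VLD2.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* illustrative lane count, e.g. 4 x 16-bit elements */

/* Model of "contiguous loads + permutes": two plain vector loads, then
   extract-even and extract-odd over the pair of loaded vectors. */
void load_then_permute(const int16_t *src,
                       int16_t even[LANES], int16_t odd[LANES])
{
    int16_t v0[LANES], v1[LANES];

    for (size_t i = 0; i < LANES; i++) {  /* two contiguous loads */
        v0[i] = src[i];
        v1[i] = src[LANES + i];
    }
    for (size_t i = 0; i < LANES / 2; i++) {
        even[i]             = v0[2 * i];      /* extract even */
        even[LANES / 2 + i] = v1[2 * i];
        odd[i]              = v0[2 * i + 1];  /* extract odd */
        odd[LANES / 2 + i]  = v1[2 * i + 1];
    }
}

/* Model of one de-interleaving structure load: the same 2 * LANES
   elements land directly in an "even" and an "odd" register. */
void structure_load2(const int16_t *src,
                     int16_t even[LANES], int16_t odd[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        even[i] = src[2 * i];
        odd[i]  = src[2 * i + 1];
    }
}
```

Both functions produce the same result; the second does it in one
step, with no separate permute operations.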
> I've not even started on looking at:
> * loops with more than two basic blocks (caused by if statements
> (anything else?))
What do you mean by that? If-conversion improvements?
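
As an illustration of what if-conversion means here, below is a
hand-written sketch (not compiler output; the function names are made
up). The first loop has an if statement in its body, so the loop
consists of more than two basic blocks. The second is the if-converted
form, where the branch is replaced by a select and the loop body is
straight-line code.

```c
#include <stddef.h>

/* Loop body contains a branch, so it spans several basic blocks. */
void clamp_branchy(int *a, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] > limit)
            a[i] = limit;
    }
}

/* If-converted form: the conditional store becomes an unconditional
   store of a selected value, leaving a single-block loop body. */
void clamp_ifcvt(int *a, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++) {
        int v = a[i];
        a[i] = (v > limit) ? limit : v;
    }
}
```

Note that the second form writes every element, which is only safe
when the extra stores are known to be harmless.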
> Do you (Ira) have access to the ARM ISA docs detailing the NEON
I have "ARM® Architecture Reference Manual ARM®v7-A and ARM®v7-R edition".
> Julian
> [attachment "CS308-vectorization-improvements.txt" deleted by
> Ira Rosen/Haifa/IBM]