[gnu-arm-releases] Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)
dan at codesourcery.com
Thu Dec 2 13:56:28 UTC 2010
On Thu, Dec 02, 2010 at 10:54:32AM +0200, Ira Rosen wrote:
> On 1 December 2010 17:57, Daniel Jacobowitz <dan at codesourcery.com> wrote:
> > On Wed, Dec 01, 2010 at 11:16:16AM +0200, Ira Rosen wrote:
> >> The meaning of the builtin (or maybe a new tree code would be better?)
> >> is that the elements of v0, v1 and v2 are deinterleaved. I wanted the
> >> MEM_REFs, since we actually have three data accesses here, and
> >> something (builtin or tree code) to indicate the deinterleaving. Since
> >> the vectors are passed to the builtin, I don't think it's a problem if
> >> the statements get separated. When the expander sees the builtin, it
> >> has to remove the loads it created for the MEM_REFs and create a new
> >> "vector load multiple and deinterleave". Is that possible?
> > This is a problem I've struggled with before. My only caution is that
> > representing the MEM_REFs separately from the deinterleaving in the IR
> > allows all sorts of ways (many we haven't thought of yet) for them to
> > get separated, and there's no instruction to efficiently implement the
> > deinterleaving from registers. For instance, suppose a pseudo gets
> > propagated into the builtin and we can't find the MEM_REFs any more.
> > The resulting code could easily be worse than pre-vectorization.
> I see. So one builtin for everything, like
> vector_load_deinterleave (v0, v1, v2,..., stride,...)
> is our only option?
It's not the only option; the way you've described might work, too.
But yes, it's my opinion that a single builtin is less likely to
generate something the compiler can't recover from.
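
To make the semantics under discussion concrete, here is a rough scalar
sketch of what a single "load and deinterleave" with stride 3 would
compute. The function name load_deinterleave3, the LANES width and the
int32_t element type are assumptions for illustration only; this is not
an existing GCC builtin or tree code, just a reference model of the
intended result.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* assumed number of elements per vector, illustration only */

/* Hypothetical scalar model of a stride-3 "load and deinterleave":
   reads 3 * LANES consecutive elements from mem and splits them so that
   v0 gets elements 0, 3, 6, ..., v1 gets 1, 4, 7, ..., v2 gets 2, 5, 8, ... */
static void load_deinterleave3(const int32_t *mem,
                               int32_t v0[LANES],
                               int32_t v1[LANES],
                               int32_t v2[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        v0[i] = mem[3 * i + 0];
        v1[i] = mem[3 * i + 1];
        v2[i] = mem[3 * i + 2];
    }
}
```

With the memory access and the shuffle expressed as one operation like
this, there is nothing for later passes to pull apart; with separate
MEM_REFs, the expander would have to find and rejoin the three loads.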