Vectorised copy

Michael Hope michael.hope at linaro.org
Mon Aug 29 03:14:45 UTC 2011


While out benchmarking today, I ran across code similar to this:

int *a;
int *b;
int *c;

const int ad[320];
const int bd[320];
const int cd[320];

void fill()
{
  for (int i = 0; i < 320; i++)
    {
      a[i] = ad[i];
      b[i] = bd[i];
      c[i] = cd[i];
    }
}

I was surprised and happy to see the vectoriser kick in for the copy.
The inner loop looks like:

	add	r5, r3, ip
	adds	r4, r3, r7
	vldmia	r2!, {d16-d17}
	vldmia	r1!, {d18-d19}
	adds	r0, r3, r6
	vst1.32	{q9}, [r5]
	vst1.32	{q8}, [r4]
	vldmia	r3, {d16-d17}
	adds	r3, r3, #16
	cmp	r3, r8
	vst1.32	{q8}, [r0]
	bne	.L3

so r3 is the loop variable (and also the source pointer for the third
load), and {ip,r7,r6} are the offsets from r3 to the destination
pointers.  Adding a __restrict doesn't change the code.
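
For reference, one way the __restrict experiment could be written is
sketched below.  This is an assumption: the post does not say where the
qualifier was placed, so the fill_restrict and check_fill_restrict names,
the parameter form, and the initialiser values are all made up for
illustration.

```c
#include <assert.h>

#define N 320

/* Source tables as in the snippet above.  The first few values are
   invented so the copy is observable; the remaining elements are zero. */
const int ad[N] = {1, 2, 3};
const int bd[N] = {10, 20, 30};
const int cd[N] = {100, 200, 300};

/* Assumed placement of the qualifier: the destinations are passed in as
   __restrict parameters, promising the compiler they do not overlap. */
void fill_restrict(int *__restrict a, int *__restrict b, int *__restrict c)
{
  for (int i = 0; i < N; i++)
    {
      a[i] = ad[i];
      b[i] = bd[i];
      c[i] = cd[i];
    }
}

/* Self-check using three distinct, non-overlapping buffers. */
int check_fill_restrict(void)
{
  static int a[N], b[N], c[N];

  fill_restrict(a, b, c);
  assert(a[0] == 1 && b[1] == 20 && c[2] == 300);
  assert(a[N - 1] == 0);
  return a[1] + b[1] + c[1];
}
```

Compiling this with gcc -S, using the same options as the benchmark
(not given in the post), lets the inner loop be compared against the
listing above.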

Richard, will your auto-inc/dec changes combine the final vldmia r3
and adds r3, r3, #16 into a single vldmia r3! ?

Changing the int *a pointers into arrays defined in the same file,
like int a[320], gives:

	vldmia	r0!, {d16-d17}
	vldmia	r5!, {d18-d19}
	vstmia	r4!, {d18-d19}
	vstmia	r1!, {d16-d17}
	vldmia	r2!, {d16-d17}
	vstmia	r3!, {d16-d17}
	cmp	r3, r6
	bne	.L2

Marking them as extern int a[320] goes back to the first form.
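
The two array variants can be put side by side in one file, roughly as
sketched below.  The DEST_IS_EXTERN macro, the check_fill helper, and
the initialiser values are hypothetical and only there to make the
sketch self-contained; the loop is the one from the snippet above.

```c
#include <assert.h>

#define N 320

/* Source tables; the first few values are invented for the check. */
const int ad[N] = {1, 2, 3};
const int bd[N] = {10, 20, 30};
const int cd[N] = {100, 200, 300};

#ifdef DEST_IS_EXTERN
/* Variant that goes back to the first form: the arrays are only
   declared here and must be defined in another translation unit. */
extern int a[N];
extern int b[N];
extern int c[N];
#else
/* Variant that gives the second form: the arrays are defined in
   this file. */
int a[N];
int b[N];
int c[N];
#endif

void fill(void)
{
  for (int i = 0; i < N; i++)
    {
      a[i] = ad[i];
      b[i] = bd[i];
      c[i] = cd[i];
    }
}

/* Self-check for the in-file variant. */
int check_fill(void)
{
  fill();
  assert(a[0] == 1 && b[1] == 20);
  return c[2];
}
```

Building once as is and once with -DDEST_IS_EXTERN (plus a second file
that defines a, b and c) and diffing the gcc -S output should show
whether the two forms above are reproduced.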

Can we always generate the second form?  What is preventing it in the
pointer and extern cases?

-- Michael


