Vectorised copy
Michael Hope
michael.hope at linaro.org
Wed Sep 7 00:39:19 UTC 2011
On Wed, Sep 7, 2011 at 2:14 AM, Richard Sandiford
<richard.sandiford at linaro.org> wrote:
> Michael Hope <michael.hope at linaro.org> writes:
>> While out benchmarking today, I ran across code similar to this:
>>
>> int *a;
>> int *b;
>> int *c;
>>
>> const int ad[320];
>> const int bd[320];
>> const int cd[320];
>>
>> void fill()
>> {
>>     for (int i = 0; i < 320; i++)
>>     {
>>         a[i] = ad[i];
>>         b[i] = bd[i];
>>         c[i] = cd[i];
>>     }
>> }
>>
>> I was surprised and happy to see the vectoriser kick in for the copy.
>> The inner loop looks like:
>>
>>     add     r5, r3, ip
>>     adds    r4, r3, r7
>>     vldmia  r2!, {d16-d17}
>>     vldmia  r1!, {d18-d19}
>>     adds    r0, r3, r6
>>     vst1.32 {q9}, [r5]
>>     vst1.32 {q8}, [r4]
>>     vldmia  r3, {d16-d17}
>>     adds    r3, r3, #16
>>     cmp     r3, r8
>>     vst1.32 {q8}, [r0]
>>     bne     .L3
>>
>> so r3 is the loop variable and {ip,r7,r6} are the offsets from r3 to the
>> destination pointers. Adding a __restrict doesn't change the code.
>
> FWIW, this comes from ivopts. I raised the "problem" on gcc@
> a few months back, but it seems to be intentional behaviour:
>
> http://gcc.gnu.org/ml/gcc/2011-07/msg00050.html
>
> That is, all things being equal, the current code tends to prefer
> cases where it can hoist the difference between potential ivs
> rather than creating separate ivs.
>
> As far as the end of today's meeting goes: ivopts is one of those
> things on my unwritten list of areas that it would be nice to look at.
> I posted some benchmarks comparing -fivopts with -fno-ivopts to the
> benchmark list in July. As expected, ivopts does help in a lot of cases,
> but there were also a fair number of cases where turning it off
> significantly improved performance.
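To make the trade-off Richard describes concrete, here is a minimal C
sketch of the two shapes a copy loop can take. It is only an
illustration, not what ivopts emits internally: the function names, N,
and the copies_match() helper are hypothetical, and the uintptr_t
arithmetic assumes a flat address space (the pointer/integer round trip
is implementation-defined in C).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N 320  /* same trip count as the loop in the original test case */

/* Shape 1: a single induction variable (the source pointer). The
   distance from source to destination is loop-invariant, so it is
   computed once outside the loop and added on every iteration, much
   like "add r5, r3, ip" in the listing above. */
void copy_shared_iv(int *dst, const int *src)
{
    /* Assumption: flat address space, so byte offsets between unrelated
       objects can be carried in a uintptr_t. */
    uintptr_t delta = (uintptr_t)dst - (uintptr_t)src;

    for (const int *p = src; p != src + N; p++)
        *(int *)((uintptr_t)p + delta) = *p;
}

/* Shape 2: separate induction variables, one per pointer. Each pointer
   is bumped on its own, which maps naturally onto post-increment
   addressing and needs no extra add per store. */
void copy_separate_ivs(int *dst, const int *src)
{
    const int *end = src + N;

    while (src != end)
        *dst++ = *src++;
}

/* Hypothetical helper: returns 1 if both shapes copy identical data. */
int copies_match(void)
{
    static int src[N], d1[N], d2[N];

    for (int i = 0; i < N; i++)
        src[i] = i * 3 + 1;

    copy_shared_iv(d1, src);
    copy_separate_ivs(d2, src);

    return memcmp(d1, d2, sizeof d1) == 0 && d1[N - 1] == src[N - 1];
}
```

Both functions perform the same copy. The first keeps fewer values
changing in the loop but pays an address add per store; the second
spends one register per pointer and lets each access update its own
pointer.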
Spawned into:
https://blueprints.launchpad.net/gcc-linaro/+spec/investigate-ivopts
-- Michael