All,

 

With the code below, I tried a few compiler options and observed the following:

 

1)      arm-linux-gnueabi-gcc -O2 -mcpu=cortex-a15 -mfpu=neon -ftree-vectorizer-verbose=6  -ftree-vectorize

 

The compiler emits the following informational notes:

 

foo.c:16: note: not vectorized: unsupported use in stmt.
foo.c:16: note: not vectorized: unsupported use in stmt.
foo.c:18: note: not vectorized: unsupported use in stmt.
foo.c:18: note: not vectorized: unsupported use in stmt.

 

 

2)      -O2 -mcpu=cortex-a15 -mfpu=neon

 

 

Neither build generates any NEON instructions. The code generated in case 1 takes 3000 cycles, while the code generated in case 2 takes 2500 cycles.

 

Even if vectorization fails in case 1, the compiler should not generate less efficient code than in case 2. I expected both executables to take the same number of cycles: anything done in preparation for vectorization should be reverted if vectorization does not succeed.

 

###################################################################

#include <stdio.h>

#define SIZE1 20
#define SIZE2 26

unsigned int array[SIZE1][SIZE2];

void foo(void)
{
  unsigned int i, j;
  unsigned int max = 0;
  unsigned int index = 0;   /* column (j) at which max was last updated */

  for (i = 0; i < SIZE1; i++)
  {
    for (j = 0; j < SIZE2; j++)
    {
      if (array[i][j] > max)
      {
        max = array[i][j];
        index = j;
      }
    }
  }

  printf("Max value: %u Index: %u\n", max, index);
}
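
For anyone who wants to reproduce the comparison, here is a minimal sketch of two variants of the loop that return their results instead of printing them. The names find_max, max_only, fill_pattern and struct max_result are hypothetical and not part of foo.c. The max_only variant drops the index tracking so that only a plain maximum reduction remains; whether that form vectorizes with this toolchain is an assumption to be tested, not a measured result.

```c
#define SIZE1 20
#define SIZE2 26

unsigned int array[SIZE1][SIZE2];

/* Hypothetical result type, not part of the original foo.c. */
struct max_result {
  unsigned int max;
  unsigned int index;   /* column (j) at which max was last updated */
};

/* Same scan as foo(), but returning the values instead of printing them. */
struct max_result find_max(void)
{
  struct max_result r = { 0, 0 };
  unsigned int i, j;

  for (i = 0; i < SIZE1; i++)
    for (j = 0; j < SIZE2; j++)
      if (array[i][j] > r.max) {
        r.max = array[i][j];
        r.index = j;
      }
  return r;
}

/* Hypothetical variant: maximum only, with no index tracking. */
unsigned int max_only(void)
{
  unsigned int max = 0;
  unsigned int i, j;

  for (i = 0; i < SIZE1; i++)
    for (j = 0; j < SIZE2; j++)
      if (array[i][j] > max)
        max = array[i][j];
  return max;
}

/* Hypothetical helper: fill the array with a known pattern (i + j)
   and plant one clearly larger value so the result can be checked. */
void fill_pattern(void)
{
  unsigned int i, j;

  for (i = 0; i < SIZE1; i++)
    for (j = 0; j < SIZE2; j++)
      array[i][j] = i + j;
  array[7][11] = 1000;
}
```

Calling fill_pattern() and then find_max() should give max 1000 and index 11, and max_only() should agree on the maximum. Building each variant with and without -ftree-vectorize would show whether the extra cycles appear only when the index is tracked.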

 

 

 

Regards

RKS