Hi Matt,

Thanks for sharing the information.

On Tue, Oct 9, 2012 at 3:51 PM, Matthew Gretton-Dann <matthew.gretton-dann@linaro.org> wrote:
On 9 October 2012 10:37, Jubi Taneja <jubitaneja@gmail.com> wrote:
> Hi All,
>
> I wanted to see the difference in objdump of an application where I can make
> the difference between the VFPV3 and VFPV4 support. I tried enabling the
> flag -mfpu=vfpv3 and -mfpu=vfpv4 for ARM Cortex A15 toolchain in my test
> code but cannot see the difference in two objdumps.

Try the following (tested against FSF GCC):

/* arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o-
/tmp/fma.c -mfloat-abi=hard -O2 */
float f(float a, float b, float c)
{
  return a * b + c;
}
/* end of fma.c */

(Note that -mfloat-abi=softfp will also work in this example.  Which
one you want to use depends on whether you have configured your system
for hard or soft-float ABIs).

I checked with both -mfpu=vfpv3 and -mfpu=vfpv4, and the generated assembly is the same: a VMLA instruction is emitted in both cases. Is there a test case that would show a difference between the two objdumps?
 
> According to my survey, the fused multiply and accumulate is the only
> instruction that can create the difference in two. Can any one provide the
> sample test code for the same? Precisely, I wish to see the difference in
> performance for vfpv3 and vfpv4.

I would be surprised if you see much difference at all.  VFPv3 has the
VMLA (non-fused multiply-accumulate) instruction, which does an extra
rounding-step,
Correct, I checked this.
 
but I expect will have similar performance
characteristics to VFMA.
Yes, the assembly is identical in both cases, so there can be no performance difference for now.
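
To make the "extra rounding step" concrete, here is a small host-side sketch in plain C. It does not use the VFP instructions themselves. mul_then_add models the VMLA behaviour (product rounded to float, then the add), and mul_add_once models a single rounding by doing the arithmetic in double, where a float*float product is exact. Both function names are made up for this illustration, and the double-based version is only an approximation of a true fused operation (the C99 fmaf() function is the real thing).

```c
/* Unfused multiply-add, modelling VMLA: the product is rounded to
   float before the add, so there are two rounding steps.  The volatile
   store keeps the compiler from contracting this into a fused op. */
float mul_then_add(float a, float b, float c)
{
  volatile float product = a * b;
  return product + c;
}

/* Single-rounding model of a fused multiply-add (an assumption for
   illustration, not the VFMA instruction): the product of two floats
   is exact in double, so no rounding happens before the add.  The sum
   is rounded to double and then to float, which can differ from a
   true fused result in rare cases. */
float mul_add_once(float a, float b, float c)
{
  return (float)((double)a * (double)b + (double)c);
}
```

With a = b = 0x1.001p0f (1 + 2^-12) and c = -0x1.002p0f (-(1 + 2^-11)), the exact product is 1 + 2^-11 + 2^-24. Rounding it to float drops the 2^-24 term, so mul_then_add returns 0, while mul_add_once returns 0x1p-24f.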

Note that between -mfpu=vfpv3 and -mfpu=vfpv4 there is also
-mfpu=vfpv3-fp16 which added support for loading and storing
half-precision floating-point values.  Again this won't make a
performance difference unless you use half-precision as your storage
format.

I still need to check this.

Thanks,
Jubi 

Thanks,

Matt

--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-dann@linaro.org