Hi there,
On Thu, Jul 8, 2010 at 7:44 PM, Hector Oron hector.oron@gmail.com wrote:
Hello,
2010/7/8, JD Zheng entropy.zjd@gmail.com:
I don't quite get why A9 won't gain much by using hard float.
Because A9 benefits from `softfp`, which is compatible with `soft`. In theory, hard floating point (incompatible with soft*) should not be much of a win over softfp on A9 cores, which have a much better structured pipeline.
Regarding this discussion, I strongly advocate getting some benchmarks; we should be careful about drawing conclusions like "won't be much of a win on A9" without some quantification.
For all v7 processors (A8, A9, etc.) the hardfp ABI will increase the register bandwidth for function calls. In some cases of floating-point intensive code, the increase will be substantial. For VFPv2 or VFPv3-D16:
* Up to 8 double-precision arguments, or 16 single-precision arguments, can be passed in fp registers, in addition to the usual limit of up to 4 integer or pointer arguments in the integer regs. This can eliminate many instructions at call sites and can reduce stack frame size and cache footprint, particularly in and around leaf functions. For C++ the benefit increases again due to the presence of 'this' as an implicit first argument in member functions: under softfp, a C++ member function with a single explicit double argument will use r0 for the 'this' pointer and r1 will be wasted, because double arguments must be padded to an even-numbered register in the register bank. So hardfp could allow up to three extra integer/pointer arguments to be moved from the stack into registers in such cases.
* A floating-point result can be returned in an fp register and used directly by the caller.
* Moving values between the floating-point and integer pipelines can be reduced. This is a benefit on all processors, particularly for floating->integer moves, but as discussed previously the benefit is significantly greater on A8 than it is on A9.
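To make the argument-passing point a bit more concrete, here is a rough Python model of where arguments end up under the two conventions. It is only a sketch: it assumes arguments are limited to 32-bit integers/pointers ("int"), "float" and "double", it ignores structs, varargs and 64-bit integers, and the function names (`allocate_softfp`, `allocate_hardfp`, `stack_count`) are made up for illustration rather than taken from any tool.

```python
def allocate_softfp(args):
    """Simplified base-ABI model: every argument competes for r0-r3."""
    ncrn = 0  # next core register number
    placement = []
    for kind in args:
        words = 2 if kind == "double" else 1
        if words == 2 and ncrn % 2 == 1:
            ncrn += 1  # doubles start at an even register; the odd one is wasted
        if ncrn + words <= 4:
            reg = f"r{ncrn}" if words == 1 else f"r{ncrn}:r{ncrn + 1}"
            placement.append((kind, reg))
            ncrn += words
        else:
            placement.append((kind, "stack"))
            ncrn = 4  # once the stack is used, no later argument goes in r0-r3
    return placement


def allocate_hardfp(args):
    """Simplified VFP-variant model: fp arguments use s0-s15 / d0-d7."""
    ncrn = 0
    free_s = [True] * 16  # s0..s15, which alias d0..d7 in pairs
    vfp_open = True       # cleared once any fp argument spills to the stack
    placement = []
    for kind in args:
        if kind == "int":
            if ncrn < 4:
                placement.append((kind, f"r{ncrn}"))
                ncrn += 1
            else:
                placement.append((kind, "stack"))
            continue
        width = 2 if kind == "double" else 1
        slot = None
        if vfp_open:
            # lowest suitably aligned free slot; floats may back-fill gaps
            for start in range(0, 16 - width + 1, width):
                if all(free_s[start:start + width]):
                    slot = start
                    break
        if slot is None:
            vfp_open = False
            placement.append((kind, "stack"))
        else:
            for i in range(slot, slot + width):
                free_s[i] = False
            name = f"d{slot // 2}" if width == 2 else f"s{slot}"
            placement.append((kind, name))
    return placement


def stack_count(placement):
    """Number of arguments that were pushed to the stack."""
    return sum(1 for _, where in placement if where == "stack")


# 'this' pointer, one double, then three more integer/pointer arguments
call = ["int", "double", "int", "int", "int"]
print("softfp:", allocate_softfp(call))
print("hardfp:", allocate_hardfp(call))
```

In this model the softfp call places 'this' in r0 and the double in r2:r3, and pushes all three trailing integer arguments to the stack, while the hardfp call keeps everything in registers (r0-r3 plus d0). How much that matters in real code is exactly what needs measuring.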
One particular issue we have is that the toolchain cannot easily handle intermixing of multiple ABIs, so it isn't straightforward to use a different ABI (hard) internally to a library or shared object compared with the ABI (softfp) used at the public interface. This means that some libs which may get significant benefit from the hard fp ABI to accelerate internal function calls (such as libm, as well as any computational library) cannot be built using the hard fp ABI internally without doing significant work, unless the whole system is built with hard fp. It's certainly not something we can achieve by simply using different build options for targeted libraries, as can be done for NEON optimisations for example.
Judging how much these changes will improve the performance of real-world code, and how the improvement compares on A9 versus A8, is difficult without doing some benchmarking though.
Cheers ---Dave