On Fri, Jul 9, 2010 at 6:21 PM, Mark Mitchell mark@codesourcery.com wrote:
Loïc Minier wrote:
For all v7 processors (A8, A9, etc.) the hardfp ABI will increase the register bandwidth for function calls. In some cases of floating-point intensive code, the increase will be substantial.
Yes, I agree: in any case hard floating point will be superior; we just don't know by how much.
Like everyone else, I'd like to see numbers.
We can, however, to some extent scope the kind of programs that will benefit. To benefit, a program must have a call to a small function that takes floating-point arguments or returns a floating-point value in an inner loop. (It must be a small function, since otherwise the parameter-passing costs will be dwarfed by the function itself.) This is a relatively rare situation, but OpenGL or the like are probably examples of where this could be important. In many cases, making the small function "inline" may be a better solution than the hardfp ABI.
This is a fair point, although a good number of projects follow the "many tiny source files" approach, where the compiler doesn't get many static functions to optimise and doesn't get much opportunity to inline. I believe libm is an example of this, but I'm prepared to be overridden...
From the user's point of view, it's not just code of this type that will benefit, but anything that calls it; in practice that's going to be a larger set of software.
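To make the inner-loop case concrete, here is a minimal sketch of the pattern under discussion. The names scale_add and sum_scaled are made up for illustration and are not taken from any real project. When the helper is visible in the same translation unit and marked inline, the compiler is free to avoid the call altogether, so the argument-passing convention stops mattering; if the helper lived in a separate source file, each iteration would pay the cost of the ABI's floating-point argument passing.

```c
#include <stddef.h>

/* Hypothetical small helper: the body is cheap enough that, for an
   out-of-line call, moving x and y into argument registers could be a
   noticeable share of the total cost. */
static inline double scale_add(double x, double y)
{
    return x * y + y;
}

/* Hypothetical inner loop calling the helper once per element.
   With the helper inlinable, no floating-point values need to cross a
   function-call boundary inside the loop. */
double sum_scaled(const double *v, size_t n, double k)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += scale_add(v[i], k);
    return acc;
}
```

Whether the compiler actually inlines the helper depends on the optimisation level, so the generated assembly is the only reliable check.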
Some of the examples in Dave's email can be dealt with without a completely hardfp world. For example, the ABI says nothing about calls to static helper functions within a module, so there's no reason (in principle) the compiler could not use the hardfp ABI in that situation. The same could be accomplished for a non-static function using a special attribute.
True, and Richard demonstrated to me that this can work in both cases. I don't recall exactly which compiler branch he was using, but he could clarify this if needed.
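For the "special attribute" route, one candidate I'm aware of is GCC's ARM-specific pcs function attribute, which accepts "aapcs" and "aapcs-vfp". I have not confirmed that this is what Richard used, so treat the following as a sketch under that assumption. HARDFP_CALL, mul_add and caller are hypothetical names, and the guard on __SOFTFP__ assumes that macro is defined only when no VFP registers are available for argument passing.

```c
/* Assumption: GCC (or a compatible compiler) targeting ARM with VFP
   registers available. On any other target the macro expands to
   nothing and the default calling convention is used. */
#if defined(__GNUC__) && defined(__arm__) && !defined(__SOFTFP__)
#define HARDFP_CALL __attribute__((pcs("aapcs-vfp")))
#else
#define HARDFP_CALL
#endif

/* Hypothetical non-static function explicitly tagged to take its
   floating-point arguments in VFP registers. The declaration that
   callers see must carry the same attribute as the definition. */
HARDFP_CALL double mul_add(double x, double y);

HARDFP_CALL double mul_add(double x, double y)
{
    return x * y + y;
}

/* Untagged caller: it still follows the base ABI for its own
   arguments, but uses the VFP variant when calling mul_add. */
double caller(double x)
{
    return mul_add(x, x);
}
```

The catch is that every declaration of a tagged function has to agree, which is where the per-project effort comes from.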
Ranking the possibilities in increasing order of effectiveness:
1. Using a modern toolchain (we get this for free, since we're already migrating)
If someone who's set up to do it quickly could test this with the Linaro toolchain, that would be interesting:
static __attribute__ (( noinline )) double h(double x, double y) { return x * y; }
static __attribute__ (( noinline )) double g(double x, double y) { return h(x, y) + y; }
double f(double x) { return g(x, x); }
We should see d0,d1 used for the calls to g and h (hopefully both).
2. Using ABI tagging (high effort, involving modifications to affected projects - permits the hardfp ABI for explicitly selected functions)
3. Building a fully hard-float world (high effort - requires packaging and distro work to define and build a new armelfp port of the archive)
Since (2) and (3) are both high-effort, perhaps it would be better to choose one or the other approach for now, rather than attempting to do both initially.
Cheers ---Dave