Hi
I'm sure people tracking linaro-dev@ will be interested in the ongoing discussion about a new ARM hard-float ABI port, which is taking place on the debian-arm@ mailing list: http://lists.debian.org/debian-arm/2010/07/threads.html#00019
Cheers,
I don't quite get why A9 won't gain much by using hard float.
Thanks, Jiandong
Hello,
2010/7/8, JD Zheng entropy.zjd@gmail.com:
I don't quite get why A9 won't gain much by using hard float.
Because A9 benefits from `softfp`, which is compatible with `soft`. In theory, hard floating point (incompatible with soft*) should not be much of a win over softfp on A9 cores, which have a much better structured pipeline.
Loïc and others have kindly started working on a wiki page that might be of interest to you too.
http://wiki.debian.org/ArmHardFloatPort
Best regards,
Hi there,
On Thu, Jul 8, 2010 at 7:44 PM, Hector Oron hector.oron@gmail.com wrote:
Hello,
2010/7/8, JD Zheng entropy.zjd@gmail.com:
I don't quite get why A9 won't gain much by using hard float.
Because A9 benefits from `softfp`, which is compatible with `soft`. In theory, hard floating point (incompatible with soft*) should not be much of a win over softfp on A9 cores, which have a much better structured pipeline.
Regarding this discussion, I strongly advocate getting some benchmarks --- we should be careful about drawing conclusions like "won't be much of a win on A9" without some quantification.
For all v7 processors (A8, A9, etc.) the hardfp ABI will increase the register bandwidth for function calls. In some cases of floating-point intensive code, the increase will be substantial. For VFPv2 or VFPv3-D16:
* Up to 8 double-precision arguments, or 16 single-precision arguments, can be passed in fp registers, in addition to the usual limit of up to 4 integer or pointer arguments in the integer regs. This can eliminate many instructions at call sites and can reduce stack frame size and cache footprint, particularly in and around leaf functions. For C++ the benefit increases again due to the presence of 'this' as an implicit first argument in member functions: a C++ member function with a single explicit double argument will use r0 for the 'this' pointer and r1 will be wasted, because double arguments must be aligned to an even-numbered register in the register bank. So hardfp could allow up to three extra integer/pointer arguments to be moved from the stack into registers in such cases.
* A floating-point result can be returned in an fp register and used directly by the caller.
* Moving values between the floating-point and integer pipelines can be reduced. This is a benefit on all processors, particularly for floating->integer moves, but as discussed previously the benefit is significantly greater on A8 than it is on A9.
One particular issue we have is that the toolchain cannot easily handle intermixing of multiple ABIs, so it isn't straightforward to use a different ABI (hard) internally to a library or shared object compared with the ABI (softfp) used at the public interface. This means that some libs which may get significant benefit from the hard fp ABI to accelerate internal function calls (such as libm, as well as any computational library) cannot be built using the hard fp ABI internally without doing significant work, unless the whole system is built with hard fp. It's certainly not something we can achieve by simply using different build options for targeted libraries, as can be done for NEON optimisations for example.
Judging how much these changes will improve the performance of real-world code, and how the improvement compares on A9 versus A8, is difficult without doing some benchmarking though.
Cheers ---Dave
Hey Dave
Great to have you back!
On Fri, Jul 09, 2010, Dave Martin wrote:
Regarding this discussion, I strongly advocate getting some benchmarks --- we should be careful about drawing conclusions like "won't be much of a win on A9" without some quantification.
+1; this is just from hearsay so far.
Would you be in a position to do them?
Markos has the karmic rebuild of armel with the hard-float ABI + sources. There are some rootfses up there as well. I will share the link with you off-list since I'm not sure it's ok to advertise it widely.
For all v7 processors (A8, A9, etc.) the hardfp ABI will increase the register bandwidth for function calls. In some cases of floating-point intensive code, the increase will be substantial.
Yes; I agree that in any case hard floating point will be superior; we just don't know by how much. What I meant to express is that Linaro didn't go for a hardfp port because the wins weren't obviously worth the (large) time investment + maintenance commitment, and it seemed that by the time we got there, Cortex-A9s would be common and the wins would be limited.
Now if we were to have benchmarks, the story might be entirely different indeed.
One particular issue we have is that the toolchain cannot easily handle intermixing of multiple ABIs, so it isn't straightforward to use a different ABI (hard) internally to a library or shared object compared with the ABI (softfp) used at the public interface. This means that some libs which may get significant benefit from the hard fp ABI to accelerate internal function calls (such as libm, as well as any computational library) cannot be built using the hard fp ABI internally without doing significant work, unless the whole system is built with hard fp. It's certainly not something we can achieve by simply using different build options for targeted libraries, as can be done for NEON optimisations for example.
I think we either have a Debian (and an Ubuntu) hard-float port or we don't; once we have it, everything is available with hard-float, and we will probably shift our focus to that. If the cost is too high, then we will stay with the armel port and try to reap the most benefit from it, building vfp versions of the libs (softfp). Perhaps we need to fix the toolchain to be more aggressive in softfp mode (especially when combined with -Bsymbolic-functions), or perhaps we need to rework key libraries to use hardfp but keep a soft-float interface, as you say.
Judging how much these changes will improve the performance of real-world code, and how the improvement compares on A9 versus A8, is difficult without doing some benchmarking though.
You speak the truth! Thanks a lot for your comments
Loïc Minier wrote:
For all v7 processors (A8, A9, etc.) the hardfp ABI will increase the register bandwidth for function calls. In some cases of floating-point intensive code, the increase will be substantial.
Yes; I agree that in any case hard floating point will be superior; we just don't know by how much.
Like everyone else, I'd like to see numbers.
We can, however, to some extent scope the kind of programs that will benefit. To benefit, a program must have a call to a small function that takes floating-point arguments or returns a floating-point value in an inner loop. (It must be a small function, since otherwise the parameter-passing costs will be dwarfed by the function itself.) This is a relatively rare situation, but OpenGL or the like are probably examples of where this could be important. In many cases, making the small function "inline" may be a better solution than the hardfp ABI.
Some of the examples in Dave's email can be dealt with without a completely hardfp world. For example, the ABI says nothing about calls to static helper functions within a module, so there's no reason (in principle) the compiler could not use the hardfp ABI in that situation. The same could be accomplished for a non-static function using a special attribute.
On Fri, Jul 9, 2010 at 6:21 PM, Mark Mitchell mark@codesourcery.com wrote:
Loïc Minier wrote:
For all v7 processors (A8, A9, etc.) the hardfp ABI will increase the register bandwidth for function calls. In some cases of floating-point intensive code, the increase will be substantial.
Yes; I agree that in any case hard floating point will be superior; we just don't know by how much.
Like everyone else, I'd like to see numbers.
We can, however, to some extent scope the kind of programs that will benefit. To benefit, a program must have a call to a small function that takes floating-point arguments or returns a floating-point value in an inner loop. (It must be a small function, since otherwise the parameter-passing costs will be dwarfed by the function itself.) This is a relatively rare situation, but OpenGL or the like are probably examples of where this could be important. In many cases, making the small function "inline" may be a better solution than the hardfp ABI.
This is a fair point, although there are a good number of projects which follow the "many tiny source files" approach, where the compiler doesn't get many static functions to optimise and doesn't get much opportunity to inline. I believe libm is an example of this, but I'm prepared to be overridden...
From the user's point of view, it's not just code of this type that will benefit, but anything that calls it --- in practice that's going to be a larger set of software.
Some of the examples in Dave's email can be dealt with without a completely hardfp world. For example, the ABI says nothing about calls to static helper functions within a module, so there's no reason (in principle) the compiler could not use the hardfp ABI in that situation. The same could be accomplished for a non-static function using a special attribute.
True, and Richard demonstrated to me that this can work in both cases. I don't recall exactly which compiler branch he was using, but he could clarify this if needed.
Ranking the possibilities in increasing order of effectiveness:
1. Using a modern toolchain (we get this for free, since we're already migrating)
If someone who's set up to do it quickly could test this with the linaro toolchain, that would be interesting:
static __attribute__ (( noinline )) double h(double x, double y) { return x * y; }
static __attribute__ (( noinline )) double g(double x, double y) { return h(x, y) + y; }
double f(double x) { return g(x, x); }
We should see d0,d1 used for the calls to g and h (hopefully both).
2. Use ABI tagging (high effort, involving modifications to affected projects - permits hardvfp ABI for explicitly selected functions)
3. Build a fully hard-float world (high effort - requires packaging and distro work to define and build a new armelfp port of the archive)
Since (2) and (3) are both high-effort, perhaps it would be better to choose one or the other approach for now, rather than attempting to do both initially.
Cheers ---Dave
Dave Martin wrote:
To benefit, a program must have a call to a small function that takes floating-point arguments or returns a floating-point value in an inner loop. (It must be a small function, since otherwise the parameter-passing costs will be dwarfed by the function itself.) This is a relatively rare situation, but OpenGL or the like are probably examples of where this could be important. In many cases, making the small function "inline" may be a better solution than the hardfp ABI.
This is a fair point, although there are a good number of projects which follow the "many tiny source files" approach, where the compiler doesn't get many static functions to optimise and doesn't get much opportunity to inline. I believe libm is an example of this, but I'm prepared to be overridden...
libm (if we're talking about the version in GLIBC) is many tiny source files, but many of them do not call one another.
From the user's point of view, it's not just code of this type that will benefit, but anything that calls it --- in practice that's going to be a larger set of software.
Yes, but it's still the case that those calls must be in an inner loop. For example, if your application calls "cos" in an inner loop, then this optimization might be important. (That depends on how many cycles "cos" takes to execute, but assuming "cos" takes only 100 cycles or so, then this is going to be important.)
- Use ABI tagging (high effort, involving modifications to affected
projects - permits hardvfp ABI for explicitly selected functions)
- Build a fully hard-float world (high effort - requires packaging and
distro work to define and build a new armelfp port of the archive)
Since (2) and (3) are both high-effort, perhaps it would be better to choose one or the other approach for now, rather than attempting to do both initially.
I agree. And, for what it's worth, I would try to avoid (3) at almost all costs. One of the advantages of the ARM ABI and one of the objectives of Linaro is to provide a standard platform. Life as a Linux ISV is complex enough (multiple distributions, kernel versions, etc.) without also having to worry about the ABI. I think it would be better to do quite a bit of tools work than to fall back to the approach of a completely parallel distribution.
2010/7/12 Mark Mitchell mark@codesourcery.com:
Dave Martin wrote:
[...]
libm (if we're talking about the version in GLIBC) is many tiny source files, but many of them do not call one another.
From the user's point of view, it's not just code of this type that will benefit, but anything that calls it --- in practice that's going to be a larger set of software.
Yes, but it's still the case that those calls must be in an inner loop. For example, if your application calls "cos" in an inner loop, then this optimization might be important. (That depends on how many cycles "cos" takes to execute, but assuming "cos" takes only 100 cycles or so, then this is going to be important.)
Just in case there was confusion here, I didn't mean to imply that the cost of the function call from the application was likely to be significant, but rather that internal library function call overhead might account for a significant proportion of the execution time of the library function itself. If I call a function to transform 10000 3D points for example, that may well map to 10000 floating-point function calls inside a library - the ABI used to call from the application into the library is irrelevant in terms of cost, but the way the library is built can have a significant effect.
But I agree that we shouldn't expect to see a huge speedup unless we see it for real - and that there's plenty of reason to be sceptical about the chance of seeing large speedups for most software.
- Use ABI tagging (high effort, involving modifications to affected
projects - permits hardvfp ABI for explicitly selected functions)
- Build a fully hard-float world (high effort - requires packaging and
distro work to define and build a new armelfp port of the archive)
Since (2) and (3) are both high-effort, perhaps it would be better to choose one or the other approach for now, rather than attempting to do both initially.
I agree. And, for what it's worth, I would try to avoid (3) at almost all costs. One of the advantages of the ARM ABI and one of the objectives of Linaro is to provide a standard platform. Life as a Linux ISV is complex enough (multiple distributions, kernel versions, etc.) without also having to worry about the ABI. I think it would be better to do quite a bit of tools work than to fall back to the approach of a completely parallel distribution.
Agreed, in general - if I get an opportunity to follow up with some numbers, but it may be a while before I can find the time to do anything on this...
Cheers ---Dave
On Mon, Jul 12, 2010, Mark Mitchell wrote:
- Build a fully hard-float world (high effort - requires packaging and
distro work to define and build a new armelfp port of the archive)
Since (2) and (3) are both high-effort, perhaps it would be better to choose one or the other approach for now, rather than attempting to do both initially.
I agree. And, for what it's worth, I would try to avoid (3) at almost all costs. One of the advantages of the ARM ABI and one of the objectives of Linaro is to provide a standard platform. Life as a Linux ISV is complex enough (multiple distributions, kernel versions, etc.) without also having to worry about the ABI. I think it would be better to do quite a bit of tools work than to fall back to the approach of a completely parallel distribution.
It's an appealing case which you make in favor of simplicity and universality, but I think this only covers half of our goals. We certainly aim at standardizing the platform, but we're also building higher-level tools to base on, extend, and derive from that base platform. That is, I don't expect all products to ship based on the same standard platform; OEMs, ODMs, silicon vendors, and other folks will want:
- to change sources (deviate/fork/etc.)
- the smallest possible "disk" and memory footprint
- the fastest possible binaries
- all of this for yesterday
So building a standard platform which will get good but not best results is useful, but we also need to provide Linaro's "end-users" with tools to get even closer to "best value" for them.
Perhaps it's possible to build a standard platform which always includes two versions of functions, and to then /usr/bin/strip it down further down the pipe -- but perhaps it's actually easier (less work) to maintain an ABI variant.
In any case, we weren't ready within Linaro to go out and make such a port all by ourselves, because it didn't seem to bring much compared to other things we can work on, but we decided that if Debian was to do it, we should support them. So if you think it's not a wise move for them to invest their time in this new port, it would be good to speak directly to this group, bringing your arguments forward. I think Debian rather than Linaro would be putting in the biggest amount of work on a new armhf port, while it sounds like Linaro would be doing a non-trivial amount of complex work otherwise. I'm all for doing the thing which minimizes work for everybody overall (/if it gets us the "best" results/), but we need to align with Debian on such a plan.
There's the question of the timeline too: Debian folks are excited about doing (3) now, and it's within reach. How long will it take to get the toolchain able to do (2)?
Cheers,
Loïc Minier wrote:
It's an appealing case which you make in favor of simplicity and universality, but I think this only covers half of our goals. We certainly aim at standardizing the platform, but we're also building higher-level tools to base on, extend, and derive that base platform.
Sure -- but building a completely parallel distribution with a hard-float ABI is a big cost. It's going to cause anyone building an application binary or library to have to build it twice, validate it twice, etc. In the worst case, distributors will end up deciding they need to put both hard- and soft-float versions of libraries on their systems, so that they can run binaries built for both versions of the ABI, with attendant costs in terms of paging, flash usage, etc.
Before we decide that's necessary, someone should have some very good evidence that this is really useful. Right now, as far as I know, we don't.
So if you think it's not a wise move to invest their time into this new port, it would be good to speak directly to this group bringing your arguments forward.
I don't know who the right people in Debian are; I'd rather suspect you're a lot more connected than I. :-)
There's the question of the timeline too: Debian folks are excited about doing (3) now, and it's within reach.
It's within reach, sure -- but it has big long-term costs. It's cheap to do because there's a button to press to build a new variant of all the packages. But, then you have two distributions to support forevermore, and a fragmentation cost for the entire ARM Linux community.
How long will it take to get the toolchain able to do (2)?
It would take months, but not years. It's going to take a few person-weeks to implement the source attribute. GLIBC is going to need to add new versions of libm functions that can be called with the hard-float ABI, while preserving the old soft-float versions using symbol versioning. GDB may need some work to handle the attribute. So, there's non-trivial work.
But, when you're done, you actually have the solution that you really want -- critical high-performance math functions can be called more efficiently, and nothing else is impacted.
(NOTE: I am a third party)
Hello Mark,
I think you have really good arguments.
2010/7/12, Mark Mitchell mark@codesourcery.com:
I don't know who the right people in Debian are; I'd rather suspect you're a lot more connected than I. :-)
You only have to reply to the following thread:
* http://lists.debian.org/debian-arm/2010/07/threads.html#00019
(You do not even need to subscribe to the list)
How long will it take to get the toolchain able to do (2)?
It would take months, but not years. It's going to take a few person-weeks to implement the source attribute. GLIBC is going to need to add new versions of libm functions that can be called with the hard-float ABI, while preserving the old soft-float versions using symbol versioning. GDB may need some work to handle the attribute. So, there's non-trivial work.
But, when you're done, you actually have the solution that you really want -- critical high-performance math functions can be called more efficiently, and nothing else is impacted.
Are you seriously thinking of implementing (2) before the end of the year? That would be really nice.
Best regards,
Hector Oron wrote:
Are you seriously thinking of implementing (2) before the end of the year? That would be really nice.
It's a prioritization question. Our work with Linaro (and to some extent other work we are doing with other customers) is driven by what people feel will provide maximum benefit. If this hard-float issue is important to Linaro (or others), it certainly is something that our team could do.
Thanks,
On Mon, Jul 12, 2010, Mark Mitchell wrote:
Sure -- but building a completely parallel distribution with a hard-float ABI is a big cost. It's going to cause anyone building an application binary or library to have to build it twice, validate it twice, etc. In the worst case, distributors will end up deciding they need to put both hard- and soft-float versions of libraries on their systems, so that they can run binaries built for both versions of the ABI, with attendant costs in terms of paging, flash usage, etc.
Yes; in fact, the lpia port in Ubuntu was painful for some of these reasons:
- maintaining the port, buildds, updating source packages for lpia, ...
- validating twice
- no ISV packages (e.g. skype_i386.deb)
...but the biggest pain was its absence in Debian! This is an important point: what's not in Debian is pain to maintain, and a port even more so because it's all over the place.
People come to me and ask about building hard-float images; they believe it will be faster, and I can certainly imagine it will be, but it's not clear how fast; nevertheless they want the best. I can't suggest that they use the "armel" port for that, because it would mean the system would appear to be compatible with packages built for a different ABI -- the right way to do a .deb hard-float image is to use a different dpkg architecture name. But they can't do that because a new port is too much effort for their own project. Now I could also tell them that they could extend the toolchain and some libraries to achieve this, but again it's not something they can do right now nor undertake by themselves.
hard-float Debian port; pros:
+ motivated Debian people working on it now
+ toolchain can already do it
+ best possible binaries for hard-float systems

cons:
- fragmentation
- cost of maintaining the port

extension to the toolchain and some libs to support hard-float flavors of some functions; pros:
+ reaps the biggest benefits of hard-float
+ single binary works on all systems; no fragmentation

cons:
- not there yet, and can't be done by the Debian people who would work on a new port
- manual changes to libraries; only some libraries benefit
? more complex toolchain
The arguments in terms of disk and memory usage are pretty much the same I made :-) except I think the hard-float case is in a better place: its binaries will be just a bit smaller, and I don't see an issue with a vendor including a soft-float libc if that's what the vendor wants to do for maximum compatibility. This situation is similar to 64-bit systems including a set of 32-bit libraries to run proprietary software; of course 64-bit systems aren't usually challenged for disk space.
Perhaps one way to kill the size/memory discussion would be to have a mechanism to strip down the special double-ABI binaries, e.g. libm. Do you think this would be possible?
Also, why do we only consider libm / manual optimizations with source attributes? Wouldn't it be possible to apply this mixed-float approach to random C libraries automatically? I'd very much like this to be the case, because that would make it possible to use a "do mixed-float flavors" flag in gcc across a whole distro and worry about stripping down the binaries later, either to tailor for hard-float or to tailor for soft-float; it would cover all libraries rather than manually optimized ones like libm, and would make supporting these two flavors easier.
What I perceive as high risks:
- doing a lot of work such as a new port or large toolchain developments, and then discovering in some months that it is no longer worth the gain, perhaps because the VFP unit has been improved substantially
- doing the toolchain changes, but still seeing a measurable difference between the hard-float and mixed-float worlds, because a bunch of small libraries use floats here and there
What I perceive as medium risk:
- spending time doing these toolchain changes instead of more useful ones
I don't know who the right people in Debian are; I'd rather suspect you're a lot more connected than I. :-)
I am, but I don't like proxying arguments, and I hate having both Linaro and Debian hats without real toolchain contributions to back up toolchain positions. I have the background to comment on the pain of new ports for sure, but not in favor of the toolchain changes; and I would be perceived as biased if I were to recommend to Debian not to do anything right now and to wait for Linaro's new toolchain features in 6 months...
On Mon, Jul 12, 2010, Dave Martin wrote:
- Use ABI tagging (high effort, involving modifications to affected
projects - permits hardvfp ABI for explicitly selected functions)
This is the first time I hear this suggested, and it's quite a nice option
I don't understand how much effort it implies, nor what kind of adverse side effects it might have: size of binaries, compilation time perhaps?
Is the idea to select some functions to be built for each ABI, or rather to build a whole piece of software (with some toolchain flags) turning on the generation of the two ABIs, with relevant tags on all functions taking floats as arguments? Will the toolchain be involved in taking the decision of which functions to provide for the two ABIs?
On Mon, 2010-07-12 at 22:03 +0200, Loïc Minier wrote:
On Mon, Jul 12, 2010, Dave Martin wrote:
- Use ABI tagging (high effort, involving modifications to affected
projects - permits hardvfp ABI for explicitly selected functions)
This is the first time I hear this suggested, and it's quite a nice option
I don't understand how much effort it implies, nor what kind of adverse side effects it might have: size of binaries, compilation time perhaps?
This sounds like it will require a new attribute, modification of the code generating function prologues and epilogues, possibly changes in the way static functions are handled based on earlier mails in this thread. It sounds like a lot of work, and possibly a maintenance headache. Is the code this affects changed often or will it be something we modify, test, fix and forget about for the most part?
Is the idea to select some functions to be built for each ABI, or rather to build a whole piece of software (with some toolchain flags) turning on the generation of the two ABIs, with relevant tags on all functions taking floats as arguments? Will the toolchain be involved in taking the decision of which functions to provide for the two ABIs?
I think the idea was to tag a function with __attribute(hardfp)__ or the like to force the hardfp abi on that function. Suitable static functions would also automatically use the hardfp abi since all its callers can theoretically be identified. This assumption breaks if a function pointer to the static function is passed out. Not sure if that can be automatically detected and handled.
Scott
Scott Bambrough wrote:
- Use ABI tagging (high effort, involving modifications to affected
projects - permits hardvfp ABI for explicitly selected functions)
This is the first time I hear this suggested, and it's quite a nice option
You will definitely need a source attribute to indicate what ABI a function has. As Scott says, the compiler can then call it appropriately. The issue about pointers-to-functions isn't insoluble; the attribute becomes part of the function type, so a pointer to that function has type "pointer-to-hard-float-function", and the compiler can warn if you cast that away.
You could try to get away without a source attribute and have the linker generate fixup code when you call a function using the wrong ABI. That's theoretically possible, but it's a lot of work, and I don't really see that it's necessary.
On Mon, 2010-07-12 at 13:36 -0700, Mark Mitchell wrote:
Scott Bambrough wrote:
- Use ABI tagging (high effort, involving modifications to affected
projects - permits hardvfp ABI for explicitly selected functions)
This is the first time I hear this suggested, and it's quite a nice option
You will definitely need a source attribute to indicate what ABI a function has. As Scott says, the compiler can then call it appropriately. The issue about pointers-to-functions isn't insoluble; the attribute becomes part of the function type, so a pointer to that function has type "pointer-to-hard-float-function", and the compiler can warn if you cast that away.
Isn't the source attribute and the relevant bits for the calling convention already implemented as a part of the hard float ABI work ?
It's documented here
http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Function-Attributes.html#Functio...
I've pasted the relevant text at the end of the mail but not sure about the warnings...
cheers Ramana
[1] pcs The pcs attribute can be used to control the calling convention used for a function on ARM. The attribute takes an argument that specifies the calling convention to use.
When compiling using the AAPCS ABI (or a variant of that) then valid values for the argument are "aapcs" and "aapcs-vfp". In order to use a variant other than "aapcs" then the compiler must be permitted to use the appropriate co-processor registers (i.e., the VFP registers must be available in order to use "aapcs-vfp"). For example,
/* Argument passed in r0, and result returned in r0+r1. */
double f2d (float) __attribute__((pcs("aapcs")));
Variadic functions always use the "aapcs" calling convention and the compiler will reject attempts to specify an alternative.
On Mon, Jul 12, 2010 at 9:36 PM, Mark Mitchell mark@codesourcery.com wrote:
[...]
You could try to get away without a source attribute and have the linker generate fixup code when you call a function using the wrong ABI. That's theoretically possible, but it's a lot of work, and I don't really see that it's necessary.
Unfortunately, converting between ABIs requires knowledge of the type and order of arguments - so while the compiler can do it if function prototypes are appropriately marked, I think the linker probably doesn't have the information it would need to generate the required fixups... unless the .o files contain more information about function symbols than I'm currently aware of.
Cheers ---Dave
On Mon, Jul 12, 2010 at 9:03 PM, Loïc Minier loic.minier@linaro.org wrote:
On Mon, Jul 12, 2010, Dave Martin wrote:
- Use ABI tagging (high effort, involving modifications to affected
projects - permits hardvfp ABI for explicitly selected functions)
This is the first time I hear this suggested, and it's quite a nice option
I don't understand how much effort it implies, nor what kind of adverse side effects it might have: size of binaries, compilation time perhaps?
Is the idea to select some functions to be built for each ABI, or rather to build a whole piece of software (with some toolchain flags), turning on the generation of the two ABIs with relevant tags on all functions taking floats as arguments? Will the toolchain be involved in deciding which functions to provide for the two ABIs?
a) The easy option: tag functions internal to libraries only, leaving the library entry points unchanged.
-> only certain functions can be safely tagged - i.e., functions whose address is not passed outside the library (so the "internal" ELF visibility attribute would be legal) AND which are not called in any context where a non-tagged function might be called (i.e., the tag has to form part of the function type, so that a function pointer lacking the tag cannot be used to call a function which has the tag, and vice versa). Determining the taggable functions automatically may be hard. Checking automatically that the developer didn't tag any functions which weren't safe to tag may also be challenging.
-> I believe the toolchain can automatically optimise calls to functions within the same compilation unit in some situations (including for static functions). Other cases would have to be addressed manually.
...so...
-> only the tagged functions (and corresponding call sites) will change in the build
-> some manual effort to tag the functions in the affected library (the tools cannot automatically perform the procedure-call optimisation between compilation units)
-> probably no significant impact on compilation time
-> slight reduction in the code size of the library due to the shrinkage of some function-call sequences (but probably not very significant)
Software which links against the library will be completely unaffected, at build, dynamic-link or run-time.
b) The harder option: allow library entry points to be tagged also.
-> may require two builds of the library, or at least a second set of external API entry points within the same library
-> some code-size increase due to the provision of extra API entry points
-> need a way to control which entry points a client project uses at build time
-> ABI and packaging issues, and linker support required to manage the duplicate entry points for libraries. Some ABI interoperability issues - packages built to use the hardfp-optimised entry points will not work on platforms which don't have those entry points.
For now, (a) is probably the most practical and easiest thing to target. It's also where we might be expected to get the most benefit. (b) seems much harder - it might be worth doing for libm, but is probably not worth attempting in most other cases.
Cheers ---Dave
On Mon, 2010-07-12 at 10:28 +0100, Dave Martin wrote:
- Use ABI tagging (high effort, involving modifications to affected
projects - permits hardvfp ABI for explicitly selected functions)
While this might sound attractive at first, I'm not sure it will fly.
Consider tagging all the functions in libm to use hard float.
Firstly, many people 'know' that sinf()'s prototype is
float sinf (float);
and will declare it directly rather than including math.h. That's going to quietly break. Autoconf-like tools are notorious for making assumptions like this.
Secondly, if I understand what you're suggesting, I think this just makes the world more fragmented rather than less. A library that exports
float __attribute__((pcs("aapcs-vfp"))) sinf(float)
can't be replaced by a library that exports
float __attribute__((pcs("aapcs"))) sinf(float)
because the caller must know which PCS convention is used at compile time.
R.
On Tue, Jul 13, 2010 at 11:50 AM, Richard Earnshaw rearnsha@arm.com wrote:
On Mon, 2010-07-12 at 10:28 +0100, Dave Martin wrote:
- Use ABI tagging (high effort, involving modifications to affected
projects - permits hardvfp ABI for explicitly selected functions)
While this might sound attractive at first, I'm not sure it will fly.
Consider tagging all the functions in libm to use hard float.
Firstly, many people 'know' that sinf()'s prototype is
float sinf (float);
and will declare it directly rather than including math.h. That's going to quietly break. Autoconf-like tools are notorious for making assumptions like this.
Possibly dumb questions:
Why would anyone (including autoconf) do this instead of including math.h? Is this for working around "broken" headers etc.? Do you know how common this is?
Secondly, if I understand what you're suggesting, I think this just makes the world more fragmented rather than less. A library that exports
float __attribute__((pcs("aapcs-vfp"))) sinf(float)
can't be replaced by a library that exports
float __attribute__((pcs("aapcs"))) sinf(float)
because the caller must know which PCS convention is used at compile time.
Indeed - see my post of 0925 UTC. This falls under (b), which looks challenging to manage, though I didn't discuss the knock-on issues with headers and function prototypes there.
However, just tagging internal functions - option (a) - sounds like it could bring some benefits and is more feasible. Can you see any problems with that?
Cheers ---Dave
This may be a daft comment, but shouldn't high-performance math be implemented via DSP or GPU hardware or some other co-processor on an embedded system?
Joel
On Tuesday, 13 July 2010 at 19:05:30, Joel Crisp wrote:
This may be a daft comment, but shouldn't high-performance math be implemented via DSP or GPU hardware or some other co-processor on an embedded system?
An fmul operation takes a few cycles (plus some more for moving data between registers under softfp), while doing the same operation on the TI DSP (part of the OMAP3 CPU) would require more time.
A DSP is good to have for more complex operations, but that's a job for application developers, as they know what hardware their app runs on and how to make it run smoothly.
GCC knows about the ARM part of the CPU and optimizes for it. We need to keep it that way, because Cortex-A8/A9/A5 cores do not themselves contain DSPs, GPUs or other coprocessors.
Regards,
On Tue, Jul 13, 2010 at 6:05 PM, Joel Crisp cydergoth@gmail.com wrote:
This may be a daft comment, but shouldn't high-performance math be implemented via DSP or GPU hardware or some other co-processor on an embedded system?
Where the available hardware is a good fit, yes - but porting work is needed in every case, for every target device, so it's not very scalable in terms of effort or portability.
Generally only bulk-number-crunching software components designed with this sort of acceleration in mind will perform well with it - using hardware accelerators to process data can give high data throughput, but suffers from relatively massive setup costs and latency compared with running general-purpose code on the CPU. Using extra hardware blocks for general-purpose work may also interfere with the ability to save power by turning those devices off when they're not in use.
Components for which hardware acceleration is a big win typically include codecs, video rendering and streaming components.
For other cases, a software implementation gives you a universal fallback which will work (albeit at reduced performance) on all devices - we still want this to be as fast as possible, since in general not every device will have an accelerated path for everything. Suppose someone invents a novel codec after the hardware was built, for example.
Cheers, ---Dave