Hi,
Sorry for the cross post and long email :-)
Currently I am working on a very early-stage build of Mandriva for arm. Thanks to Jeff Johnson for giving me ssh access to armv7 hosts, and Matthew Dawkins for building several Mandriva/Unity linux armv5 packages.
What I am trying to understand now is the choice of float abi. I understand that the IHI0042D_aapcs.pdf file I downloaded says to use vfp registers for float/double arguments, but softfp seems too good to miss, as armv5 should be around for some time yet.
So, I have two chroots, running:

softfp# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/armv7l-mandriva-linux-gnueabi/4.6.1/lto-wrapper
Target: armv7l-mandriva-linux-gnueabi
Configured with: /home/pcpa/bootstrap/rpmbuild/BUILD/gcc-4.6-20110722/configure --prefix=/usr --build=i586-mandriva-linux-gnu --host=armv7l-mandriva-linux-gnueabi --target=armv7l-mandriva-linux-gnueabi --enable-werror=no --enable-cxx --with-cpu=cortex-a8 --with-tune=cortex-a8 --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-abi=aapcs-linux --enable-languages=c,c++ --enable-threads=posix --disable-libssp --disable-libmudflap
Thread model: posix
gcc version 4.6.1 20110722 (Mandriva) (GCC)

thumb# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/armv7l-mandriva-linux-gnueabi/4.6.1/lto-wrapper
Target: armv7l-mandriva-linux-gnueabi
Configured with: /home/pcpa/bootstrap/rpmbuild/BUILD/gcc-4.6-20110722/configure --prefix=/usr --build=i586-mandriva-linux-gnu --host=armv7l-mandriva-linux-gnueabi --target=armv7l-mandriva-linux-gnueabi --enable-werror=no --enable-cxx --with-cpu=cortex-a8 --with-tune=cortex-a8 --with-arch=armv7-a --with-mode=thumb --with-float=hard --with-fpu=vfpv3-d16 --with-abi=aapcs-linux --enable-languages=c,c++ --enable-threads=posix --disable-libssp --disable-libmudflap
Thread model: posix
gcc version 4.6.1 20110722 (Mandriva) (GCC)
This is unmodified upstream gcc, and using a set of bootstrap scripts from a git branch I made on a checkout of
git clone git://fedorapeople.org/~djdelorie/bootstrap.git
Since I am still very "arm noob" :-) and just yesterday did the thumb build to learn about thumb, so far, my impression is that the best approach should be to use thumb+softfp.
Just so you know, I am running both thumb and arm builds: the thumb one uses hard float, and the softfp one uses the ARM instruction set:
softfp# objdump -d /usr/lib/libm.so | less
[...]
00008d30 <__ieee754_atan2>:
    8d30: e3a0c000 mov ip, #0
    8d34: e347cff0 movt ip, #32752 ; 0x7ff0
    8d38: e92d4030 push {r4, r5, lr}
    8d3c: ed2d8b10 vpush {d8-d15}
    8d40: e3a05000 mov r5, #0
    8d44: ec432b18 vmov d8, r2, r3
    8d48: e3475ff0 movt r5, #32752 ; 0x7ff0
    8d4c: e003c00c and ip, r3, ip
    8d50: e15c0005 cmp ip, r5
    8d54: e24dd02c sub sp, sp, #44 ; 0x2c
    8d58: e1a04003 mov r4, r3
    8d5c: ec410b19 vmov d9, r0, r1
    8d60: e1a05002 mov r5, r2
    8d64: 0a000022 beq 8df4 <__ieee754_atan2+0xc4>
[...]

thumb# objdump -d /usr/lib/libm.so | less
[...]
00007884 <__ieee754_atan2>:
    7884: 2100 movs r1, #0
    7886: 2000 movs r0, #0
    7888: f6c7 71f0 movt r1, #32752 ; 0x7ff0
    788c: ec53 2b11 vmov r2, r3, d1
    7890: f6c7 70f0 movt r0, #32752 ; 0x7ff0
    7894: 4019 ands r1, r3
    7896: 4281 cmp r1, r0
    7898: e92d 03f0 stmdb sp!, {r4, r5, r6, r7, r8, r9}
    789c: ed2d 8b10 vpush {d8-d15}
    78a0: 461c mov r4, r3
    78a2: b08a sub sp, #40 ; 0x28
    78a4: eeb0 8b41 vmov.f64 d8, d1
    78a8: 4616 mov r6, r2
    78aa: eeb0 9b40 vmov.f64 d9, d0
    78ae: d03c beq.n 792a <__ieee754_atan2+0xa6>
[...]
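The two configurations above can also be told apart from inside a C file, without reading disassembly. The following is a minimal sketch that relies on macros GCC predefines for ARM targets (`__ARM_PCS_VFP`, `__SOFTFP__`, `__thumb__`, `__thumb2__`); the function names are hypothetical, and the "softfp" fallback assumes an EABI toolchain with VFP enabled.

```c
/* Sketch: report the float ABI and instruction set the compiler is
 * targeting, using GCC's predefined macros. Function names are made up
 * for this illustration. */

const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM target";
#elif defined(__ARM_PCS_VFP)
    /* -mfloat-abi=hard: float/double arguments travel in VFP registers */
    return "hard";
#elif defined(__SOFTFP__)
    /* -mfloat-abi=soft: no FPU instructions are emitted at all */
    return "soft";
#else
    /* Assumption: EABI with VFP instructions but core-register argument
     * passing, i.e. -mfloat-abi=softfp */
    return "softfp";
#endif
}

const char *instruction_set_name(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM target";
#elif defined(__thumb2__)
    return "thumb2";
#elif defined(__thumb__)
    return "thumb";
#else
    return "arm";
#endif
}
```

Printing both strings from a small `main` in each chroot should give "softfp"/"arm" in the first and "hard"/"thumb2" in the second, if the assumptions above hold.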
I am kind of trying to figure what "The Industry" says about it, and just checked the linaro gcc-4.6 relevant changes for me right now, that are...
+ --with-arch=armv7-a --with-tune=cortex-a8 \
+ --with-float=$(float_abi) --with-fpu=neon \

+# check if we're building for armel or armhf
+ifeq ($(DEB_TARGET_ARCH),armhf)
+ float_abi := hard
+else ifneq (,$(filter $(DEB_TARGET_ARCH), arm armel))
+ float_abi := softfp
+endif
If I understand correctly, neon will have better support for simd instructions right?
Either way, I used two simple benchmarks to try to sell myself the idea of breaking compatibility with armv5 or older binaries, but still not convinced, but, as I said, we should use whatever "The Industry" chooses :-) I used for benchmark http://www.tux.org/~mayer/linux/bmark.html and http://www.linuxfordevices.com/c/a/Linux-For-Devices-Articles/Why-ARMs-EABI-... and also compared with my home computer (quad)core i5 x86_64, and attached results...
Thanks and again sorry for cross posting and long email, Paulo
On Sun, Jul 31, 2011, Paulo César Pereira de Andrade wrote:
If I understand correctly, neon will have better support for simd instructions right?
NEON is effectively SIMD instructions, but not all modern SoCs have NEON, e.g. NVidia Tegra2 doesn't have NEON. It's becoming commonplace in recent SoCs though.
Either way, I used two simple benchmarks to try to sell myself the idea of breaking compatibility with armv5 or older binaries, but still not convinced, but, as I said, we should use whatever "The Industry" chooses :-)
Depends which industry though. Yes, ARMv5 will be around for a while, but high-end devices, phones, tablets etc. are designed around ARMv7. Depends what your distro targets too; for instance Debian armel targets ARMv4T+ while Ubuntu armel is based on the same sources but targets ARMv7+ and uses Thumb2.
I used for benchmark http://www.tux.org/~mayer/linux/bmark.html and http://www.linuxfordevices.com/c/a/Linux-For-Devices-Articles/Why-ARMs-EABI-... and also compared with my home computer (quad)core i5 x86_64, and attached results...
You're likely not going to see much difference switching float ABI alone with common benchmarks, because GCC is clever enough to use the best possible ABI for non-public functions. It's mostly visible when you're crossing library calls with floating points.
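A small sketch of the distinction being made here, with hypothetical function names. The compiler must honour the public float ABI for a function with external linkage, because callers may live in another object or shared library. A `static` function whose address is never taken is only visible in its own translation unit, so the compiler may inline it or pick a cheaper private convention.

```c
/* External linkage: any caller outside this file has to follow the
 * platform float ABI (core registers for softfp, VFP registers for hard). */
double scale_public(double x, double k)
{
    return x * k;
}

/* Internal linkage, address never taken: the compiler knows every call
 * site, so it is free to inline this or pass the doubles however it likes. */
static double scale_private(double x, double k)
{
    return x * k;
}

/* Calls the private helper twice; in an optimized build these calls are
 * the kind that will not show a float ABI difference. */
double scale_twice(double x, double k)
{
    return scale_private(scale_private(x, k), k);
}
```

Comparing `objdump -d` output for `scale_public` and `scale_twice` in a softfp build and a hard-float build is one way to see where the ABI choice is actually visible.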
You should however definitely see a difference between ARM mode and Thumb-2 mode (which is ARMv6+ only IIRC), as the code is denser and fits more easily in CPU cache.
There were many discussions around this on the debian-arm list last year; Konstantinos Margaritis collected benchmarks for hard-float which should be linked from http://wiki.debian.org/ArmHardFloatPort
HTH
2011/8/1 Loïc Minier loic.minier@linaro.org:
On Sun, Jul 31, 2011, Paulo César Pereira de Andrade wrote:
If I understand correctly, neon will have better support for simd instructions right?
NEON is effectively SIMD instructions, but not all modern SoCs have NEON, e.g. NVidia Tegra2 doesn't have NEON. It's becoming commonplace in recent SoCs though.
Either way, I used two simple benchmarks to try to sell myself the idea of breaking compatibility with armv5 or older binaries, but still not convinced, but, as I said, we should use whatever "The Industry" chooses :-)
Depends which industry though. Yes, ARMv5 will be around for a while, but high-end devices, phones, tablets etc. are designed around ARMv7. Depends what your distro targets too; for instance Debian armel targets ARMv4T+ while Ubuntu armel is based on the same sources but targets ARMv7+ and uses Thumb2.
I used for benchmark http://www.tux.org/~mayer/linux/bmark.html and http://www.linuxfordevices.com/c/a/Linux-For-Devices-Articles/Why-ARMs-EABI-... and also compared with my home computer (quad)core i5 x86_64, and attached results...
You're likely not going to see much difference switching float ABI alone with common benchmarks, because GCC is clever enough to use the best possible ABI for non-public functions. It's mostly visible when you're crossing library calls with floating points.
Yes. That is what I noticed. Trying a simple test case of what should be the worst case for softfp, e.g. this "dumb" program:
-%<-
#include <stdio.h>

__attribute__((noinline)) double
d_d(double a, double b, double c, double d,
    double e, double f, double g, double h)
{
    return a + b + c + d + e + f + g + h;
}

int
main(int argc, char *argv[])
{
    int i;
    double d;

    for (d = 0.0, i = 0; i < 100000000; i++)
        d += d_d(d, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0);
    printf("%f\n", d);

    return 0;
}
-%<-
and compiling with -O0 (otherwise gcc is smart enough to figure out the constant additions), or other variants of the above, I see softfp being 20-25% slower on a panda board.
Besides, breaking compatibility with armv5 binaries because of special corner cases, like linking to libraries with functions that take several float arguments by value, has yet to convince me (mostly because gcc will optimize most of it for "internal functions")... There is also the issue of the hardfp abi still using the softfp convention for varargs, e.g. printf...
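To make the varargs remark concrete, here is a minimal variadic function in the style of printf; the name `sum_doubles` is made up for this sketch. As far as the AAPCS describes it, variadic functions are always marshaled with the base standard, so the doubles after `count` should travel in core registers and on the stack even in a hard-float build.

```c
#include <stdarg.h>

/* Adds up `count` double arguments passed through the ellipsis.
 * Hypothetical helper for illustration only. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, double);  /* each double is read from the vararg area */
    va_end(ap);

    return total;
}
```

Disassembling a call such as `sum_doubles(3, 1.0, 2.0, 3.5)` in both chroots should show near-identical argument setup, unlike the fixed-argument `d_d` case.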
You should however definitely see a difference between ARM mode and Thumb-2 mode (which is ARMv6+ only IIRC), as the code is denser and fits more easily in CPU cache.
There were many discussions around this on the debian-arm list last year; Konstantinos Margaritis collected benchmarks for hard-float which should be linked from http://wiki.debian.org/ArmHardFloatPort
I tried to follow all links etc from there, but the best I found was an interesting example of wrapping a single assembly instruction in a function call; somewhat like my example above (where it is required to use -O0 to see the difference), and I am afraid some tests, following links from there, were even using different compilers...
Either way, so far I have only tested on a remote panda board, and I am still to see a "physical" armv7, so blame me for not testing on other hardware (due to lack of access). But from more "readings", it would be a shame if the switch to an incompatible abi was mainly to satisfy binary blobs like nvidia binary video drivers... (you can bash me now for saying that).
HTH
Loïc Minier
Paulo
2011/8/1 Loïc Minier loic.minier@linaro.org:
On Sun, Jul 31, 2011, Paulo César Pereira de Andrade wrote:
Hi, sorry for returning back to this discussion.
If I understand correctly, neon will have better support for simd instructions right?
NEON is effectively SIMD instructions, but not all modern SoCs have NEON, e.g. NVidia Tegra2 doesn't have NEON. It's becoming commonplace in recent SoCs though.
Either way, I used two simple benchmarks to try to sell myself the idea of breaking compatibility with armv5 or older binaries, but still not convinced, but, as I said, we should use whatever "The Industry" chooses :-)
Depends which industry though. Yes, ARMv5 will be around for a while, but high-end devices, phones, tablets etc. are designed around ARMv7. Depends what your distro targets too; for instance Debian armel targets ARMv4T+ while Ubuntu armel is based on the same sources but targets ARMv7+ and uses Thumb2.
I used for benchmark http://www.tux.org/~mayer/linux/bmark.html and http://www.linuxfordevices.com/c/a/Linux-For-Devices-Articles/Why-ARMs-EABI-... and also compared with my home computer (quad)core i5 x86_64, and attached results...
You're likely not going to see much difference switching float ABI alone with common benchmarks, because GCC is clever enough to use the best possible ABI for non-public functions. It's mostly visible when you're crossing library calls with floating points.
You should however definitely see a difference between ARM mode and Thumb-2 mode (which is ARMv6+ only IIRC), as the code is denser and fits more easily in CPU cache.
On simple tests it did not show much difference, but the space saving is quite noticeable. Same for neon, but I believe neon would make more of a difference on vectorized integer operations.
There were many discussions around this on the debian-arm list last year; Konstantinos Margaritis collected benchmarks for hard-float which should be linked from http://wiki.debian.org/ArmHardFloatPort
I did make an initial port of Mandriva using softfp; it has almost all dependencies for a "proper" distro rebuild in place, e.g. most of gtk/gnome, qt4/kde, but it is unofficial, and so far I only have a qemu image, or can unpack the image in a chroot; I have been using vnc in a ssh tunnel for testing graphical applications...
But now I am working on repeating the same steps in a hardfp bootstrap, so that, theoretically, any binary for fedora hardfp should just work, and hopefully, the same for linaro.
What is the current state of hardfp toolchain? I ask because softfp, what I earlier considered the most sane engineering option, just builds and works using upstream binutils/gcc, but hardfp requires some hacking; I have been using and building weekly LATEST-4.6 gcc snapshots for quite some time.
HTH
Loïc Minier
Thanks, Paulo
Em 18 de outubro de 2011 13:43, Paulo César Pereira de Andrade paulo.cesar.pereira.de.andrade@gmail.com escreveu:
[...]
What is the current state of hardfp toolchain? I ask because softfp, what I earlier considered the most sane engineering option, just builds and works using upstream binutils/gcc, but hardfp requires some hacking; I have been using and building weekly LATEST-4.6 gcc snapshots for quite some time.
Sorry for the reply to my own email, but shortly after sending the email I noticed what is the cause, and it is misconfigured rpmbuild macros. Need to update several macros in official packages...
HTH
Loïc Minier
Paulo
On Tue, Oct 18, 2011, Paulo César Pereira de Andrade wrote:
What is the current state of hardfp toolchain? I ask because softfp, what I earlier considered the most sane engineering option, just builds and works using upstream binutils/gcc, but hardfp requires some hacking; I have been using and building weekly LATEST-4.6 gcc snapshots for quite some time.
It should be pretty good; if you're having issues with FSF GCC, try Linaro's as we've backported some fixes and speed improvements. Support might be better in 4.6 series than in 4.5 series.
Em 18 de outubro de 2011 17:09, Loïc Minier loic.minier@linaro.org escreveu:
On Tue, Oct 18, 2011, Paulo César Pereira de Andrade wrote:
What is the current state of hardfp toolchain? I ask because softfp, what I earlier considered the most sane engineering option, just builds and works using upstream binutils/gcc, but hardfp requires some hacking; I have been using and building weekly LATEST-4.6 gcc snapshots for quite some time.
It should be pretty good; if you're having issues with FSF GCC, try Linaro's as we've backported some fixes and speed improvements. Support might be better in 4.6 series than in 4.5 series.
Thanks. Currently I am using the same spec file for x86 and arm. But one issue I would like to ask is about openjdk bootstrap.
I adapted the fedora java bootstrap procedure to my gcc build, that is, on an x86:
-%<-
<<just finished compilation of gcc>>

%if %{with java_build_tar}
find libjava -name '*.h' -type f | \
    xargs grep -l '// DO NOT EDIT THIS FILE - it is machine generated' \
    > libjava-classes.list
find libjava -name '*.class' -type f >> libjava-classes.list
find libjava/testsuite -name '*.jar' -type f >> libjava-classes.list
tar cf - -T libjava-classes.list | bzip2 -9 \
    > %{_sourcedir}/libjava-classes-%{version}-%{release}.tar.bz2
%endif
-%<-
and on the arm:
-%<-
<<unpack tarball and enter gcc build directory>>

%if %{with java_bootstrap}
tar xjf %{SOURCE6}
%endif

<<configure and build>>
-%<-
And it compiled all the java stuff; now I have:
# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/armv7hl-mandriva-linux-gnueabi/4.6.2/lto-wrapper
Target: armv7hl-mandriva-linux-gnueabi
Configured with: ./configure --build=armv7hl-mandriva-linux-gnueabi --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --x-includes=/usr/include --x-libraries=/usr/lib --disable-libjava-multilib --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-java-awt=gtk --enable-gtk-cairo --with-cloog --with-ppl --enable-cloog-backend=ppl --disable-libquadmath --disable-libquadmath-support --disable-libssp --disable-libunwind-exceptions --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-checking=release --enable-gnu-unique-object --enable-languages=c,c++,fortran,java,lto,objc,obj-c++ --enable-linker-build-id --enable-plugin --enable-shared --enable-threads=posix --with-system-zlib --with-bugurl=https://qa.mandriva.com/ --with-cpu=cortex-a8 --with-tune=cortex-a8 --with-arch=armv7-a --with-mode=thumb --with-float=hard --with-fpu=vfpv3-d16 --with-abi=aapcs-linux --host=armv7hl-mandriva-linux-gnueabi --target=armv7hl-mandriva-linux-gnueabi
Thread model: posix
gcc version 4.6.2 20111019 (Mandriva) (GCC)

This is the first gcc 4.6.2 Release Candidate; the eclipse-ecj.jar is from an x86 computer (noarch bootstrap package).
But I think it is completely "booged". For example, the Mandriva openjdk build is based on the fedora one, but it was failing with an out of bounds exception in the early stage of "rewriting" rhino (another noarch package), so, to get it past that stage, I changed:
/stage3/rpmbuild/java-1.6.0-openjdk/BUILD/icedtea6-1.9.7/rewriter/com/redhat/rewriter/ClassRewriter.java like this:
- private static final boolean DEBUG = false;
+ private static final boolean DEBUG = true;

then, after a rebuild, it did print a very large amount of debug output, but worked... (that should have triggered some flush somewhere). It did not matter much, though, because I got to a point where gij would just core dump, specifically on this command:
/stage3/rpmbuild/BUILD/icedtea6-1.10.3/bootstrap/jdk1.6.0/bin/java -Djava.endorsed.dirs=/stage3/rpmbuild/BUILD/icedtea6-1.10.3/bootstrap/jdk1.6.0/lib/endorsed -classpath /stage3/rpmbuild/BUILD/icedtea6-1.10.3/openjdk.build-ecj/hotspot/outputdir/linux_arm_zero/product/../generated/jvmtifiles jvmtiGen -IN /stage3/rpmbuild/BUILD/icedtea6-1.10.3/openjdk-ecj/hotspot/src/share/vm/prims/jvmti.xml -XSL /stage3/rpmbuild/BUILD/icedtea6-1.10.3/openjdk-ecj/hotspot/src/share/vm/prims/jvmtiEnter.xsl -OUT /stage3/rpmbuild/BUILD/icedtea6-1.10.3/openjdk.build-ecj/hotspot/outputdir/linux_arm_zero/product/../generated/jvmtifiles/jvmtiEnter.cpp -PARAM interface jvmti
Since I recently wrote a jit for arm for my "toy" language, and have a fork (for "playground" hacking) of GNU lightning where I added support for arm, I was aware of issues with the hardfp abi and varargs functions (I still need to add support for hardfp to my jit code, and write a thumb2 jit at some point, but porting a distro from scratch takes quite some time :-)
So, sometime ago I saw this email
http://lists.fedoraproject.org/pipermail/arm/2011-August/001732.html
and just did look a bit more to find
http://sourceware.org/ml/libffi-discuss/2011/msg00075.html
So, I believe I should wait a bit more for things to stabilize for openjdk, I guess? Well, I am only trying to make some experiments with building java for the sake of completeness; I neither like nor use java :-)
Also, since it is early stages, I thought it was OK to "cheat" for the sake of experimenting, so, for the softfp distro port, I got java-1.6.0-openjdk running fine after using a java-1.6.0-openjdk-devel for armv5te. I also tried to "cheat" for hardfp, as apparently fedora already has a java-1.6.0-openjdk for armv7hl, e.g. see
http://aph.fedorapeople.org/RPMS/armv7hl/
but that will just crash on my armv7hl build (maybe I need to add patches to libffi?, not use the one built by gcc?).
Possibly useful information: gcj would block a few times, and the same happened on softfp. So, when noticing a build was stuck, I would just run "gstack <pid>" to get it going again; I learned that after attaching gdb to all threads, just to find out that it would get out of the deadlock when detaching.
I do not know much about the openjdk build, or whether it calls varargs functions with float/double arguments; I hope not, or things are really broken...
-- Loïc Minier
Thanks and sorry for the usual long email... Paulo
What I am trying to understand now is about choice of float abi.
Not much to understand - each project chooses the abi that best meets their goals. If you want to learn the history, it's all in the mail archives.
git clone git://fedorapeople.org/~djdelorie/bootstrap.git
Since I am still very "arm noob" :-) and just yesterday did the thumb build to learn about thumb, so far, my impression is that the best approach should be to use thumb+softfp.
If you want to do that, you don't need my bootstrap scripts. The whole *point* of a bootstrap was to bring up an *incompatible* abi from scratch. If you want to use a compatible abi, just keep using the armv5 version of Fedora instead. It was decided long ago that the armv7 version of Fedora would use the hardfp abi (hence the project name "hardfp bootstrap"), but you can't build hardfp binaries on a softfp platform, so we had to start from scratch to do hardfp.
It's also a fun exercise in bootstrapping, to make sure we still can do it.
I am kind of trying to figure what "The Industry" says about it,
If you need someone else's approval, you've missed the point of Free Software. Each project has their own goals, and there is no "The Industry" to tell us what to do. If you want to be part of a project, find the one that has the same goals as you do, and join them.
If I understand correctly, neon will have better support for simd instructions right?
There are still some armv7 chips that don't have neon, though, so we (Fedora) chose to avoid neon for now.
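Since NEON is optional on armv7, code that wants to use it has to check at run time. The following is a minimal sketch, assuming a Linux kernel whose /proc/cpuinfo carries a "Features" line listing space-separated capability words such as "neon"; the helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `name` occurs in `line` as a whole word, 0 otherwise.
 * A word is delimited by spaces, tabs, a colon, or the line ends. */
int has_feature(const char *line, const char *name)
{
    size_t len = strlen(name);
    const char *p = line;

    if (len == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        char after = p[len];
        int ends = after == '\0' || after == ' ' || after == '\t' || after == '\n';

        if (starts && ends)
            return 1;
        p++;  /* keep searching past this partial match */
    }
    return 0;
}

/* Returns 1 if the first "Features" line in /proc/cpuinfo lists neon,
 * 0 if it does not, and -1 if the file or the line cannot be found. */
int cpu_has_neon(void)
{
    char line[1024];
    FILE *fp = fopen("/proc/cpuinfo", "r");
    int found = -1;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, "Features", 8) == 0) {
            found = has_feature(line, "neon");
            break;
        }
    }
    fclose(fp);
    return found;
}
```

A library could use a check like this to select a NEON code path only where it exists, while the distribution itself is still built without `-mfpu=neon`.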
On Monday, August 01, 2011 12:35:06 PM DJ Delorie wrote:
What I am trying to understand now is about choice of float abi.
Not much to understand - each project chooses the abi that best meets their goals. If you want to learn the history, it's all in the mail archives.
git clone git://fedorapeople.org/~djdelorie/bootstrap.git
Since I am still very "arm noob" :-) and just yesterday did the thumb build to learn about thumb, so far, my impression is that the best approach should be to use thumb+softfp.
If you want to do that, you don't need my bootstrap scripts. The whole *point* of a bootstrap was to bring up an *incompatible* abi from scratch. If you want to use a compatible abi, just keep using the armv5 version of Fedora instead. It was decided long ago that the armv7 version of Fedora would use the hardfp abi (hence the project name "hardfp bootstrap"), but you can't build hardfp binaries on a softfp platform, so we had to start from scratch to do hardfp.
We decided to keep using soft rather than softfp on armv5 because softfp, while it can use a hardware floating point unit if it's available, has the extra overhead of working out at runtime whether it has a hardware floating point unit or not. By making the distinct v7 port and using hardfp we gain the speed of using the hardware floating point unit without the runtime overhead. But since softfp and soft are compatible, you could just build using fedora as a base, or any other ABI-compatible distro.
It's also a fun exercise in bootstrapping, to make sure we still can do it.
I am kind of trying to figure what "The Industry" says about it,
If you need someone else's approval, you've missed the point of Free Software. Each project has their own goals, and there is no "The Industry" to tell us what to do. If you want to be part of a project, find the one that has the same goals as you do, and join them.
If I understand correctly, neon will have better support for simd instructions right?
There are still some armv7 chips that don't have neon, though, so we (Fedora) chose to avoid neon for now.
Marvell and NVIDIA armv7 chips don't have neon, which includes the XO-1.75.
Dennis
Quoting DJ Delorie dj@redhat.com:
It's also a fun exercise in bootstrapping, to make sure we still can do it.
I haven't looked at the docs in a while, but we are most likely going to need this again in the distant future. Plus the fact it seems to come up all the time. It would be appropriate to have it overly documented. (Overly in my definition includes stupid details, like if xyz happens you screwed up step 113. Your definition may vary. :) )
If I understand correctly, neon will have better support for simd instructions right?
There are still some armv7 chips that don't have neon, though, so we (Fedora) chose to avoid neon for now.
IIRC Neon isn't a requirement for armv7 but a vfpu is. It is a good choice.
Neon is a simd processor; however, the code needs tweaking for neon, so that would be a great place to volunteer if you are looking for something. There are 100 other places to help also, including testing documentation... :)
Em 1 de agosto de 2011 14:35, DJ Delorie dj@redhat.com escreveu:
What I am trying to understand now is about choice of float abi.
Not much to understand - each project chooses the abi that best meets their goals. If you want to learn the history, it's all in the mail archives.
I am using some local branches of your scripts to build several combinations for armv7 chroots, stopping at stage2 and building a few rpms:
(calling it hardfp for easier understanding, and using vfpv3-d16 if neon is omitted)
arm+hardfp
arm+softfp
thumb+hardfp
thumb+softfp
thumb+hardfp+neon
thumb+softfp+neon
From my understanding, neon generates "prettier" objdump output when looking at libm.so, but runtime of simple benchmarks does not show any difference.
git clone git://fedorapeople.org/~djdelorie/bootstrap.git
Since I am still very "arm noob" :-) and just yesterday did the thumb build to learn about thumb, so far, my impression is that the best approach should be to use thumb+softfp.
If you want to do that, you don't need my bootstrap scripts. The whole *point* of a bootstrap was to bring up an *incompatible* abi from scratch. If you want to use a compatible abi, just keep using the armv5 version of Fedora instead. It was decided long ago that the armv7 version of Fedora would use the hardfp abi (hence the project name "hardfp bootstrap"), but you can't build hardfp binaries on a softfp platform, so we had to start from scratch to do hardfp.
Actually, I know now that I was also partially confused by misunderstanding the --with-float=hard abi, so I wrote a simple program to better understand the calling convention being generated. For some reason I was thinking that it would use only two vfp registers for arguments, but it can use up to 8. But using the softfp convention for variadic functions may be tough for some applications; I wrote two "initial state" jits for arm:
https://github.com/pcpa/lightning/tree/master/lightning/arm
and direct links to the other, as it is not in a single project...
https://code.google.com/p/exl/source/browse/trunk/lib/ejit_arm-cpu.c
https://code.google.com/p/exl/source/browse/trunk/lib/ejit_arm-swf.c
https://code.google.com/p/exl/source/browse/trunk/lib/ejit_arm-vfp.c
So, today, after better understanding the ABI, I also made a simple test case that calls, 100 million times, a function receiving 8 double arguments and returning one. It has to be compiled with -O0, or gcc just optimizes out the call sequence and all timings become identical. I noticed 20-25% faster execution with hardfp, in what should be the case where it makes the most difference: 8 arguments in vfp registers and the return value in a vfp register, as opposed to 2 arguments in r0,r1,r2,r3, converted to vfp, and 6 on the stack, and then again the conversion for the return value...
As Loïc Minier said in the other response (Thanks!), this should mostly be an issue when calling functions from different libraries, where gcc cannot optimize much, and presuming one is passing 2-8 float/double arguments a lot in inner loops, and not in vectors...
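A sketch of how such a measurement could be timed from inside the program. It is a variation on the test described above, not the original: the names `sum8` and `time_sum8_calls` are hypothetical, the call goes through a volatile function pointer so that the full calling sequence survives optimization (an alternative to -O0), and the loop counter is passed instead of the running total so that the accumulator stays finite.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime in strict C modes */
#include <time.h>

/* Eight doubles in, one double out: the case where the two float ABIs
 * differ most in how arguments are passed. */
double sum8(double a, double b, double c, double d,
            double e, double f, double g, double h)
{
    return a + b + c + d + e + f + g + h;
}

typedef double (*sum8_fn)(double, double, double, double,
                          double, double, double, double);

/* Calls sum8 `iterations` times, stores the accumulated value in *result,
 * and returns the elapsed wall-clock time in seconds. */
double time_sum8_calls(long iterations, double *result)
{
    sum8_fn volatile fn = sum8;  /* volatile: forces a real indirect call */
    struct timespec start, end;
    double acc = 0.0;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations; i++)
        acc += fn((double)i, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0);
    clock_gettime(CLOCK_MONOTONIC, &end);

    *result = acc;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Running `time_sum8_calls(100000000, &r)` in each chroot and comparing the returned times would give the softfp versus hardfp ratio directly; on older glibc versions linking may need `-lrt`.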
It's also a fun exercise in bootstrapping, to make sure we still can do it.
With that I agree :-)
I am kind of trying to figure what "The Industry" says about it,
If you need someone else's approval, you've missed the point of Free Software. Each project has their own goals, and there is no "The Industry" to tell us what to do. If you want to be part of a project, find the one that has the same goals as you do, and join them.
I did not express myself clearly. Attempting to better describe the idea I tried to expose, but failed: by doing packages for armv7, and assuming I am working for Mandriva, we are better off sticking to what upstream does and supports (read "The Industry" -> "upstream"; I personally can hack here and there, but not much else)
If I understand correctly, neon will have better support for simd instructions right?
There are still some armv7 chips that don't have neon, though, so we (Fedora) chose to avoid neon for now.
I did not learn much yet about it, but maybe using neon for integer division could be a "huge win", as otherwise, there is no division instruction (well, not in arm mode)...
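For context on the missing division instruction: as far as I know, on cores without a hardware divide GCC turns C integer division into a call to a runtime helper such as `__aeabi_uidiv`. The sketch below shows the general shape of a software divide using plain shift-and-subtract; it is an illustration only, not the libgcc routine, and `udiv_soft` is a made-up name.

```c
#include <stdint.h>

/* Restoring (shift-and-subtract) unsigned division, one quotient bit per
 * loop iteration. Returns the quotient and stores the remainder in *rem.
 * Division by zero returns 0 with the remainder set to n; that is an
 * arbitrary choice made for this sketch. */
uint32_t udiv_soft(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint32_t r = 0;
    int bit;

    if (d == 0) {
        *rem = n;
        return 0;
    }
    for (bit = 31; bit >= 0; bit--) {
        r = (r << 1) | ((n >> bit) & 1u);  /* bring down the next bit of n */
        if (r >= d) {                      /* divisor fits: subtract, set bit */
            r -= d;
            q |= (uint32_t)1 << bit;
        }
    }
    *rem = r;
    return q;
}
```

The 32-iteration loop gives a feel for why a division can cost far more than a multiply on such cores.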
Thanks, Paulo
I am using some local branches of your scripts to build several combinations for armv7 chroots, stopping at stage2 and building a few rpms:
Not its expected use case, but interesting anyway :-)
I did not learn much yet about it, but maybe using neon for integer division could be a "huge win",
It would be a huge loss on chips without neon, though.
Performance is *not* the only criterion we deal with.