Hi,
Sorry for the cross post and long email :-)
I am currently working on a very early-stage build of Mandriva for ARM.
Thanks to Jeff Johnson for giving me ssh access to armv7 hosts, and
Matthew Dawkins for building several Mandriva/Unity linux armv5
packages.
What I am trying to understand now is which float ABI to choose.
As I understand it, the IHI0042D_aapcs.pdf document I downloaded says
to pass float/double arguments in VFP registers, but softfp seems
too good to pass up, as armv5 should be around for some time yet.
So, I have two chroots, running:
softfp# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/armv7l-mandriva-linux-gnueabi/4.6.1/lto-wrapper
Target: armv7l-mandriva-linux-gnueabi
Configured with:
/home/pcpa/bootstrap/rpmbuild/BUILD/gcc-4.6-20110722/configure
--prefix=/usr --build=i586-mandriva-linux-gnu
--host=armv7l-mandriva-linux-gnueabi
--target=armv7l-mandriva-linux-gnueabi --enable-werror=no --enable-cxx
--with-cpu=cortex-a8 --with-tune=cortex-a8 --with-arch=armv7-a
--with-float=softfp --with-fpu=vfpv3-d16 --with-abi=aapcs-linux
--enable-languages=c,c++ --enable-threads=posix --disable-libssp
--disable-libmudflap
Thread model: posix
gcc version 4.6.1 20110722 (Mandriva) (GCC)
thumb# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/armv7l-mandriva-linux-gnueabi/4.6.1/lto-wrapper
Target: armv7l-mandriva-linux-gnueabi
Configured with:
/home/pcpa/bootstrap/rpmbuild/BUILD/gcc-4.6-20110722/configure
--prefix=/usr --build=i586-mandriva-linux-gnu
--host=armv7l-mandriva-linux-gnueabi
--target=armv7l-mandriva-linux-gnueabi --enable-werror=no --enable-cxx
--with-cpu=cortex-a8 --with-tune=cortex-a8 --with-arch=armv7-a
--with-mode=thumb --with-float=hard --with-fpu=vfpv3-d16
--with-abi=aapcs-linux --enable-languages=c,c++ --enable-threads=posix
--disable-libssp --disable-libmudflap
Thread model: posix
gcc version 4.6.1 20110722 (Mandriva) (GCC)
This is unmodified upstream gcc, built using a set of bootstrap
scripts from a git branch I made on a checkout of
git clone git://fedorapeople.org/~djdelorie/bootstrap.git
I am still very much an "arm noob" :-) and only did the thumb
build yesterday, to learn about thumb. So far, my impression
is that the best approach would be thumb+softfp.
To be clear about the two builds I am running: one is thumb
with hard float, the other is softfp with the ARM instruction set:
softfp# objdump -d /usr/lib/libm.so | less
[...]
00008d30 <__ieee754_atan2>:
8d30: e3a0c000 mov ip, #0
8d34: e347cff0 movt ip, #32752 ; 0x7ff0
8d38: e92d4030 push {r4, r5, lr}
8d3c: ed2d8b10 vpush {d8-d15}
8d40: e3a05000 mov r5, #0
8d44: ec432b18 vmov d8, r2, r3
8d48: e3475ff0 movt r5, #32752 ; 0x7ff0
8d4c: e003c00c and ip, r3, ip
8d50: e15c0005 cmp ip, r5
8d54: e24dd02c sub sp, sp, #44 ; 0x2c
8d58: e1a04003 mov r4, r3
8d5c: ec410b19 vmov d9, r0, r1
8d60: e1a05002 mov r5, r2
8d64: 0a000022 beq 8df4 <__ieee754_atan2+0xc4>
[...]
thumb# objdump -d /usr/lib/libm.so | less
[...]
00007884 <__ieee754_atan2>:
7884: 2100 movs r1, #0
7886: 2000 movs r0, #0
7888: f6c7 71f0 movt r1, #32752 ; 0x7ff0
788c: ec53 2b11 vmov r2, r3, d1
7890: f6c7 70f0 movt r0, #32752 ; 0x7ff0
7894: 4019 ands r1, r3
7896: 4281 cmp r1, r0
7898: e92d 03f0 stmdb sp!, {r4, r5, r6, r7, r8, r9}
789c: ed2d 8b10 vpush {d8-d15}
78a0: 461c mov r4, r3
78a2: b08a sub sp, #40 ; 0x28
78a4: eeb0 8b41 vmov.f64 d8, d1
78a8: 4616 mov r6, r2
78aa: eeb0 9b40 vmov.f64 d9, d0
78ae: d03c beq.n 792a <__ieee754_atan2+0xa6>
[...]
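One way to read the two listings above, as a rough sketch and not a statement about the glibc source: under softfp each double arrives as two 32-bit words in core registers (r0/r1 and r2/r3, copied into d9 and d8 by vmov), while under hard float it arrives in d0/d1 and the second argument is copied out to r2/r3 only for the integer test. In both listings the mov/movt pair builds the constant 0x7ff00000 (32752 << 16), which is ANDed with the high word in r3 and compared. The helper names below are made up for illustration, and little-endian word order is assumed.

```python
import struct

# mov #0 followed by movt #32752 leaves 0x7ff00000 in the register
EXPONENT_MASK = 0x7FF0 << 16


def split_double(value):
    """Return (low_word, high_word) of an IEEE 754 double, little-endian order."""
    low, high = struct.unpack("<II", struct.pack("<d", value))
    return low, high


def exponent_all_ones(high_word):
    """Mirror the and/cmp pair: true when the exponent field is all ones (inf or NaN)."""
    return (high_word & EXPONENT_MASK) == EXPONENT_MASK


for sample in (1.0, float("inf"), float("nan")):
    low, high = split_double(sample)
    print(f"{sample!r}: low=0x{low:08x} high=0x{high:08x} "
          f"special={exponent_all_ones(high)}")
```

The low and high words are what r2 and r3 would hold for the second argument; the test itself is identical in both builds, only the route the bits take to reach r3 differs.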
I am trying to figure out what "The Industry" says about it,
and just checked the linaro gcc-4.6 changes that are relevant
to me right now, which are...
+ --with-arch=armv7-a --with-tune=cortex-a8 \
+ --with-float=$(float_abi) --with-fpu=neon \
+# check if we're building for armel or armhf
+ifeq ($(DEB_TARGET_ARCH),armhf)
+ float_abi := hard
+else ifneq (,$(filter $(DEB_TARGET_ARCH), arm armel))
+ float_abi := softfp
+endif
If I understand correctly, neon will have better support for
SIMD instructions, right?
Either way, I used two simple benchmarks to try to sell
myself the idea of breaking compatibility with armv5 or
older binaries. I am still not convinced but, as I said, we
should use whatever "The Industry" chooses :-)
For the benchmarks I used http://www.tux.org/~mayer/linux/bmark.html
and http://www.linuxfordevices.com/c/a/Linux-For-Devices-Articles/Why-ARMs-EABI…
I also compared against my home computer, a quad-core i5 x86_64,
and have attached the results...
Thanks and again sorry for cross posting and long email,
Paulo
Hey folks,
My first prototype changes to ld.so (see the attached patch) seem to
work ok, and my code will now complain (and fail) appropriately if you
try to mix soft-float ABI and hard-float ABI binaries in the same
process. I think the working code for the patch itself is clean
enough, but at the moment it's hacked in and I'm looking for a cleaner
way to integrate it.
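For readers who want a feel for this kind of check, here is a minimal sketch. It is not the attached patch and makes no claim about how that patch detects the ABI. It assumes the float ABI can be read from the EF_ARM_ABI_FLOAT_SOFT (0x200) and EF_ARM_ABI_FLOAT_HARD (0x400) bits in the e_flags field of a 32-bit ELF header; objects from toolchains that do not set those bits come back as "unknown". The function names are hypothetical.

```python
import struct

EM_ARM = 40                      # e_machine value for ARM
EF_ARM_ABI_FLOAT_SOFT = 0x200    # e_flags bit: soft-float calling convention
EF_ARM_ABI_FLOAT_HARD = 0x400    # e_flags bit: hard-float (VFP) calling convention


def float_abi(header):
    """Classify the first 52 bytes of an ELF32 file as 'hard', 'soft' or 'unknown'."""
    if len(header) < 52 or header[:4] != b"\x7fELF" or header[4] != 1:
        return "unknown"         # too short, not ELF, or not a 32-bit object
    endian = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    (flags,) = struct.unpack_from(endian + "I", header, 36)
    if machine != EM_ARM:
        return "unknown"
    if flags & EF_ARM_ABI_FLOAT_HARD:
        return "hard"
    if flags & EF_ARM_ABI_FLOAT_SOFT:
        return "soft"
    return "unknown"


def check_process(abis):
    """Return the single ABI in use, raising if soft and hard objects are mixed."""
    seen = set(abis) - {"unknown"}
    if len(seen) > 1:
        raise RuntimeError("cannot mix soft-float and hard-float ABI objects")
    return seen.pop() if seen else "unknown"
```

A caller would read the first 52 bytes of the executable and of each shared object, pass them through float_abi(), and hand the resulting list to check_process().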
In the meantime, I've been benchmarking the cost of checking each of
the binaries at startup. I've got a script that does:
* flush caches
* time ld.so $binary --version
n times in a loop, and times each run so I can calculate the mean and
standard deviation. I've tested four different programs (ls, emacs,
iceweasel and kcalc) to give a spread of sizes. Full figures below.
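The loop described above could look roughly like the sketch below. This is not the script that produced the figures in this mail: it records wall-clock ("real") time only, the flush step is a hypothetical callback supplied by the caller, and it uses the sample standard deviation, which may or may not match what the original script computes.

```python
import statistics
import subprocess
import time


def time_runs(command, runs, flush=None):
    """Run `command` `runs` times and return the wall-clock time of each run."""
    samples = []
    for _ in range(runs):
        if flush is not None:
            flush()              # e.g. drop the page cache before each run
        start = time.perf_counter()
        subprocess.run(command, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (total, mean, sample standard deviation) for a list of timings."""
    return sum(samples), statistics.mean(samples), statistics.stdev(samples)
```

For example, summarize(time_runs(["/bin/ls", "--version"], 100)) gives the three "real" figures for one program; timing each run separately is what makes the standard deviation available.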
Quick summary: the extra work of checking does not seem to have any
noticeable bad effect on program startup. In some cases it even seems
to make things faster! But in all cases the differences between
"check" and "nocheck" are well within the standard deviation from the
testing, so I'm happy.
I've measured the effects using Debian armel and armhf chroots on my
Panda board, but the specific versions of software and the target CPU
version differ between armel and armhf, so please don't compare the
numbers across the two!
Debian armel (v4t, soft-float ABI), using --version:
====================================================
Test using /bin/ls --version, running 100 times
Results from /tmp/time.2592, 100 runs of "ls" using "check":
Total times real 35.20 user 0.00 sys 0.00
Mean times real 0.352 user 0.000 sys 0.000
stddev real 0.068 user 0.000 sys 0.000
Test using /bin/ls --version, running 100 times
Results from /tmp/time.3305, 100 runs of "ls" using "nocheck":
Total times real 35.27 user 0.00 sys 0.00
Mean times real 0.353 user 0.000 sys 0.000
stddev real 0.069 user 0.000 sys 0.000
Test using /usr/bin/emacs23-x --version, running 100 times
Results from /tmp/time.5448, 100 runs of "emacs23-x" using "check":
Total times real 519.23 user 1.32 sys 3.20
Mean times real 5.192 user 0.013 sys 0.032
stddev real 0.430 user 0.015 sys 0.012
Test using /usr/bin/emacs23-x --version, running 100 times
Results from /tmp/time.6166, 100 runs of "emacs23-x" using "nocheck":
Total times real 507.24 user 1.31 sys 3.17
Mean times real 5.072 user 0.013 sys 0.032
stddev real 0.302 user 0.012 sys 0.010
Test using /usr/lib/iceweasel/firefox-bin --version, running 100 times
Results from /tmp/time.6880, 100 runs of "/usr/lib/iceweasel/firefox-bin" using "check":
Total times real 668.45 user 1.69 sys 6.29
Mean times real 6.684 user 0.017 sys 0.063
stddev real 0.692 user 0.018 sys 0.019
Test using /usr/lib/iceweasel/firefox-bin --version, running 100 times
Results from /tmp/time.7696, 100 runs of "/usr/lib/iceweasel/firefox-bin" using "nocheck":
Total times real 651.27 user 1.73 sys 6.27
Mean times real 6.513 user 0.017 sys 0.063
stddev real 0.557 user 0.020 sys 0.019
Test using /usr/bin/kcalc --version, running 100 times
Results from /tmp/time.4018, 100 runs of "kcalc" using "check":
Total times real 1249.40 user 5.77 sys 7.41
Mean times real 12.494 user 0.058 sys 0.074
stddev real 0.887 user 0.022 sys 0.021
Test using /usr/bin/kcalc --version, running 100 times
Results from /tmp/time.4733, 100 runs of "kcalc" using "nocheck":
Total times real 1240.01 user 5.29 sys 7.89
Mean times real 12.400 user 0.053 sys 0.079
stddev real 0.786 user 0.020 sys 0.019
Debian armhf (v7, hard-float ABI), using --version:
===================================================
Results from /tmp/time.7128, 100 runs of "ls" using "check":
Total times real 26.52 user 0.00 sys 0.00
Mean times real 0.265 user 0.000 sys 0.000
stddev real 0.058 user 0.000 sys 0.000
Test using /bin/ls --version, running 100 times
Results from /tmp/time.7843, 100 runs of "ls" using "nocheck":
Total times real 24.74 user 0.00 sys 0.00
Mean times real 0.247 user 0.000 sys 0.000
stddev real 0.038 user 0.000 sys 0.000
Test using /usr/bin/emacs23-x --version, running 100 times
Results from /tmp/time.9991, 100 runs of "emacs23-x" using "check":
Total times real 548.08 user 2.26 sys 2.40
Mean times real 5.481 user 0.023 sys 0.024
stddev real 0.630 user 0.017 sys 0.015
Test using /usr/bin/emacs23-x --version, running 100 times
Results from /tmp/time.10705, 100 runs of "emacs23-x" using "nocheck":
Total times real 549.29 user 2.15 sys 2.43
Mean times real 5.493 user 0.022 sys 0.024
stddev real 0.600 user 0.019 sys 0.016
Test using /usr/lib/iceweasel/firefox-bin --version, running 100 times
Results from /tmp/time.11423, 100 runs of "/usr/lib/iceweasel/firefox-bin" using "check":
Total times real 620.18 user 1.75 sys 5.98
Mean times real 6.202 user 0.018 sys 0.060
stddev real 0.682 user 0.019 sys 0.019
Test using /usr/lib/iceweasel/firefox-bin --version, running 100 times
Results from /tmp/time.12237, 100 runs of "/usr/lib/iceweasel/firefox-bin" using "nocheck":
Total times real 636.31 user 1.77 sys 5.95
Mean times real 6.363 user 0.018 sys 0.060
stddev real 0.766 user 0.018 sys 0.020
Test using /usr/bin/kcalc --version, running 100 times
Results from /tmp/time.8556, 100 runs of "kcalc" using "check":
Total times real 1338.36 user 7.00 sys 5.41
Mean times real 13.384 user 0.070 sys 0.054
stddev real 1.223 user 0.031 sys 0.029
Test using /usr/bin/kcalc --version, running 100 times
Results from /tmp/time.9275, 100 runs of "kcalc" using "nocheck":
Total times real 1345.21 user 7.46 sys 5.02
Mean times real 13.452 user 0.075 sys 0.050
stddev real 1.197 user 0.031 sys 0.030
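The "within the standard deviation" claim can be checked mechanically against the mean and stddev "real" figures listed above. The sketch below uses a deliberately strict reading, comparing the difference in means with the smaller of the two standard deviations; that criterion is an assumption of this example, not a formal significance test.

```python
# (mean real, stddev real) for "check" and "nocheck", copied from the figures above
RESULTS = {
    ("armel", "ls"):        ((0.352, 0.068), (0.353, 0.069)),
    ("armel", "emacs23-x"): ((5.192, 0.430), (5.072, 0.302)),
    ("armel", "iceweasel"): ((6.684, 0.692), (6.513, 0.557)),
    ("armel", "kcalc"):     ((12.494, 0.887), (12.400, 0.786)),
    ("armhf", "ls"):        ((0.265, 0.058), (0.247, 0.038)),
    ("armhf", "emacs23-x"): ((5.481, 0.630), (5.493, 0.600)),
    ("armhf", "iceweasel"): ((6.202, 0.682), (6.363, 0.766)),
    ("armhf", "kcalc"):     ((13.384, 1.223), (13.452, 1.197)),
}


def within_noise(check, nocheck):
    """True when the means differ by no more than the smaller standard deviation."""
    (mean_c, sd_c), (mean_n, sd_n) = check, nocheck
    return abs(mean_c - mean_n) <= min(sd_c, sd_n)


for (port, program), (check, nocheck) in RESULTS.items():
    delta = check[0] - nocheck[0]
    print(f"{port:5} {program:9} delta={delta:+.3f}s "
          f"within_noise={within_noise(check, nocheck)}")
```

A positive delta means "check" was slower on average; a negative one means it was faster.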
Cheers,
--
Steve McIntyre steve.mcintyre(a)linaro.org
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
At the ARM mini-summit there was a decision to change the linker path
encoded in binaries for the armhf port from:
/lib/ld-linux.so.3
to:
/lib/arm-linux-gnueabihf/ld-linux.so.3
Well, that's what I was told at least; the room was too full when I
got there :)
I made an attempt at changing this simply by changing the static path,
but I got some feedback that pointed out this has problems on
(bi-|multi-) arch systems. So, I made a second pass by modifying the
linker spec to make this a runtime decision.
At this point I'd like to submit both patches as an RFC. I suspect the
static patch might be fine (and safest) for distributions like Debian
that don't yet enable bi-arch on armhf, but not for distros like
Ubuntu which do. The dynamic version should work in the bi-arch case,
but it will also change the default linker path for soft-float
binaries and I suspect we *don't* want to do that. Perhaps we need to
#ifdef that out somehow if we're building for a soft-float default
target?
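The outcome argued for above can be stated as a small sketch. It models neither patch and is not gcc spec syntax; it only shows the intended mapping, in which hard-float binaries get the new path and soft-float ones keep the existing path. The function name and the set of accepted values (taken from the usual -mfloat-abi choices) are assumptions of this example.

```python
LEGACY_PATH = "/lib/ld-linux.so.3"
ARMHF_PATH = "/lib/arm-linux-gnueabihf/ld-linux.so.3"


def interpreter_path(float_abi):
    """Pick the dynamic linker path to encode for a given float ABI."""
    if float_abi not in ("soft", "softfp", "hard"):
        raise ValueError(f"unknown float ABI: {float_abi!r}")
    # Only the hard-float ABI moves; soft and softfp keep the existing path.
    return ARMHF_PATH if float_abi == "hard" else LEGACY_PATH


for abi in ("soft", "softfp", "hard"):
    print(f"{abi:6} -> {interpreter_path(abi)}")
```

Making the decision depend on the ABI in effect for each compile, not on a single static string, is what lets one compiler serve both ABIs on a bi-arch system.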
Also, this is the first time I've touched gcc spec files, so if anyone
sees anything that might be wrong, speak up - you're probably right :)