On Thu, Aug 04, 2011 at 12:03:00PM -0700, Taras Glek wrote:
> Recently we have been looking at how to squeeze more performance out
> of our toolchain for building Firefox on Android. Mike Hommey
> integrated GCC 4.6 into the android NDK and has been testing
> performance (with mixed results
> http://gcc.gnu.org/ml/gcc/2011-08/msg00096.html).
You should definitely be trying to build using the Linaro 4.5 and 4.6
compiler branches; they are pretty much guaranteed to give you better
performance, and if they don't, we're on the hook to fix it quickly! All
the patches go upstream, so there is no risk of you being stuck on a
fork -- it just makes everything you need available right now.
I'm copying the linaro-toolchain list to make sure that you get the
right people's attention (though if they weren't all coming back from
Connect in Cambridge this week they would have picked the email up
already).
> I like how Linaro is doing regular arm benchmarking, ie
> https://wiki.linaro.org/Platform/Android/AndroidToolchainBenchmarking/2011-…
We do much more than that, but it's not as easy to find right now; for
instance http://ex.seabright.co.nz/helpers/benchcompare is Michael's
regular release benchmark.
> . Would you be interested in adding a Firefox-based benchmark? As a
> large application it is a good testbed for LTO, FDO and other
> aggressive optimizations.
Totally. Let's do it. Can you give me an idea of what boards you are
testing the build on today? Do you have a test suite that we could run
in a reasonable timeframe (hours, not days)?
> We are also looking at setting a developer-friendly android ROM with
> oprofile, perf, systemtap, gdb, debug symbols, etc. It might even be
> beneficial for us to use newer kernels as we explore options like
> kernel-assisted ld.so relocations, etc. That seems similar to
> what Linaro provides in the evaluation ROMS. Is there any chance of
> Linaro providing developer-friendly "evaluation" ROMs for retail
> phones like the Nexus S?
It's indeed pretty similar (we just call them LEBs), and Zach will be
really interested in working with you on this.
As for supporting actual released phones, it lies somewhat outside of
our optimal operating model, and we don't have any hardware available. I
guess we could do a spin for a specific model if we had enough of them
for use by a set of engineers in the different teams. They are so
expensive, though. Do you guys have lots of them?
--
Christian Robottom Reis, Engineering VP
Brazil (GMT-3) | [+55] 16 9112 6430 | [+1] 612 216 4935
Linaro.org: Open Source Software for ARM SoCs
Hi there. This is a heads-up that the name of the Toolchain group
releases will change slightly with next week's release. We're dropping
the respin suffix (the -0) to line up with the new whole of Linaro
naming convention.
What was:
gcc-linaro-4.6-2011.xx-0.tar.bz2
gdb-linaro-7.2-2011.xx-0.tar.bz2
qemu-linaro-0.15-2011.xx-0.tar.bz2
will now be:
gcc-linaro-4.6-2011.xx.tar.bz2
gdb-linaro-7.2-2011.xx.tar.bz2
qemu-linaro-0.15-2011.xx.tar.bz2
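
For anyone with scripts that match the old names, the change amounts to
dropping the respin suffix that sits just before the archive extension.
A minimal Python sketch, assuming the suffix is always a hyphen followed
by digits directly before ".tar.bz2" (the function name drop_respin is
made up for illustration):

```python
import re

# Matches a respin suffix such as "-0" that sits directly before ".tar.bz2".
# Assumption: the release stamp looks like YYYY.MM (or the "xx" placeholder).
_RESPIN = re.compile(r"^(?P<stem>.+-\d{4}\.\w{2})-\d+(?P<ext>\.tar\.bz2)$")


def drop_respin(filename: str) -> str:
    """Return the filename without its respin suffix, or unchanged if none."""
    match = _RESPIN.match(filename)
    if match is None:
        return filename
    return match.group("stem") + match.group("ext")
```

Names that already follow the new convention pass through unchanged.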
Earth shattering, eh? I've taken the opportunity to write up our
naming convention at the same time:
https://wiki.linaro.org/WorkingGroups/ToolChain/Naming
-- Michael
Dave Martin <dave.martin(a)linaro.org> writes:
> However, there's not really anything fundamentally
> architecture-specific about this problem, and ideally the solution and
> the directives should not be architecture-specific either.
> One option which appeals to me is to have some directives which can
> exist across all architectures, and do something analogous to what
> .set push and .set pop do on MIPS.
FWIW, this sounds like a really good idea to me. I won't argue about
the syntax (I have no particular preference).
> I feel that the environment should also include global,
> target-independent state such as the current macro mode (.altmacro
> versus .noaltmacro) and current ELF section stack state, but not
> symbols or macro definitions themselves.
Sounds reasonable. To state the obvious, we'd have to make the existing
target-dependent groupings (like .set push/pop on MIPS) work with this
new scheme, but those directives mustn't affect this extra target-independent
information. So the new directives would interact with both the
traditional .pushsection and the traditional target-dependent directives,
even though those two features would otherwise remain independent.
That is, .pushsection and .set push/pop operate on conceptually
separate stacks whose pushes and pops can be freely mixed.
But .pushsection and the new directives would need to be
strictly stacked; pops must have the same form as their
corresponding pushes. Combinations of .set push/pop and
the new directives would also need to be strictly stacked.
Nothing a bit of code can't handle though.
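
To make the stacking rules concrete, here is a rough Python model of
the check. It is only a sketch of the rules as stated above, not gas
code: the kind labels and function names are invented, and "env" stands
for the proposed new directives, whatever they end up being called.

```python
# Kinds of push/pop pairs. "env" stands for the proposed new directives;
# the kind names are labels invented for this sketch.
SECTION, TARGET, ENV = "section", "target", "env"


def must_nest(a: str, b: str) -> bool:
    """Two different kinds must be strictly stacked iff one of them is ENV."""
    return a != b and ENV in (a, b)


def check_nesting(events):
    """Validate a sequence of ("push" | "pop", kind) events.

    Returns None if the sequence is acceptable, else an error string.
    """
    open_pushes = []  # kinds of pushes not yet popped, oldest first
    for index, (action, kind) in enumerate(events):
        if action == "push":
            open_pushes.append(kind)
            continue
        # Find the most recent open push of the same kind.
        for pos in range(len(open_pushes) - 1, -1, -1):
            if open_pushes[pos] == kind:
                break
        else:
            return f"event {index}: pop of {kind} without a matching push"
        # Anything pushed later and still open must be allowed to interleave.
        for later in open_pushes[pos + 1:]:
            if must_nest(kind, later):
                return f"event {index}: pop of {kind} crosses an open {later} push"
        del open_pushes[pos]
    return None
```

With this model, .set push / .pushsection / .set pop / .popsection is
accepted, while popping the new environment across a still-open
.pushsection (or the reverse) is rejected.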
Richard
Hi all,
On ARM, we've now hit the problem a few times of temporarily
overriding the assembler state (or rather, not being able to do this
reliably). For example, sometimes there's a need to assemble a few
instructions for a different architecture version so we can optionally
execute or skip them at run-time; this is not really possible at present.
This sort of feature is especially useful in macros but can be useful
elsewhere too.
There seem to be some target-specific solutions to this problem
already. MIPS has its "option stack", maintained by .set push and
.set pop directives. From the documentation, it sounds like this
saves/restores a somewhat comprehensive set of state, but doesn't make
much syntactic sense on arches which use .set to define symbols (i.e.,
most arches). PowerPC also has .machine push and .machine pop, but
those only act on one specific aspect of the assembler state, and
therefore aren't as portable a concept.
However, there's not really anything fundamentally
architecture-specific about this problem, and ideally the solution and
the directives should not be architecture-specific either.
One option which appeals to me is to have some directives which can
exist across all architectures, and do something analogous to what
.set push and .set pop do on MIPS.
My names would be .pushenv and .popenv, but obviously, they can be
named any way people like. (For now I'm stealing groff's
"environment" terminology to refer to such saved and restored state --
hence "env". Again, the nomenclature is arbitrary.)
These directives would save and restore a target-specific set of
state, with the philosophy that anything that can reasonably be
changed with a directive mid-file can also be saved and restored with
.pushenv/.popenv. Effectively, .popenv would be equivalent to issuing
the necessary set of assembler directives to restore the assembler
state to whatever it was at the last .pushenv (including the state of
the environment stack itself).
I feel that the environment should also include global,
target-independent state such as the current macro mode (.altmacro
versus .noaltmacro) and current ELF section stack state, but not
symbols or macro definitions themselves. Currently, neither the macro
mode nor the behaviour of .previous is reliably restorable after being
changed (unless I missed something). This can result in unexpected
behaviour after a macro which switches sections or changes the macro
mode. This seems unfortunate since on most arches there is no
syntactic difference between a machine instruction and a macro
invocation -- hence in the presence of macros, the only time you're
really 100% certain what .previous will do is immediately after a
.pushsection or .section directive (which obviously is not much use).
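
The intended save/restore semantics can be modelled in a few lines of
Python. This is only a toy illustration of the idea, not a proposal for
the implementation: the class names, the three fields and the default
values are all made up, and a real assembler would track far more
state.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class AssemblerEnv:
    """Toy stand-in for the state that directives can change mid-file."""
    arch: str = "armv5t"          # illustrative value only
    altmacro: bool = False        # .altmacro versus .noaltmacro
    sections: list = field(default_factory=lambda: [".text"])


class Assembler:
    def __init__(self):
        self.env = AssemblerEnv()
        self._saved = []  # the environment stack

    def pushenv(self):
        # Save a deep copy so later changes do not leak into the snapshot.
        self._saved.append(copy.deepcopy(self.env))

    def popenv(self):
        if not self._saved:
            raise ValueError(".popenv without matching .pushenv")
        self.env = self._saved.pop()
```

A macro body bracketed by pushenv()/popenv() can then change the
architecture, the macro mode or the section stack freely, and the
caller sees none of it afterwards.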
Comments are welcome -- at the moment this is just a fuzzy idea for a
feature which might prove useful.
I haven't investigated the implementation implications -- maybe it
could be built straightforwardly around the current MIPS directives.
Cheers
---Dave
Hi,
* fixed PR 50014 and 50039 - to be backported to linaro-gcc
* tested the patch to change the default vector size on NEON
* found one test that fails with quad-words -
gcc.c-torture/execute/mode-dependent-address.c. Debugging it with
Ramana.
* started looking into widening shifts
Vacation plans:
next week Monday and Wednesday
and August 22 - 30.
Ira
Hi,
ld in the current (4.6-2011.07-0-8-2011-07-25_12-42-06) Android
toolchain fails to link uboot:
arm-eabi-ld: /mnt/user/bero/android-iMX53-20110716151649/out/target/product/iMX53/obj/u-boot/lib/libgeneric.o:
Unknown mandatory EABI object attribute 44
arm-eabi-ld: failed to merge target specific data of file
/mnt/user/bero/android-iMX53-20110716151649/out/target/product/iMX53/obj/u-boot/lib/crc16.o
arm-eabi-ld: /mnt/user/bero/android-iMX53-20110716151649/out/target/product/iMX53/obj/u-boot/lib/libgeneric.o:
Unknown mandatory EABI object attribute 44
arm-eabi-ld: failed to merge target specific data of file
/mnt/user/bero/android-iMX53-20110716151649/out/target/product/iMX53/obj/u-boot/lib/crc32.o
arm-eabi-ld: /mnt/user/bero/android-iMX53-20110716151649/out/target/product/iMX53/obj/u-boot/lib/ctype.o:
Unknown mandatory EABI object attribute 44
arm-eabi-ld: failed to merge target specific data of file
/mnt/user/bero/android-iMX53-20110716151649/out/target/product/iMX53/obj/u-boot/lib/ctype.o
arm-eabi-ld: /mnt/user/bero/android-iMX53-20110716151649/out/target/product/iMX53/obj/u-boot/lib/div64.o:
Unknown mandatory EABI object attribute 44
arm-eabi-ld: failed to merge target specific data of file
/mnt/user/bero/android-iMX53-20110716151649/out/target/product/iMX53/obj/u-boot/lib/div64.o
arm-eabi-ld: /mnt/user/bero/android-iMX53-20110716151649/out/target/product/iMX53/obj/u-boot/lib/errno.o:
Unknown mandatory EABI object attribute 44
arm-eabi-ld: failed to merge target specific data of file
/mnt/user/bero/android-iMX53-20110716151649/out/target/product/iMX53/obj/u-boot/lib/errno.o
arm-eabi-ld: /mnt/user/bero/android-iMX53-20110716151649/out/target/product/iMX53/obj/u-boot/lib/ldiv.o:
Unknown mandatory EABI object attribute 44
arm-eabi-ld: failed to merge target specific data of file
/mnt/user/bero/android-iMX53-20110716151649/out/target/product/iMX53/obj/u-boot/lib/ldiv.o
I believe this is already fixed in upstream binutils (or at least in
hjl's 2.21.52.0.2 release from kernel.org /pub/linux/devel/binutils).
ttyl
bero
== Last week (Linaro Connect) ==
* Reran libav comparisons after Ira's fix for excessive promotion.
The vectorized versions are now at least as good as the non-vectorised
ones. Updated wiki page with new asm output and microbenchmark results.
* More work on SMS. I have some patches that wire up the ddg code
to IV analysis. It gave some nice benchmark improvements, but also
some regressions. Traced the regressions down to cases where the
schedule for small iis generated too many moves. E.g. in a small
microbenchmark, we were able to schedule 6 instructions with an
ii of 3 (i.e. in a loop iteration of 3 cycles), but then needed
to add ~9 moves in order to keep the dependencies correct.
We got much better code with a larger ii and fewer moves.
Wrote a patch to estimate how many moves would be added, and to try
a larger ii if the number of moves is too high. This improved the
results for one benchmark independently of the iv patch, and had no
effect on the others.
Discussed this with Revital, who said that Mustafa had tried a similar
thing but seen no benefit.
* Got powerpc-ibm-aix5.3 bootstraps working. Needs a few local fixes
due to C++ bootstrapping. Used it to test a couple of preparatory
patches for the IV work. Submitted those patches upstream.
* Ran benchmarks with -fno-schedule-insns after seeing that the first
scheduling pass was responsible for the main NEON-vs.-non-NEON
regression in EEMBC. It fixed that case, but as expected,
made others worse. Mentioned this to Ramana, who pointed me at
-fsched-pressure.
Reran the benchmarks with -fsched-pressure instead of
-fno-schedule-insns. It too fixed the main regression,
and improved a couple of other tests too. It showed a regression
in another test though. Looked at that regression. It was a case
where many registers were live across a loop, but not used in it.
This was causing the loop to have a very conservative schedule.
It would be better to spill some of the other registers instead.
Wrote a patch to take loops into account, and it seemed to do
the right thing for EEMBC. Sent it to Andreas, after Ulrich
mentioned that he had been looking at -fsched-pressure problems
on s390. Andreas is away for a while, though, so I might put this
on the back burner until he gets back.
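
For reference, the idea behind the SMS move-estimate patch mentioned
above can be sketched as follows. This is an illustrative Python model,
not the GCC code: the two callables and the max_moves threshold are
placeholders invented for the sketch.

```python
def pick_ii(min_ii, max_ii, try_schedule, estimate_moves, max_moves):
    """Return (ii, schedule) for the first acceptable ii, or None.

    try_schedule(ii) returns a schedule object, or None if none exists;
    estimate_moves(schedule) returns the predicted number of extra moves.
    Both callables and max_moves are placeholders for this sketch.
    """
    for ii in range(min_ii, max_ii + 1):
        schedule = try_schedule(ii)
        if schedule is None:
            continue  # no valid schedule at this ii; try a larger one
        if estimate_moves(schedule) > max_moves:
            continue  # schedulable, but too many moves; try a larger ii
        return ii, schedule
    return None
```

In the microbenchmark case above, an ii of 3 would be rejected because
of its ~9 moves, and the search would carry on to a larger ii.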
== This week ==
* SMS
* auto inc/dec
* libav, perhaps
Richard
Hi,
* committed upstream a patch that reduces over-promotion of vector operations
* started to work on a new version of the patch to change the default
vector size for Neon
* attended Linaro connect
Ira
* Committed a set of SMS patches to trunk and gcc-linaro branch.
* Implemented a hack to evaluate the potential of SMS on SPEC2006/libquantum.
* Involved in a non-Linaro issue.
== QEMU ==
* After discussion with Peter started writing QEMU fixup for 64bit
atomic helper version location.
* Sent fixes for soc-dma code to qemu list
* Trying to understand just how much of omap_dma's code is needed.
== Other ==
* Travelling to/from connect
* Wanted to dial into some of the sessions in Corpus and Magdelen
rooms but the remote audio from them was unusable.
Dave
Hi,
Libunwind:
* finished initial ARM support for remote unwinding (libunwind-ptrace)
Android:
* took a closer look at the debuggerd
* got the perflab benchmark running on my PandaBoard using Linaro GCC
Misc:
* remotely attended some Linaro Connect Android sessions
Regards
Ken
== GDB ==
* Created Linaro GDB 7.3 branch
* Ported all remaining feature patches from Linaro GDB 7.2
* Backported mainline patches to fix remote test issues:
- Fixed #804387 Shared library test problems
- Fixed #804392 Rebuilt executables not copied
- Fixed #804396 Spurious failures
* Committed mainline patch to fix dlopen test cases
for remote testing (#804387).
* Committed mainline patches to fix misc. other remote
test problems (#804396).
== Misc ==
* Attended Linaro Connect in Cambourne.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
I've updated:
https://wiki.linaro.org/RichardSandiford/Sandbox/NeonLibAv
so that it gives the output for current trunk, including Ira's commit
yesterday to reduce the amount of overpromotion. I also reran the
microbenchmarks. The good news is that the vectorised code is now
better in all cases than the non-vectorised code.
The biggest winner from last time was rgb24tobgr16_C(). It used to be
much worse with vectorisation due to lots of excessive widening.
Thanks to Ira's patch, the loop now looks pretty respectable,
and is ~3.25x faster than the non-vectorised code.
As well as using a more recent compiler, the new version also uses
-mvectorize-with-neon-quad. Once again it shows a significant improvement
over the default.
Richard
Continued work on widening multiplies. I've identified another cause for
the bootstrap failure, and submitted the new version for testing.
Continued trying to find out how my thumb2 constants patches are broken.
This is taking ages due to the time it takes to turn around a bootstrap
build on my IGEP board.
Tried to get the CS Panda boards to work again. They'll do the bootstrap
builds much faster (if still not quickly), but are no longer very well.
All my attempts to bring them back up remotely have failed. I've
discovered that the device the serial console on one was connected to
has been relocated to the new Mentor Graphics board lab, so this might
explain some of it ....
Chaired the Monday and Thursday meetings in Michael's absence.
Travelled to the Linaro Connect event in Cambourne, near Cambridge.
Other:
More machine trouble. I keep thinking I have the display issues solved,
and then it starts up with all the windows displayed double sized, but
requiring mouse clicks in the correct location .... typically this
happened just when I needed access to the pin number for the Monday
meeting. This hasn't happened since Monday, so hopefully it's now ironed
out ... this sort of thing does not happen with Windows. :(
----
Upstream patches requiring review:
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
* Looking into SMS patches sent to mainline which expand SMS
functionality to avoid using doloop. The patches resolve the recent
bootstrap failure on mainline.
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01807.html
* Continue looking into 462.libquantum.
Valgrind wants a less stripped ld-2.12.1.so or it won't work. The build
process (that Michael Hope put together) just downloads the
libc6_2.12.1-0ubuntu6_armel.deb, and the ld-2.12.1.so in there is fully
stripped. I thought I'd be able to just get the
libc6-dbg_2.12.1-0ubuntu6_armel.deb instead, thinking that was just the
pre-stripped version of these libs -- but apparently it's not, because
trying to use those libs instead of the stripped ones results in undefined
symbols. For example, ld-2.12.1.so defines _rtld_global -- but
libc-2.12.1.so is looking for _rtld_global@@GLIBC_PRIVATE, so
_rtld_global@@GLIBC_PRIVATE
ends up undefined. (Ditto for __tls_get_addr, __libc_enable_secure,
_dl_argv, etc.)
I'm not sure who actually builds these packages (they're retrieved from:
http://ports.ubuntu.com/pool/main/e/eglibc/), but if anyone has any
suggestions on how to get past this, I'd be most appreciative. (I've got
angry developers trying to track down memory issues, who are about to come after
me with torches and pitchforks :P )
Thanks,
Diane
== GDB ==
* Committed second mainline patch to fix re-built executable
remote test problems (#804392).
* Prepared for rebasing Linaro GDB on top of GDB 7.3 release.
== Misc ==
* Prepared for Linaro Connect.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Hi,
* Monday was full of IBM internal meetings
* Android
* got a self-built LEB and generic version 2.3.4 of Linaro Android
running on my PandaBoard (built with the gcc 4.6 07 release plus the
patch that Richard made)
* requires libicui18n.so (external/icu4c/i18n) to be built with -O2
* ran into a few issues (816491, 807230)
* libunwind:
* simplified the local unwinding (there is no need to touch the ARM
exidx table segment when looking it up)
* fixed a bug (corner case: the info of the IP to be unwound is
described by the last unw entry)
* made some progress on the remote unwinding via ptrace
* remotely searching for the unw entry within the exidx table segment
* next step is to remotely extract the unw insns
Regards
Ken
RAG:
Red:
Amber: OMAP3 patch upstreaming is progressing more slowly than hoped
Green: various outstanding patches accepted upstream in time for 0.15
Current Milestones:
|| || Planned || Estimate || Actual ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
== upstream-omap3-patches ==
* omap-gpmc patches now all cleaned up; I think I need to look at
qdevifying this device before submitting patches, though
* sent patch for bug which makes n810 model crash when key is pressed
* sent a pull request collecting together the patches submitted so far
== other ==
* qemu 0.15: put together pull request for ARM patches I think should
go into this release; wrote ARM-related bits of the release notes
* helped GSoC student track down a bug causing android not to boot
* LP:816791: tracking down issues with running mono under qemu
(combination of a couple of known qemu bugs and a mono bug)
* admin/prep for upcoming travel (cambourne, vancouver, orlando)
* reviewing pl041 patches which add audio support to versatilepb
and vexpress models
* mailing list discussion of possible new qemu object model
* lots of meetings this week (toolchain, standup, doughnuts, team
comms x2)
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
1-5 August: Linaro sprint 1111
15-19 August: KVM Forum and LinuxCon NA, Vancouver
The current gcc-4.6/eglibc is now built multilib'd for -mfloat-abi=softfp|hard,
including the GCC runtime libraries. I hope that the gcc cross builds will pick
this up soonish, not needing to build the cross compiler twice for softfp and
hard float-abi.
Matthias
== 64 bit atomics ==
* Sent updated set of 64bit atomic patches to gcc list with fixes
from previous review
* Started hunting for users of 64bit atomics other than membase;
jemalloc, sdl and boost lock free look like possibilities, but I've
not looked at them hard yet
== QEmu ==
* Released fix for last SD card block access error
- Vincent Palatin released a bunch of SD card fixes a few hours
later - that included a fix to the same bug; however it does look like
he has a bunch of other stuff we should keep sync'd with.
* Changing caching mode to writeback on the block layer fixes bug
732223 (hangs on heavy IO) - goes from 130KB/s to 8MB/s on vexpress
- Asked mailing list whether that's reasonable to make as default for SD
* Looking at path from CPU->MMC/SD card - the DMA on OMAP is pretty
inefficiently emulated, but the soc_dma code has an unused special
case for dma'ing to hardware, looks promising but need to figure
out how to use it and if it works.
* Comparing Vincent's SD card patch with earlier meego patches;
partial overlap.
== Other ==
* Pinged libc-ports for comments on my optimised memchr patch
* Image testing
Next week: I intend to be in Cambourne on the afternoon of Monday,
Wednesday and Friday.
Dave
Hi,
I am checking the coverage of the NEON instructions mostly by writing
tests in C to check which instructions are generated (after
auto-vectorization) and which are not.
I put here https://wiki.linaro.org/IraRosen/Sandbox/InstructionCoverage
the list of things that I've checked till now.
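
The checking step itself (which mnemonics show up in the compiler's
assembly output for a given test) can be automated along these lines.
This is a rough Python sketch rather than the script actually used; the
function name is invented, and it assumes ARM gas syntax with one
instruction per line and labels on their own lines.

```python
def covered_mnemonics(asm_text, wanted):
    """Return the subset of `wanted` mnemonics that occur in `asm_text`.

    A mnemonic counts as present if it is the first token on a line,
    optionally followed by a dot-suffix such as ".i16" or ".f32".
    """
    seen = set()
    for line in asm_text.splitlines():
        code = line.split("@", 1)[0].strip()   # "@" starts a comment in ARM gas syntax
        if not code or code.endswith(":") or code.startswith("."):
            continue                           # skip blanks, labels, directives
        token = code.split()[0].lower()
        seen.add(token.split(".", 1)[0])       # "vadd.i32" -> "vadd"
    return {m for m in wanted if m.lower() in seen}
```

Feeding it the -S output of each test together with the list of NEON
mnemonics of interest gives the generated/not-generated split for that
test.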
Ira