Continued looking at my constant reuse optimization. I've identified a
couple of hundred optimization opportunities in the whole of gcc itself,
which is fewer than I had hoped. There are almost no opportunities when
compiling for size as constants are always loaded from a constant pool
in that case (I'm not sure why, given that this is no more
space-efficient than movw+movt unless the constant can be shared
between more than one use).
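The space trade-off in the parenthesis can be made concrete with a little arithmetic. This is a sketch assuming ARM (A32) state, where movw, movt and a PC-relative ldr are each 4 bytes and a constant-pool entry is 4 bytes. Thumb-2 encodings differ (a narrow literal ldr can be 2 bytes), so the numbers are only illustrative.

```python
# Sketch: code-size cost of materialising one 32-bit constant `uses` times.
# Assumes A32 encodings: every instruction is 4 bytes, a pool entry is 4 bytes,
# and every use needs its own load (no register reuse between uses).
INSN_BYTES = 4
POOL_ENTRY_BYTES = 4


def constant_cost(uses: int) -> tuple[int, int]:
    """Return (movw_movt_bytes, literal_pool_bytes) for `uses` loads."""
    movw_movt = uses * 2 * INSN_BYTES            # movw + movt at every use
    pool = uses * INSN_BYTES + POOL_ENTRY_BYTES  # one ldr per use, one shared entry
    return movw_movt, pool


for n in (1, 2, 4):
    mm, lp = constant_cost(n)
    print(f"{n} use(s): movw+movt = {mm} bytes, literal pool = {lp} bytes")
```

Under these assumptions a single use costs 8 bytes either way; the pool only becomes smaller once its entry is shared by two or more loads.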
Backported my -mtune=native patch to Linaro GCC.
Backported my generic tuning patch to Linaro GCC.
Backported my pr50717 patch to Linaro, and pushed to Launchpad for testing.
Analysed the benchmark results I gathered to aid generic tuning.
Disappointingly the A8/A9 tuning is not as beneficial as one would like.
In fact, the existing generic tuning patch (which was supposed to be a
framework only) is actually quite competitive and gives better
performance in some cases.
Set more benchmarks running, this time with NEON enabled. That's about
36 hours' worth on A9, and more like 90 hours on my A8 (obviously,
there's some difference in the clock speeds there).
Discovered that my native tuning code won't compile with a C++ compiler
(GCC Bugzilla PR50809). Tested and committed a fix upstream.
== GDB ==
* Worked on support for cross-platform core file generation.
Posted an initial set of patches for comment.
* Created "Toolchain support for kernel debugging" blueprint.
Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter | Management: Dirk Wittkopp
Registered office: Böblingen | Register court: Amtsgericht Stuttgart, HRB 243294
Hi,
I made some progress on transforming the hacks I did to get libunwind
working on Android into proper patches that can go upstream. Things learned:
* bionic uses OpenBSD header files, which therefore lack some GNU- and
ARM-specific defines (only a small fix is needed; the plan is to change
libunwind to work with unpatched bionic too)
* Android basically provides all the functionality that is required
for libunwind-ptrace - but...
* no one seems to build libunwind with only the remote unwinding
functionality (including libunwind-ptrace)
* most of the issues can be avoided by changing libunwind to be more
portable
Regards
Ken
* Still having trouble with using multistrap/pdebuild-cross for
cross-compiling Firefox - it looks like only x86 packages get downloaded,
not armel. I have asked Wookey for advice, and he will try to reproduce the
build.
* Falling back to native compiling until the cross-compiling set-up has been
sorted out. I will now look at how to pass different compiler options
to the Mozilla build system and how to build individual parts of the program.
Best Regards
Åsa
(short week: 4 days)
RAG:
Red:
Amber:
Green: blog started :-) http://translatedcode.wordpress.com/
Current Milestones:
|| || Planned || Estimate || Actual ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || 2011-09-15 ||
||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
== other ==
* upstream patch review, putting together pull requests
* more time spent on the apparent memory corruption bug in QEMU on ARM
hosts (no luck yet :-(); found a Valgrind bug in the process,
though (KDE:284472). This ate up way too much of this week.
* A15 KVM planning work
* meetings etc
* moved over to patches.linaro for QEMU patch tracking
-- PMM
Hi,
* widening shifts - finally committed upstream
* SLP loads with different offsets and operand swaps - committed upstream
* SLP with multiple types - merged to gcc-linaro-4.6
* vectorizer stuff: patch review, test fixes, discussions, bug fix
* Ramana and I discussed what can be done with VEC_PERM_EXPR for NEON,
and created https://blueprints.launchpad.net/gcc-linaro/+spec/support-vec-perm
for this issue.
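For readers unfamiliar with VEC_PERM_EXPR, here is a sketch of its element-selection rule as described in the GCC internals documentation: each mask value indexes into the concatenation of the two input vectors, taken modulo twice the element count. It models vectors as plain Python lists; how NEON would implement a given mask is exactly what the blueprint is about and is not shown.

```python
# Sketch of the selection rule of GCC's VEC_PERM_EXPR <v0, v1, mask>,
# modelled on Python lists rather than real vector types.
def vec_perm(v0: list, v1: list, mask: list[int]) -> list:
    n = len(v0)
    if len(v1) != n or len(mask) != n:
        raise ValueError("operands must have the same number of elements")
    both = v0 + v1                              # 0..n-1 pick from v0, n..2n-1 from v1
    return [both[k % (2 * n)] for k in mask]    # mask values are taken modulo 2n
```

For example, `vec_perm([0, 1, 2, 3], [4, 5, 6, 7], [0, 4, 1, 5])` interleaves the low halves of the two inputs, the kind of fixed pattern a NEON zip-style instruction might cover.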
Ira
Following on from last night's performance call, I had a look at how
64-bit integer operations are mapped to NEON instructions. In the
summary, "missing" means NEON has a suitable instruction but GCC
doesn't use it yet:
* add - fine
* subtract - fine
* bitwise and - fine
* bitwise or - fine
* bitwise xor - fine
* multiply - can't, as VMUL tops out at 32-bit elements. Might be
able to compose using VMLAL
* div, mod - no instruction
* negate - VNEG tops out at 32-bit elements, but could be turned into
vmov #0; vsub
* left shift constant - missing
* right shift constant - missing
* right arithmetic shift constant - missing
* left shift register - missing
* right shift register - tricky, as NEON does this as a left shift by the negated register
* not - no instruction, but could be done through a vceq, #0?
* bitwise not - missing
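Two of the workarounds in the list can be checked arithmetically. The multiply note suggests composing the operation from 32-bit pieces (e.g. with VMLAL); the identity such a sequence would rely on is that the low 64 bits of a product need only the full low-by-low product plus the low 32 bits of the two cross products. The right-shift note relies on register-controlled VSHL treating a negative count as a right shift. This sketch models one unsigned 64-bit lane with Python integers and leaves the actual instruction selection open; `mul64_from_32`, `vshl_u64` and `lshr_u64` are hypothetical names.

```python
# Sketch: arithmetic behind two workarounds, modelled on one unsigned 64-bit lane.
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul64_from_32(a: int, b: int) -> int:
    """Low 64 bits of a * b, built only from 32 x 32 -> 64-bit products."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    low = a_lo * b_lo                               # full 64-bit product (VMULL-style)
    cross = (a_lo * b_hi + a_hi * b_lo) & MASK32    # only low 32 bits survive the shift
    return (low + (cross << 32)) & MASK64           # a_hi * b_hi only affects bits >= 64


def vshl_u64(x: int, count: int) -> int:
    """Register-controlled shift in the style of VSHL: a negative count shifts right."""
    if not -64 < count < 64:
        raise ValueError("this sketch only models shift counts with |count| < 64")
    x &= MASK64
    return (x << count) & MASK64 if count >= 0 else x >> -count


def lshr_u64(x: int, amount: int) -> int:
    """Logical right shift by a register value: negate the count, then left-shift."""
    return vshl_u64(x, -amount)
```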
I also noticed that the replicated constants aren't being used. A
pre-increment is currently a constant-pool load followed by a vadd, but
could be done as vmov #-1; vsub. The same goes for pre-decrement, which
could be done as vmov #-1; vadd.
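The trick works because an all-ones 64-bit lane is -1 in two's complement, so subtracting it adds one and adding it subtracts one, modulo 2^64. A minimal sketch of that identity, assuming the "vmov #-1" immediate fills the whole lane with ones; the function names are hypothetical.

```python
# Sketch: increment/decrement using an all-ones constant instead of a loaded 1.
MASK64 = (1 << 64) - 1
ALL_ONES = MASK64  # a 64-bit lane of all ones, i.e. -1 in two's complement


def pre_increment(x: int) -> int:
    return (x - ALL_ONES) & MASK64  # x - (-1) == x + 1  (mod 2**64)


def pre_decrement(x: int) -> int:
    return (x + ALL_ONES) & MASK64  # x + (-1) == x - 1  (mod 2**64)
```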
This seems worth blueprinting.
-- Michael
limits-fndefn.c takes an impressively long time to run. On an idle
machine, -O3 -g -c takes 17:31 and -O2 -g -c takes The test already
has a dg-timeout-factor of 4 giving a total timeout of 20 minutes.
Removing the -g brings this down to 30 s. Keeping the -g and adding
-fno-var-tracking brings this down to 45 s.
We could bump the multiplier up to 8 but it's getting a bit
ridiculous. Any thoughts?
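The timeout arithmetic can be sketched as follows. The unscaled per-test timeout is inferred from the text (a factor of 4 giving 20 minutes implies 5 minutes); the -O3 run time is the 17:31 measured on an idle machine.

```python
# Sketch of the timeout arithmetic.  BASE_TIMEOUT_S is inferred from the text
# (a factor of 4 gives 20 minutes, so the unscaled timeout is 5 minutes).
BASE_TIMEOUT_S = 20 * 60 // 4  # 300 seconds


def timeout_s(factor: int) -> int:
    """Total time allowed for a test with the given dg-timeout-factor."""
    return BASE_TIMEOUT_S * factor


def headroom(runtime_s: float, factor: int) -> float:
    """Fraction of the timeout left unused; negative means the test times out."""
    return 1.0 - runtime_s / timeout_s(factor)


O3_RUNTIME_S = 17 * 60 + 31  # 17:31 for -O3 -g -c on an idle machine
for factor in (4, 8):
    print(f"factor {factor}: {timeout_s(factor)} s, headroom {headroom(O3_RUNTIME_S, factor):.1%}")
```

On those numbers, a factor of 4 leaves about 12% headroom on an idle machine and a factor of 8 about 56%; a loaded test machine would eat into both.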
-- Michael
== Last week and today ==
* Backported fix for returning std::pair<bool, bool>. Unfortunately
this showed up a regression on 4.5. I couldn't reproduce it with a cross compiler,
and the testcase itself looks innocuous, so I'm wondering whether
the patch might trigger a miscompilation of cc1plus.
* Committed SMS register-scheduling patches upstream and backported
to Linaro 4.6.
* Most of the week spent on -fsched-pressure. Still trying a few
variations in order to get the right balance. (My local haifa-sched.c
now has about 20 new toggles.) Still feel like I'm making progress,
rather than hitting the point of diminishing returns.
Hope Connect goes well. See everyone in a few weeks' time.
Richard
Completed the 4.5 and 4.6 FSF to Linaro merges.
Spun the Linaro GCC release tarballs, uploaded them to the test farm,
and set off the test builds.
Continued looking at the constant reuse optimization. This time I've
built GCC itself with the new pass to see how many optimization
opportunities there are. This shook out a lot more small bugs, which was
useful.
Backported my negative-shifts patch to Linaro 4.6, pushed it to
Launchpad for testing, and then committed it to 4.6 once it was approved.
Experimented with running SPEC2K on A8 and A9 boards in order to
establish a baseline for the generic tuning tweaks. A short test doesn't
give much clue as to what can be achieved, and a long test takes way too
long. The problem is also complicated by the benchmarks where the A8
tuning works better on A9 than A9 tuning does. :S
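Comparing tunings against a baseline usually comes down to per-benchmark speedup ratios combined with a geometric mean, plus a list of the benchmarks where one tuning beats another. A minimal sketch, assuming each run is summarised as a dict of benchmark name to run time in seconds; the function names and data layout are hypothetical, not the scripts actually used.

```python
# Sketch: summarise SPEC-style run times for one board.  Each argument is a
# hypothetical dict mapping benchmark name -> run time in seconds.
from math import prod


def speedups(baseline: dict[str, float], tuned: dict[str, float]) -> dict[str, float]:
    """Per-benchmark speedup of `tuned` over `baseline` (>1.0 means faster)."""
    return {name: baseline[name] / tuned[name] for name in baseline}


def geomean(values) -> float:
    """Geometric mean, the usual way to combine per-benchmark ratios."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def where_faster(a: dict[str, float], b: dict[str, float]) -> list[str]:
    """Names of the benchmarks where run `a` finished faster than run `b`."""
    return sorted(name for name in a if a[name] < b[name])
```

For example, `where_faster(a8_tuned_on_a9, a9_tuned_on_a9)` (hypothetical variable names for runs built with `-mtune=cortex-a8` and `-mtune=cortex-a9` on the A9 board) would list the benchmarks where the A8 tuning wins.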
Received a bug report (GCC bugzilla 50717) for my widening multiplies
patches. Analysed the problem, developed a patch, and posted it to
gcc-patches.
<Short week with 2 days gone on an internal training course>
=== Progress ===
* Some patch review.
* Spent time looking at LP 836588.
* Tried some different approaches for the vcvt.f64.s32 case and it
looks like the simple solution is the best one unfortunately :(
* 2 days off at internal training course.
=== Plans ===
* continue looking into LP 836588
* Patch review week.
* Work on getting vcvt.f* case done and finish some of the backlog.
=== Absences ===
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel booked
* 08 Nov - 11 Nov - Tentatively booked
* Dec 19 - 31st Dec - Tentatively booked
I've just tried rerunning some benchmarks on my Panda, which I
reinstalled recently, and am getting some odd behaviour:
The kernel is 3.0.0-1404-linaro-lt-omap
For example:
simple_strlen: ,102400, loops of ,62, bytes=6.054688 MB, transferred in ,20324707.000000 ns, giving, 297.897898 MB/s
simple_strlen: ,102400, loops of ,32, bytes=3.125000 MB, transferred in ,7904053.000000 ns, giving, 395.366782 MB/s
simple_strlen: ,102400, loops of ,16, bytes=1.562500 MB, transferred in ,7354736.000000 ns, giving, 212.448142 MB/s
simple_strlen: ,102400, loops of ,8, bytes=0.781250 MB, transferred in ,91553.000000 ns, giving, 8533.308575 MB/s
simple_strlen: ,102400, loops of ,4, bytes=0.390625 MB, transferred in ,1495361.000000 ns, giving, 261.224547 MB/s
simple_strlen: ,102400, loops of ,2, bytes=0.195312 MB, transferred in ,1983643.000000 ns, giving, 98.461518 MB/s
Note that the 8-byte one is apparently 40 times faster. And for true oddness:
smarter_strlen_ldrd: ,102400, loops of ,62, bytes=6.054688 MB, transferred in ,3936768.000000 ns, giving, 1537.984331 MB/s
smarter_strlen_ldrd: ,102400, loops of ,32, bytes=3.125000 MB, transferred in ,0.000000 ns, giving, inf MB/s
smarter_strlen_ldrd: ,102400, loops of ,16, bytes=1.562500 MB, transferred in ,4180909.000000 ns, giving, 373.722557 MB/s
Now, while I like infinite transfer rates, I suspect they're wrong.
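The reported rates are consistent with MB = 2^20 bytes and rate = (loops × size) / elapsed time, so the fault looks like it lies in the timing rather than the arithmetic. A sketch that re-derives the rate from each result line and flags zero or implausibly short timings; the regular expression assumes the unwrapped line format shown above, and the 5000 MB/s cut-off is an arbitrary assumption.

```python
import math
import re

# Matches one unwrapped result line in the format shown above.
LINE_RE = re.compile(
    r"(?P<name>\w+): ,(?P<loops>\d+), loops of ,(?P<size>\d+), bytes=(?P<mb>[\d.]+) MB,"
    r"\s*transferred in ,(?P<ns>[\d.]+) ns, giving, (?P<rate>\S+) MB/s"
)


def check_line(line: str, max_plausible_mb_s: float = 5000.0) -> tuple[str, float, bool]:
    """Recompute MB/s (MB = 2**20 bytes) and flag results that look like timer glitches.

    max_plausible_mb_s is an arbitrary, assumed threshold.
    """
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    mb = int(m["loops"]) * int(m["size"]) / 2**20
    ns = float(m["ns"])
    rate = mb / (ns * 1e-9) if ns > 0 else math.inf
    suspicious = ns == 0 or rate > max_plausible_mb_s
    return m["name"], rate, suspicious
```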
Anyone else seeing this?
Dave
Implementing register pressure estimation in SMS.
Experimenting with the implementation on libav microbench.
Discussed with Richard some issues raised during the implementation.
== GDB ==
* Created and published Linaro GDB 7.3-2011.10 release.
* Fixed LP #871901 (Linaro GDB crashes on 3.x kernels) in
mainline and Linaro GDB 7.3.
* Backported mainline fix for LP #829595 (Separate debuginfo
misidentified with "remote:" access) to Linaro GDB 7.3.
* Completed blueprint "GDB as a cross-debugger".
* Worked on support for cross-platform core file generation.
== GCC ==
* Patch review week.
Best Regards
Ulrich Weigand
== 64 bit atomics ==
* Thanks to Ramana for OKing my gcc patches; and Richard for
committing them - I've backported these to the gcc-linaro branch
and pushed it - hopefully those will pass OK!
== String routines ==
* Sent my memchr patch to upstream newlib, received comments,
tweaked, and resent
* Sent strlen patch to upstream newlib
* Spent some time getting confused by timing issues on our Panda; it
got reinstalled with 11.09 a few weeks ago and is now showing some odd
behaviours. In particular, some tests show completion in 0ns (and my
code isn't -that- fast!), and in others the times vary wildly - it's
almost as if a timer interrupt is delayed or missing. The same test
binary works fine on one of Michael's Ursas running an older install.
== QEMU ==
* Tested Peter's QEMU image for release
== Other ==
* Spent an afternoon reading through the System trace docs
On holiday next week; I'll poll email occasionally.
Dave
Hi,
* working through my inbox after being away
* a former patch of mine accidentally broke remote unwinding on IA64
* maintainer made a quick fix that made things worse for ARM
* posted a patch that aims to fix things up for all archs
* identified and submitted libunwind-android patches that could go upstream
* sent patch that lets another testcase pass on ARM Linux
* tried to help people on the mailing list with various issues
Regards
Ken
RAG:
Red:
Amber:
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || 2011-09-15 ||
== add-omap3-networking ==
* patches pushed into qemu-linaro and final testing done
== a15-usermode-support ==
* this is now in qemu-linaro 2011.10; only remaining thing is
for the patches to be taken upstream (no major review issues)
== linaro-qemu-11.11 ==
* qemu-linaro 2011.10 released
== other ==
* completed a desk move
* some sw/hw archaeology to answer a question about the FIFO size of
the PL041 on ARM devboards as part of reviewing audio support patch
* trying to track down a really weird problem where qemu on ARM host
dies with apparent memory corruption
* Working on using pdebuild-cross/multistrap when cross-compiling Firefox.
With multistrap all dependencies should be sorted out automatically.
The current status is that pdebuild-cross for armel is set up and the
compilation of the Firefox package starts. It does not get all the way
through, though, because some dependencies (for X11) are not in place after
all. Will continue investigating.
Best Regards
Åsa
The Linaro Toolchain Working Group is pleased to announce the
release of Linaro QEMU 2011.10.
Linaro QEMU 2011.10 is the latest monthly release of
qemu-linaro. Based off upstream (trunk) QEMU, it includes a
number of ARM-focused bug fixes and enhancements.
New in this month's release:
- Instructions introduced with the Cortex-A15 (ARM mode
SDIV and UDIV, and the VFPv4 fused multiply-accumulate
instructions VFMA, VFMS, VFNMA, VFNMS) are now supported
in linux-user mode
- Beagle models now support USB networking (run the model with
"-usb -device usb-net,netdev=mynet -netdev user,id=mynet")
Known issues:
- There may be some problems with running multithreaded programs in
linux-user mode (LP:823902)
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2011.10
Binary builds of this qemu-linaro release are being prepared and
will be available shortly for users of Ubuntu. Packages will be in
the linaro-maintainers tools ppa:
https://launchpad.net/~linaro-maintainers/+archive/tools/
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro
The Linaro Toolchain Working Group is pleased to announce the 2011.10
release of both Linaro GCC 4.6 and Linaro GCC 4.5.
Linaro GCC 4.6 2011.10 is the eighth release in the 4.6 series. Based
off the latest GCC 4.6.1+svn179483, it contains a range of vectoriser
performance improvements and general bug fixes.
Interesting changes include:
* Updates to 4.6.1+svn179483
* Vectorises more straight-line code with data dependencies
* Now picks the best vector width when vectorising straight-line code
* Better handling of auto-increment addresses in SMS
* Changes the default vector width from double word to quad word
* Better handling of extracting the top or bottom half of a quad word vector
* Now supports the NEON absolute difference instruction
Fixes:
* LP: #689887 ICE in get_arm_condition_code
* LP: #809761 oss4 version 4.2-build2004-1ubuntu1 failed to build on armel
Linaro GCC 4.5 2011.10 is the fifteenth release in the 4.5
series. Based off the latest GCC 4.5.3+svn179438, this is a
maintenance focused release.
Interesting changes in 4.5 include:
* Updates to 4.5.3+svn179438
Fixes:
* LP: #689887 ICE in get_arm_condition_code
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.6-2011.10
https://launchpad.net/gcc-linaro/+milestone/4.5-2011.10
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
More information on the features and issues is available from the
release pages:
https://launchpad.net/gcc-linaro/4.6/4.6-2011.10
https://launchpad.net/gcc-linaro/4.5/4.5-2011.10
Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Bugs: https://bugs.launchpad.net/gcc-linaro/
Questions? https://ask.linaro.org/
Interested in commercial support? Inquire at support(a)linaro.org
-- Michael
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.3.
Linaro GDB 7.3 2011.10 is the third release in the 7.3 series. Based
off the latest GDB 7.3, it includes a number of ARM-focused bug fixes
and enhancements.
This release contains:
* Support for disabling address space randomization in gdbserver
* Fix for GDB crashes on 3.x kernels
* Fix spurious "CRC mismatch" warnings when using "remote:" sysroot
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.3-2011.10
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
Hi,
* Finished a presentation for NEON forum. Revital and Richard kindly
agreed to take a look and gave me some valuable comments. Thanks!
* widen-shifts:
- While preparing the presentation I found some room for improvement
in the pattern detection, so I implemented it. It gave an additional 13%
improvement on rgb24tobgr16.
- Ramana suggested how to check the constant operand of vshll.
Testing these two things on ARM.
* SLP improvements:
- Implemented a patch that swaps operands if necessary to make the
operations isomorphic, and supports loads with different offsets.
Testing it now.
- The three relevant libav loops now get vectorized, giving a 42%-57% speedup.
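The idea behind swapping operands to make an SLP group isomorphic can be shown on a toy model: for a commutative operation, `y + x` can be rewritten as `x + y` so that every statement in the group has its operands in the same order. This is a much-simplified sketch, not GCC's implementation: a statement is a hypothetical `(opcode, operand0, operand1)` tuple whose operands are labels such as the array they are loaded from, whereas the real vectorizer works on GIMPLE.

```python
# Hypothetical, simplified model of an SLP group: each statement is
# (opcode, operand0, operand1), where an operand is a label such as
# the array it is loaded from.  Real GCC works on GIMPLE, not tuples.
COMMUTATIVE = {"+", "*", "&", "|", "^"}


def make_isomorphic(group):
    """Swap operands of commutative statements to match the first statement.

    Returns the adjusted group, or None if the statements still differ.
    """
    op, first0, first1 = group[0]
    result = [group[0]]
    for stmt_op, a, b in group[1:]:
        if stmt_op != op:
            return None
        if (a, b) != (first0, first1) and op in COMMUTATIVE and (b, a) == (first0, first1):
            a, b = b, a  # safe only because the operation is commutative
        if (a, b) != (first0, first1):
            return None
        result.append((stmt_op, a, b))
    return result
```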
Next week holidays: half days Sunday-Wednesday and Thursday.
Ira