Spun release tarballs for Linaro GCC 4.5 and 4.6. Sent them to Michael
Hope and Matthias Klose.
Testing for my widening multiplies patches revealed a bug when the
accumulate value had a different type. The problem is easily fixed, so
I've created a patch, submitted it, and now it's approved upstream.
Same again, this time with a bug involving constant integers. Again,
easily fixed, submitted, and approved.
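
For illustration, here is a minimal C sketch of the kind of source code that widening-multiply patterns target. The function names and types are hypothetical; the report does not show the inputs that actually triggered the two bugs.

```c
#include <stdint.h>

/* Hypothetical example: a widening multiply-accumulate where the
   accumulator (64-bit) has a different type from the 32-bit product
   of the two 16-bit operands. */
int64_t mac_mixed(int64_t acc, int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* 16x16 -> 32 widening multiply */
    return acc + product;                        /* accumulate in a wider type */
}

/* Hypothetical example: a widening multiply by a constant integer. */
int32_t mul_by_const(int16_t a)
{
    return (int32_t)a * 1000;
}
```

Whether a compiler turns either function into a single widening multiply (or multiply-accumulate) instruction depends on the target and the optimisation level.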
Nobody had reviewed the first patch in my series - Richard Guenther had
reviewed all the others, but wasn't happy to review the expand pass. So,
I asked newly crowned RTL Maintainer Richard Sandiford to review it, but
apparently it's the wrong bit of the back-end, so I asked Bernd instead.
Bernd kindly reviewed and approved it, so now the whole series is ready
to commit if only my test comes back clean.
Continued trying to figure out why my thumb2 constants patch is broken.
So far, no further progress. It might be that Michael's build system is
confused, but it's looking likely to be a real bug.
----
Upstream patches requiring review:
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
- Opened PR49789 to record the bootstrap failure with SMS flags.
- SPEC2006/libquantum: Wrote a hack to apply SMS on the hot loop. Need
to make it more accurate.
- Pinged SMS patches in mainline.
- Looking with Ramana at the effect of the tree reassociation
improvement patch on bwaves:
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00904.html
== 64 bit atomics ==
* Updated gcc patches as per comments from Ramana and Joseph; build
currently cooking on Panda
== Qemu ==
* Testing Peter's pre-release; found a bug on beagle (which he
tracked down to an x-loader change).
* Found the cause of the occasional SD card errors I was seeing (SD: CMD12
in a wrong state): writing the last sector throws an error and also
leaves the card in the wrong state. I'll cut a patch next week.
* Added a bunch of tracing code to the SD card layer.
* With the tracing code in place and the other bug fixed, I'm starting to
understand how it works, and have found half a dozen reasons why the
emulation is really slow. Whether that's the cause of the reported
recoverable lock-ups under load is an interesting question; I plan to
fix the obvious problems and see how it goes.
Dave
== GDB ==
* Committed mainline patch to fix re-built executable remote test
problems (#804392).
* Committed two more mainline patches to fix remote test issues.
== GCC ==
* Patch review.
* Determined root cause of bug #809768 (ICE in bionic libm).
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== GCC ==
=== Progress ===
* ivopts patch to minimise the amount of VFP moves to integer
registers because of auto-inc - sent out for review. It appears to
test fine and reduces the number of FP to integer moves in certain
SPEC2k6 benchmarks by about 20%
* More cases with VFP moves identified, and some more patches coming
out soonish. In one case we have moves from VFP regs to integer
regs because we allow POST_MODIFY_DISP. In another case, from scimark,
we have moves from integer registers to VFP registers; I suspect this is
because the order of constraints in movdf_vfp has integer registers
before FP registers, while movsf_vfp doesn't.
* Panda died a couple of times again because of power glitches in the
office - restarted runs for BRANCH_COST.
* Sorted out travel plans for UDS Orlando. Need to book tickets.
* Some time spent on getting the Eagle boards working.
* Bug triage and some patch review.
=== Plans ===
* Benchmark the ivopts patch to see what happens.
* Some issues with my last patch on movdi_vfp. I think I've missed a
set of ce_count, and was wondering why there was an Ada failure
with things outside IT blocks. There appears to be an Ubuntu bug for that.
* Disable POST_MODIFY for VFP mode values and see what happens and
change the order of the constraints to have loads to VFP registers
before loads to core registers for movdf_vfp and thumb2_movdf_vfp.
* Look at effects of auto-inc-dec with the VFP mode stuff.
* Look at EPILOGUE_USES and clear more of my patch queue.
Meetings:
* 1-1s
* TCWG calls
Absences.
* 1st Aug - 5th August - Linaro sprint.
* 8th - 9th August - Internal training.
* 29th Aug - Sept. 2 - Holiday booked and approved.
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel to be booked.
RAG:
Red:
Amber: OMAP3 patch upstreaming is progressing more slowly than hoped
Green: various outstanding patches accepted upstream in time for 0.15
Current Milestones:
|| || Planned || Estimate || Actual ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
== linaro-qemu-11.11 ==
* tracking down a problem with very recent beagle snapshots not booting
in qemu; this turns out to be an x-loader bug (LP:813407)
* made the release
== other ==
* upstream are planning to branch for 0.15 release today
* most of the outstanding ARM patches have now been pulled
* reviewed a patch adding ARM1176 support
* wrote a patch fixing the feature flags for ARM1136r1 so it includes
the TLS registers (needed as newer kernels now try to use them)
* submitted some patches fixing a few VFP UNDEF/UNPREDICTABLE cases so
they don't crash qemu
* submitted patch to make v6 cp15 barrier insns work in linux-user mode
* looked at a reported problem where linux kernel versions 2.6.39+
display graphics wrongly. This turns out to be that 2.6.39 (or 38)
changed (inadvertently?) from programming the versatilepb CLCD as
RGB565 to setting it to BGR565; qemu wasn't implementing the latter.
Dusted off some PL111 support patches, added the mux control support
for PL110 and submitted them.
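
To make the RGB565/BGR565 difference concrete, here is a minimal C sketch of swapping the two 5-bit channels of a 16-bit 5-6-5 pixel. It is not the qemu implementation: the function name is hypothetical, and the bit layout (bits 15..11 and 4..0 for the two 5-bit channels, bits 10..5 for green) is an assumption about the packed format.

```c
#include <stdint.h>

/* Swap the two 5-bit channels of a 16-bit 5-6-5 pixel, leaving the
   6-bit green field in place. Assumed layout: bits 15..11 = one 5-bit
   channel, bits 10..5 = green, bits 4..0 = the other 5-bit channel. */
uint16_t swap_rb_565(uint16_t px)
{
    uint16_t hi = (px >> 11) & 0x1F;   /* channel stored in the top 5 bits */
    uint16_t g  = px & 0x07E0;         /* green stays where it is */
    uint16_t lo = px & 0x1F;           /* channel stored in the low 5 bits */
    return (uint16_t)((lo << 11) | g | hi);
}
```

A display model that ignores the channel-order setting shows every pixel as if this swap had been applied, which would account for wrong colours of the kind described above.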
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
1-5 August: Linaro sprint 1111
15-19 August: KVM Forum and LinuxCon NA, Vancouver
== This week ==
* Wrote a fix for 809768. Accepted upstream.
* Looked at upstream PR 49742 (the failures seen with predictive commoning).
Accepted upstream.
* More shrink-wrap review.
* Sent auto-inc-dec changes out for comments. Got some good private
feedback (in the sense of being positive, and having good suggestions).
* Sent a related define_bypass patch out for review.
* Started looking at sms-and-memory-dependencies.
== Next week ==
* Deal with auto-inc-dec suggestions.
* More SMS.
The Linaro Toolchain Working Group is pleased to announce the release of
both Linaro GCC 4.6 and Linaro GCC 4.5.
Linaro GCC 4.6 2011.07 is the fifth release in the 4.6 series. Based off the latest
GCC 4.6.1+svn175677, it adds new optimisations and vectoriser improvements.
Interesting changes include:
* Updates to 4.6.1+r175677
* Improves support for vector shifts by a constant
* Improves handling of memory dependencies in the SMS optimisation
* Improves vectorisation of widening multiplies by keeping the operands
smaller for longer
* Improves the peeling of potentially misaligned vectorised loops
* Improves vectorisation of signed and unsigned widening multiplies by a
constant
* Merges the new upstream Cortex-A5 tuning
Fixes:
* LP: #721531: Don't optimise out testing of the Thumb mode bit on function
pointers
* LP: #723185: ICE in reload_cse_simplify_operands when compiling with -marm
-mfpu=neon
* LP: #744754: ICE in *neon_movoi when using NEON intrinsics
* LP: #791327: ICE due to using the stack pointer in RSB instructions
* LP: #797748: ICE building SPEC2006 403.gcc emit-rtl.c
* LP: #803232: ICE on code that uses vld4q_s16() NEON intrinsic
* LP: #809435: Omit building the target libiberty when building a cross
compiler
* LP: #807573: ICE in *truncsisf2_vfp: Could not find a spill register
* PR 49385: Ensure at least one of the operands is a register in
thumb2_movhi_insn
* Fixes an EABI unwinding bug that improves interoperability with armcc
* Fixes a DWARF 2 problem exposed through shrinkwrap.
* Fixes a bug in __builtin_isgreaterequal
Known issues:
* Building Python 2.7 with -mfpu=neon exposes a bug in vmov.i64 in binutils
2.20.51. Please use 2.21 or later.
Linaro GCC 4.5 2011.07 is the twelfth release in the 4.5 series. Based off
the latest GCC 4.5.3+svn175676, the release is focused on maintenance.
Interesting changes in 4.5 include:
* Updates to 4.5.3+r175676
Fixes:
* LP: #721531: Don't optimise out testing of the Thumb mode bit on function
pointers
* LP: #723185: ICE in reload_cse_simplify_operands when compiling with -marm
-mfpu=neon
* LP: #744754: ICE in *neon_movoi when using NEON intrinsics
* LP: #797748: ICE building SPEC2006 403.gcc emit-rtl.c
* LP: #803232: ICE on code that uses vld4q_s16() NEON intrinsic
* Fixes a DWARF 2 problem exposed through shrinkwrap.
The source tarball is available from:
https://launchpad.net/gcc-linaro/+milestone/4.6-2011.07
https://launchpad.net/gcc-linaro/+milestone/4.5-2011.07
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Bugs: https://bugs.launchpad.net/gcc-linaro/
Questions? https://ask.linaro.org/
Interested in commercial support? inquire at support(a)linaro.org
-- Michael
Hi,
- I finally submitted the over-widening patch, but Richard Guenther
thought that this optimization should be done for scalars as well, and
he is now working on this himself.
- Some auto-vectorizer fixes
Ira
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro QEMU 2011.07.
Linaro QEMU 2011.07-0 is the latest monthly release of qemu-linaro. Based
off upstream (trunk) QEMU, it includes a number of ARM-focused bug fixes
and enhancements.
This month's release is primarily minor improvements:
- Fixes a compile failure on ia64 hosts
- syscall 369 (prlimit64) implemented in linux-user mode
- Fixes an ELF loader bug that caused problems with binaries generated
by the Google Go compiler
Plus of course new upstream fixes and improvements.
Known issues:
- The beagle and beaglexm models still do not support USB networking
- Very recent Linaro omap3 hwpacks (20110716 and later) do not boot on
the beagle model; this is caused by an x-loader bug (LP:813407)
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2011.07
Binary builds of this qemu-linaro release are being prepared and
will be available shortly for users of Ubuntu. Packages will be in
the linaro-maintainers tools ppa:
https://launchpad.net/~linaro-maintainers/+archive/tools/
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro
Hi All,
Apologies for missing the stand-up call today.
I've been having technical difficulties at my end. :(
I think they're resolved now ... maybe.
Andrew
Hi,
* continued to look into #809768 (ICE when building bionic's libm)
* created some toolchain and android builds for verification purposes
* libunwind
* discussions with Michael and Uli on how to proceed (thanks!)
* started to work on libunwind-ptrace
* also look for .debug_frame info if there is no .eh_frame info
for the given IP
* mimics the behaviour of the reworked local unwinding
* Attended an IBM internal class on Wednesday
Note: I'm off for two days (21-22) and back on Monday.
Regards
Ken
Hi there. The 2011.07 release has been spun and is testing up well.
The 4.5 and 4.6 branches are now open so feel free to commit any
approved patches.
-- Michael
== GCC ==
=== Progress ===
* Identified particular patterns that have issues with scheduler
descriptions in A8 and A9. Fixes to be benchmarked next.
* Spent some time on the new tree-reassoc work but SPEC2k failed for
some of the NEON configurations. Needs investigation.
* T2 perf call.
* Looked at libquantum bits with Revital.
* BRANCH_COST benchmarking now complete for T2. Same is running for ARM state.
* Some patch review and bugzilla triaging upstream.
=== Plans ===
* ivopts patch for RichardS to try out - related to the excessive
moves between integer and VFP unit.
* BRANCH_COST further results.
* Submit scheduler patches upstream after benchmarking.
Meetings:
* 1-1s
* TCWG calls
Absences.
* 1st Aug - 5th August - Linaro sprint.
* 8th - 9th August - Internal training.
* 29th Aug - Sept. 2 - Holiday booked and approved.
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel to be booked.
Continued responding to review comments on my widening multiply patches.
Wrote large parts of most of the patches to fix bugs and tidy them up.
The result is that all but patch 1 are now approved. Pushed the patches
to Launchpad for final testing.
Monitored the test status of my thumb2 constants patch, but it still
hasn't returned any results. It seems Michael has been having some
problems with his systems.
Went back to looking at merging patches to 4.6. It's only really the
hard ones left. Many are blocked on work that needs to be done by
somebody else. Pinged Tom and Bernd to find out the status of their ones
- all are stuck on the back burner.
----
Upstream patches requiring review:
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
* Widening Multiplies 1/7
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg08721.html
- Tracked down the file containing the loop that is causing the
bootstrap failure with SMS flags on an ARM machine. The failure is not
caused by SMS itself but by the doloop optimization, which is applied
when SMS flags are set. Now working on locating the exact loop and
producing a testcase to reproduce the error.
- Looking into the SPEC2006/libquantum benchmark. It has a hot loop with a
conditional store, which suppresses SMS because SMS is only applied to
single-basic-block loops. If-conversion (replacing the store with a
conditional move followed by an unconditional store) cannot be done,
because that requires proving that there is a store to the same
location in each iteration of the loop, and that is not the case here.
Apparently, when running with the cortex-a8 flag, a cond_exec statement
is generated for the store, but that only happens after the register
allocation pass, which runs after SMS (IIUC, moving the generation
of cond_exec before RA is not trivial:
http://gcc.gnu.org/ml/gcc/2000-05/msg00079.html).
So, I'm looking into teaching SMS to handle conditional statements
based on the technique presented in [1]. This change is not trivial, so
I'm going to estimate the potential of applying SMS to the loop as a
first stage.
[1] M. Lam, "Software pipelining: an effective scheduling technique
for VLIW machines"
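
The conditional-store problem described above can be sketched in C as follows. This is a hypothetical loop written for illustration, not the actual libquantum source; the function and parameter names are assumptions.

```c
#include <stddef.h>

/* Loop with a conditional store: the branch splits the body into more
   than one basic block, which is the shape that keeps a
   single-basic-block scheduler such as SMS from applying. */
void toggle_if_set(unsigned *state, size_t n, unsigned control, unsigned target)
{
    for (size_t i = 0; i < n; i++) {
        if (state[i] & control)
            state[i] ^= target;
    }
}

/* Hand if-converted form: select the value, then store unconditionally.
   The body is now a single basic block, but state[i] is written on
   every iteration. A compiler may only make this change if it can
   prove that the extra stores are safe. */
void toggle_if_set_ifcvt(unsigned *state, size_t n, unsigned control, unsigned target)
{
    for (size_t i = 0; i < n; i++) {
        unsigned old = state[i];
        unsigned flipped = old ^ target;
        state[i] = (old & control) ? flipped : old;
    }
}
```

Both functions compute the same result when the caller owns the whole array. They differ only in which elements are written, and that is exactly the property the compiler cannot assume in general.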
== GDB ==
* Tested GDB 7.2.91 prerelease on ARM; everything looking good.
* Created a set of patches to prepare for Linaro GDB 7.3 series;
verified release process on top of a current 7.3 snapshot.
* Committed three mainline patches to fix shared library remote
test problems (#804387).
* Reviewed Yao's latest Thumb-2 displaced stepping patch.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
== String routines ==
* Sent a patch to libc-ports with modified configure scripts to add
subdirectories for architecture-specific ARM code, and the memchr.S
from cortex-strings.
== 64 bit atomics ==
* Working through comments on my patches and the set of discussions about the
kernel interface for the helper case - not really sure which way
that's going to go.
== QEmu ==
* Looking at how tracing works; considering adding tracing to the SD
card code to help track down some of the SD card issues.
Dave
== This week ==
* Fixed the unnecessary union initialisers that were causing ICEs
with -g. This turned out to be a lot more work than Richard's
one-liner suggested. :-)
* Backported Chung-Lin's arm_legitimize_reload_address patch to 4.5.
* Backported the smallest_mode_for_size patch to 4.5 and 4.6.
* Patch review.
* Found an off-by-one error in the vectoriser that caused it to think
that contiguous memory regions overlapped. Unfortunately, this meant
that a lot of my microbenchmarks were using the fallback ARM code
instead of the nice-looking NEON code that I could see in the asm.
* A bit more work on auto inc/dec. It tested regression-free for all
default languages. Ran some more benchmarks and posted the results.
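
As a generic illustration of how an off-by-one in an overlap test makes adjacent regions look overlapping (this is not the vectoriser's actual code; both function names are hypothetical, and lengths are assumed to be non-zero):

```c
#include <stdbool.h>
#include <stddef.h>

/* Correct test for half-open ranges [start, start + len): two ranges
   overlap only if each one starts strictly before the other ends. */
bool ranges_overlap(size_t a_start, size_t a_len, size_t b_start, size_t b_len)
{
    return a_start < b_start + b_len && b_start < a_start + a_len;
}

/* Off-by-one variant: treats the end offset as inclusive, so two
   contiguous (back-to-back) ranges are wrongly reported as overlapping. */
bool ranges_overlap_off_by_one(size_t a_start, size_t a_len,
                               size_t b_start, size_t b_len)
{
    return a_start <= b_start + b_len && b_start <= a_start + a_len;
}
```

With a check like the second one, a runtime alias test would send contiguous buffers down the scalar fallback path even though the vectorised loop would have been safe.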
== Next week ==
* Bugs and auto inc/dec.
Richard
Hi,
* analyzed/tested toolchain issues the Linaro Android folks are facing
* libquadmath disabled due to a configure test failure in the target
libiberty (#809435)
* fix will be in 11.07 release
* ICE when building bionic's libm (#809768)
* not reproducible with a "plain" Linaro GCC
* non upstreamable workaround in place
(prevents the ICE but degrades the DWARF quality)
* binary toolchain at http://people.linaro.org/~kwerner/
* libunwind
* localunwrework branch now on git.linaro.org
Note: I'll take two days off at the end of next week (21-22).
Regards
Ken
Achieved:
* Set up networking on the Panda board, ssh to the board from my laptop
works fine.
* Downloaded the benchmarks (SPEC2000 and EEMBC) and built them for x86. I
now have a basic understanding of what the benchmarks do and how to run
them.
* For EEMBC I used the -m32 flag for building on my 64 bit installation.
* For the SPEC2000 I used the Linaro configuration file and enabled the
portability flags for a 64-bit host. Played around with different "runspec"
actions and options. Building and running individual test cases as well as
the full test suites. Finally I did a reportable run for all benchmarks. It
took several hours and in the end I got a result for my laptop.
Next step:
* Cross-compile EEMBC and SPEC2000. I will try linaro-gcc and the cross
compiler that comes with Natty.
Best Regards
Åsa
Hi,
- merged over-widened multiply patch to gcc-linaro-4.6 (now vectorized
rgbyiqv should be about as good as its scalar version)
- continued working on over-widened shifts and bit operations
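
A minimal C sketch of what "over-widened" shifts and bit operations look like at source level. The function names are hypothetical, and the narrow variant is only a hand-written illustration of the idea, not what the patch generates.

```c
#include <stddef.h>
#include <stdint.h>

/* As written in C: src[i] is promoted to int, so the shift and the OR
   are performed on 32-bit values even though only the low 8 bits of
   the result are stored. */
void shift_or_wide(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((src[i] << 2) | 3);
}

/* Same computation with a 16-bit intermediate: 255 << 2 = 1020 still
   fits, so the stored result is identical while each element needs
   half the width. */
void shift_or_narrow(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t t = (uint16_t)(src[i] << 2);
        dst[i] = (uint8_t)(t | 3);
    }
}
```

Narrower intermediates matter for vector code because more elements fit into each vector register.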
Ira