== This week ==
* Away Monday, and a fair bit of time on non-Linaro duties.
* Looked at Dave's gromacs bug (693502). Turned out to be a reload
inheritance problem. Tested a patch. Spent some time coming up with
a brute-force testcase that I can submit with the patch.
* Found a bug in the x86 and x86_64 ifunc support that would affect
ARM too if we weren't careful. Came up with an example testcase
and filed the bug upstream:
http://sourceware.org/bugzilla/show_bug.cgi?id=12366
It has been fixed by H.J. Lu. I'm wondering about taking a
slightly different approach for ARM.
* More ifunc work.
* Tried again to reproduce the chrome failure, making sure to use
DEB_BUILD_HARDENING=1. (I hadn't realised first time round that,
in reaction to this bug, debian/rules specifically excluded armel
from the automatic DEB_BUILD_HARDENING=1 setting.) The build
takes a couple of days on my BeagleBoard.
Heh, and just as I wrote that, the build failed with the reported
link error. Neat.
== Next week ==
* Finish testing the patch for 693502 and submit it upstream.
Backport the patch to our tree once accepted.
* Look at the chromium problem.
* More ifunc.
Richard
RAG:
Red:
Amber:
Green: qemu git pull request accepted, patches seem to be flowing
into qemu upstream more freely now
Milestones:
                             | Planned    | Estimate   | Actual     |
finish virtio-system         | 2010-08-27 | postponed  |            |
finish testing PCI patches   | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2010-01-25 | 2010-01-25 |            |
Bonus extended holiday edition:
This report includes a number of things that happened over
the Christmas holidays as well as this week (which is
a short one, only 3 days).
Progress:
* merge-correctness-fixes:
** my git pull request for various ARM qemu patches was merged!
** a number of other patches were merged to qemu master:
+ implement correct NaN propagation rules
+ rename softfloat float*_is_nan() functions
+ fix UMAAL (Aurelien's patch, reviewed by me)
+ VQSHL (reg) patchset
** diagnosed the segfault in
https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/604872
and wrote a patchset which fixes it (and related problems):
http://patchwork.ozlabs.org/patch/77887/ (n/7)
** wrote and posted a patchset which implements save/restore
for the versatile platform (so I didn't have to wait 10
minutes for the test case to reach the segfault)
http://patchwork.ozlabs.org/patch/76529/
** finished and posted a patchset implementing flushing of
denormals to zero on input:
http://patchwork.ozlabs.org/patch/77798/ (n/3), now committed
** reviewed/tested Aurelien's SMMLA/SMMLS patch (now committed)
** wrote and posted patches which implement the FS_IOC_FIEMAP
ioctl (http://patchwork.ozlabs.org/patch/77725/) and
the sync_file_range{,2} syscalls
(http://patchwork.ozlabs.org/patch/77723/) -- these are used
by apt, and so linaro-media-create was generating a lot of
warnings from qemu about their lack of implementation
** posted patch to clean up NaN handling in linux-user NWFPE
emulation (follow-on from earlier NaN cleanups):
http://patchwork.ozlabs.org/patch/77795/
* maintain-beagle-models:
** implemented minimal ARM cp14 debug registers
and a random register in the TWL4030 emulation (both needed
to get recent Linaro kernels to boot on the beagle model)
Submitted merge request upstream:
http://meego.gitorious.org/qemu-maemo/qemu/merge_requests/3
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
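For readers unfamiliar with the "flush denormals to zero on input" item
above, here is a minimal Python sketch of the idea on IEEE 754
single-precision bit patterns. This is not the qemu softfloat code; the
helper names are made up for illustration. A denormal input (exponent
field zero, fraction non-zero) is replaced by a zero of the same sign
before the operation sees it.

```python
import struct

SIGN_MASK = 0x80000000
EXP_MASK = 0x7F800000
FRAC_MASK = 0x007FFFFF


def flush_input_denormal(bits):
    """Return the float32 bit pattern with a denormal replaced by signed zero.

    Hypothetical helper for illustration only, not a qemu function.
    """
    is_denormal = (bits & EXP_MASK) == 0 and (bits & FRAC_MASK) != 0
    if is_denormal:
        return bits & SIGN_MASK  # keep the sign, drop the fraction
    return bits


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Normal numbers, zeros, infinities and NaNs pass through unchanged; only
the denormal encodings are altered.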
Meetings: toolchain standup, pdsw doughnuts
Absences:
2011: Dallas Linaro sprint 9-15 Jan. Holiday 21 Jan, 22 Apr - 2 May.
Hi,
* implemented reduction support in SLP, I'll check if it helps
DenBench next week
* helping Sebastian Pop with if-conversion for vectorization
improvements (BTW, Sebastian's goal is to vectorize kernels from
ffmpeg)
* fixed GCC PR47139
Ira
Hi All,
Thanks for attending the call. I think we had some interesting discussions.
I've posted the minutes from the call on the same page as before:
https://wiki.linaro.org/AndrewStubbs/Sandbox/GCCoptimizations
I'll try to get the audio posted somewhere for anybody that's interested.
Andrew
You may have noticed that I have created a new BZR/Launchpad branch for
Linaro GCC 4.6:
lp:gcc-linaro/4.6
https://code.launchpad.net/~linaro-toolchain-dev/gcc-linaro/4.6
Up until now, this has not been buildable due to unfixed bugs. However,
upstream GCC have now straightened out the problems, so I have pushed a
buildable version into the branch.
I shall attempt to keep this branch as up-to-date as I can (at least, I
will once the holiday season and January travel are over), but I'll only
push updates if they build for me, so hopefully the branch should remain
fairly stable, at least for our purposes.
Note that so far I've only tested build-ability. Right now I'm not
making any promises about the quality of the compiler.
At some point, we'll want to use this branch to hold our own patches
(both those that will never go upstream, and those that are queued for
GCC 4.7), so it will diverge from upstream 4.6 a bit. For the moment,
it's merely a mirror.
Andrew
== Last Week ==
* Got a new ARM-specific unwind test case working, so its integration
with libunwind stands a marginally better chance of working (once it's
finally finished).
* Sent a ping to binutils mailing list about a patch to improve readelf
that I had sent in at the beginning of December. Still no response.
* Holidays and Vacation
== This Week ==
* Try to finish libunwind integration, as my time with Linaro is nearly
over. Update documentation on wiki to reflect current status.
* Ping ltrace list about a new release. It seemed so close, then the
list went abruptly silent.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
== Linaro GCC ==
* Continued looking at element/structure load/store intrinsics
improvements. Some good initial results: it looks like the plan of
using "extra-wide" vectors for returning struct results works fine (at
least to a first approximation). Sent off WIP patch (internally to CS
only, so far).
Incidentally this looks like it'll be a good stepping-stone for the
"RTL half" (vs. the "tree half") of the representation of
element/structure loads/stores (e.g. vld2/vst2) also. The return type
(for loads) and argument type (for stores) of the RTL patterns for such
instructions is changed from the current wide-integer representation
(OImode, etc.) to a suitable wide vector instead (e.g. V16QImode). This
change might help lead to a more meaningful mapping from an equivalent
tree form -- though we haven't quite got the whole picture yet, as the
middle-end won't want to know about the ARM-specific and
non-standard-named patterns for the element/structure loads/stores.
(One might imagine a new standard-named RTL expander taking care of that
though.)
== Vacation ==
* Vacation Dec 20th-Jan 4th.
== GCC issues ==
* PR44557, Thumb-1 ICE. Looked at the ARM-specific secondary reload
parts, as well as some general reload internals for context. Concluded that
the Thumb-2 concerns about my submitted patch are unfounded, as the
reload_in/out patterns should never be used for Thumb-2. Also looked a
bit at how we should upgrade ARM to use TARGET_SECONDARY_RELOAD.
* PR45416, ARM code regression. Started working on this again, cleaning
up patch to submit.
* Had some email discussion with Revital Eres on Swing Modulo Scheduling
(SMS) for ARM issues, mainly on how the doloop_end pattern should be
done on ARM.
* Submitted and committed an obvious small patch for a VFP testsuite case.
== This week ==
* Continue on GCC issues.
* Flying to Dallas on Sunday, prepare for trip.
Hi,
* continued with my attempts to vectorize Viterbi:
- finished implementation of conditional store sinking in cselim
pass (I did only limited testing).
- reconsidered the idea of safe load if-conversion if an adjacent
field of the same structure is accessed unconditionally - this may be
incorrect. Instead I tried the last, not yet committed, patch by
Sebastian Pop that implements if-conversion for such cases of not-safe
data accesses. His patch if-converts the loop in Viterbi, however, it
also makes the loop not vectorizable - additional work should be done
in the data-refs analysis and the vectorizer to make it work.
Sebastian is working on the first part, and I'll help him with the
vectorizer part if necessary.
* analyzed EEMBC DenBench, couldn't find any action items for now. But
vld/vst support of strided data accesses should be very useful for
these benchmarks.
* fixed GCC PR testsuite/47057
* looking into SLP of reduction as in PR 41881. I saw similar patterns
several times in DenBench, but I'm not sure that SLP of reduction is
enough to vectorize all of these cases.
Happy New Year,
Ira
== Last Week ==
* Continue with libunwind. Wrote a new unit test for ARM-specific
unwinding code to help debug that new code's problems. Almost got it
working, which I hope means its integration with libunwind may be
nearing completion.
== This Week ==
* Try to finish ARM-specific improvements to libunwind. Famous Last Words.
--
Zach Welch
== GCC related ==
* Launchpad #693686, GCC ARM segfault ICE when building Chromium in V8.
Spent some time reproducing; this ICE seems to be in the maverick
gcc-4.5, at the vectorizer phase. As the ICE happens in
tree-vect-stmts.c:supportable_widening_operation(), I'm suspecting
(without further verification yet) this might be due to vmovn not
backported? (Linaro 4.5 does have this ported, I think)
* PR44557, Thumb-1 ICE. Looking further after seeing Richard Earnshaw's
comment on my patch. It would be nice if we could upgrade the entire
secondary reload bits, looking into this.
== This week ==
* Look into more GCC issues.
* Get some backports done.
== Linaro GDB ==
* LP:615972
Got the patch approved upstream. Committed to the FSF tree. Proposed a
merge request to the Linaro GDB tree.
* LP:616003 gdb.mi/mi-var-display.exp failure
Discussed upstream how to handle fp in ARM/Thumb mode. Finally
worked out a one-line patch. Approved and committed to the FSF tree.
Proposed a merge request to the Linaro GDB tree.
Drafted another patch to clean up the ARM register aliases. Pending upstream.
* LP:616000 Handle -fstack-protector prologue code
Revised the patch per Joel's comments. Approved and committed to the FSF tree.
Drafted two patches to handle -fstack-protector prologue code on i386.
Sent them out for review. Due to my limited knowledge of i386 prologue
generation, I am not very confident in one of these patches.
* LP:615980 Support displaced stepping on Thumb
Got my test case for ARM displaced stepping approved and committed to the
FSF tree.
A patch to support displaced stepping of ARM insns in a Thumb area is
pending upstream. Tried a second approach since the first one was not
acceptable to the upstream reviewers. Without this patch, ARM displaced
stepping doesn't work on Linaro.
Added support for another three PC-related 16-bit Thumb insns (adr, ldr,
and cbz), with test cases for each.
Spent some time splitting my big patch into three relatively small
patches in order to make them easier to review. The patches for
Thumb 16-bit displaced stepping have been sent upstream for
review.
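To show why adr, ldr (literal) and cbz need special care, here is a rough
Python sketch that recognises those three 16-bit Thumb encodings. All
three read PC, so they cannot simply be copied and run elsewhere. The
masks are taken from the Thumb instruction encodings, not from the GDB
patches, and the function name is invented for this example. CBZ and
CBNZ share one encoding, so both are reported.

```python
def classify_pc_relative_thumb16(insn):
    """Return a mnemonic if the 16-bit Thumb insn reads PC, else None.

    Only the encodings named in the report are recognised; this is an
    illustration, not the decoder used in GDB.
    """
    if (insn & 0xF800) == 0xA000:
        return "adr"            # 10100 Rd imm8
    if (insn & 0xF800) == 0x4800:
        return "ldr (literal)"  # 01001 Rt imm8
    if (insn & 0xF500) == 0xB100:
        # 1011 op 0 i 1 imm5 Rn; bit 11 (op) selects CBNZ over CBZ
        return "cbnz" if insn & 0x0800 else "cbz"
    return None
```

An instruction that returns None here may still need fixing up for other
reasons; the sketch only covers the PC-relative cases listed above.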
== This Week ==
* Work from Mon. to Wed.
** Backport some approved upstream patches to Linaro GDB
** Do whatever is needed for my pending patches.
* Vacation on Thu. and Fri. 3rd Jan. is a public holiday in China. Back at
work on 4th Jan.
--
Yao (齐尧)
Khem Raj <raj.khem(a)gmail.com> wrote:
> The bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46883 filed
> against GCC trunk also happens with linaro gcc 4.5
> My guess is that there is a backported patch from trunk into linaro
> 4.5 tree that's causing this ICE
>
> This ICE does not happen on upstream gcc-4.5 branch
Thanks for the bug report!
> I haven't figured out the commit yet.
It looks like the regression was introduced by Bernd Schmidt's
patch to improve zero-/sign-extensions (PR 42172), which we
did indeed backport to Linaro GCC 4.5. (I've updated the
PR 46883 bugzilla with more details.)
> Should you need a bug in linaro
> bug tracker I will be happy to file one
Yes, please do so; this makes it easier to track the problem
on the Linaro side. Thanks!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter | Management: Dirk
Wittkopp
Registered office: Böblingen | Commercial register: Stuttgart District
Court, HRB 243294
== GCC ==
* Checked in mainline fix for #617384 and submitted backport merge
requests (.debug_line is wrong with -fpic)
* Submitted backport merge requests for the fix for #662324
(Pointer type information lost in 4.5 debuginfo)
* Checked in mainline fix for #693425 and submitted backport merge
request (SPU back-end incompatible with extension elimination pass)
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Hi,
I was on vacation on Sunday and starting from Tuesday stayed home with
a sick child, so I only had a couple of days to work.
* vectorization of Viterbi:
- continued implementing conditional store sinking in cselim pass
- made if-conversion work on loads of structure fields if another
field of the same structure is accessed unconditionally
* fixed GCC PR 47001.
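
To make the store-sinking step concrete: when both arms of a branch store
to the same location, the store can be moved below the join, leaving a
select followed by one unconditional store. That is the shape
if-conversion and the vectorizer can work with. The Python below is only
a model of that source-level effect, not the cselim pass, and the
function names are invented.

```python
def branchy(cond, a, b):
    """Original shape: a store in each arm of the branch."""
    out = [0] * len(cond)
    for i in range(len(cond)):
        if cond[i]:
            out[i] = a[i] + 1
        else:
            out[i] = b[i] - 1
    return out


def store_sunk(cond, a, b):
    """After sinking: each arm only computes a value, one store follows."""
    out = [0] * len(cond)
    for i in range(len(cond)):
        then_val = a[i] + 1
        else_val = b[i] - 1
        out[i] = then_val if cond[i] else else_val  # select + single store
    return out
```

Note that the second form evaluates both arms unconditionally. That is
harmless here because reading a[i] and b[i] cannot fault, which is
exactly the load-safety question the if-conversion work has to answer.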
Ira
Continued looking at SPEC 2006.
The two ICEs I mentioned last week are gone with the Natty version of the
compiler; however, the 4 programs that run but give wrong results still
fail with both the Natty version and the latest version from bzr.
The 4 failures are:
h264ref - still fails on bzr 99447 with -O2 or -O0
sphinx3 - still fails on bzr 99447 with -O2 or -O0
gromacs - still fails on bzr 99447 with -O2 but works with -O1; I've
followed this through and detailed it in bug 693502. It looks to me like
a post-increment gone wrong: it is split so that it is no longer actually
a post-increment, and the original value gets used rather than the
post-incremented one.
zeusmp - this fails to load the binary; it has a >1GB bss section.
Interestingly, it gets further on my beagle, which has less memory but a
bit of swap, even though I think it's not really using all of the BSS
in the config I'm using.
I'm hoping to leave a 'ref' run going over the new year.
The canis1 Orion board I was also running SPEC on last weekend died during
the run and hasn't come back.
perf
We now have silverberry using the -proposed kernel which has the fixed
PERF_EVENT config, and perf seems to work fine.
libffi
I've started building the page
https://wiki.linaro.org/WorkingGroups/ToolChain/FFIusers listing things
that use FFI (generated by a bit of apt wrangling).
There are basically 3 sets:
a) Apps that just use ffi for something specific
b) Languages that in turn give their users varying degrees of freedom
to use ffi themselves
c) Haskell - While some of the packages probably are actual ffi
users, I think a lot of these are false dependencies; almost every Haskell
user seems to gain a dependency on libffi directly.
I'm back on the 4th January.
Dave
Hi,
I continued looking into EEMBC benchmarks:
- telecom fft is not vectorized because the number of iterations is
unknown: the loop has a non-constant step and its bound may overflow. I
think the solution here could be loop versioning, but since
versioning increases code size, this kind of optimization may be less
beneficial.
- telecom viterbi (vectorization potential gain is 4x) requires
conditional store sinking and load hoisting to enable if-conversion. I
worked on implementation of store sinking this week.
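
As a rough illustration of the loop versioning idea for telecom fft: the
compiler would emit a runtime check and run the vectorized copy only when
the trip count can be proven finite, falling back to the original scalar
loop otherwise. The Python sketch below models just that guard for a loop
of the form `for (i = start; i < bound; i += step)` with a 32-bit
unsigned counter. The names are hypothetical, inputs are assumed to lie
in [0, UINT32_MAX], and the check is deliberately conservative; it is not
what GCC generates.

```python
UINT32_MAX = 0xFFFFFFFF


def safe_trip_count(start, bound, step):
    """Trip count of the loop, or None when it cannot be proven finite."""
    if step == 0:
        return None  # the counter never advances
    if start >= bound:
        return 0
    if bound - 1 > UINT32_MAX - step:
        return None  # i + step could wrap before the exit test fails
    return (bound - start + step - 1) // step  # ceiling division


def pick_loop_version(start, bound, step):
    """Model of the runtime check a versioned loop would branch on."""
    if safe_trip_count(start, bound, step) is not None:
        return "vector"
    return "scalar"
```

The cost mentioned above is visible even in this toy: both loop bodies
have to exist in the binary, and the guard runs on every entry to the
loop.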
Ira
Ulrich Weigand/Germany/IBM wrote on 12/20/2010 06:01:21 PM:
> Mark Mitchell <mark(a)codesourcery.com> wrote:
> > On 12/20/2010 8:35 AM, Ulrich Weigand wrote:
> > > Now, I guess there's two ways forward: either the outcome of the ongoing
> > > discussions on gcc-patches is that it is in fact not a good idea to
> > > generate such sets, and the EE pass is subsequently rewritten to avoid
> > > them; or else, if those instructions are considered valid, I'll have to
> > > extend the SPU move expander to handle them. Thoughts?
> >
> > I haven't participated in the upstream discussion -- I'm way behind on
> > that list :-( :-( -- but I think such sets should be considered valid.
>
> OK, I'll have a look at fixing the SPU back-end then.
I've now fixed this problem in the back-end upstream:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01694.html
I've also created a back-port to Linaro GCC 4.5 and proposed the
branch for merge; you can find the details at:
https://bugs.launchpad.net/gcc-linaro/4.5/+bug/693425
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
== Last Week ==
* Continued working on ARM unwinding in libunwind. Produced a draft
write-up of my progress in the event that I don't finish this work
before being swapped out of Linaro.
* (Re-)submitted patches to fix ltrace test suite. Hopefully, these will
be the last changes before the new release.
== This Week ==
* Continue working on libunwind.
--
Zach Welch
Hi,
We would like to build Android with the Linaro tool chain. Do any of you
know what kind of work will be needed to adapt Linaro gcc to Android?
Regards,
Patrik
== GCC related ==
* CS Issue #10201 / PR46883, unrecognizable insn ICE when compiling
Samba. Fixed this by changing the predicates of two split patterns.
Patch reviewed in CS internally and upstream, committed upstream, will
backport to SG++ and Linaro soon.
* LP:641397/PR46888: bitfield insert optimization. Andrew Pinski found a
testcase that escapes the CSE patch gets handled by combine, and also
found another bug with REG_EQUIV notes. Only looked at this minimally
last week, will really work on this later.
* LP:687406/PR46865, -save-temps creating different code. Backported and
bzr-pushed the upstream fix by Jakub Jelinek.
* PR45416, ARM code regression. Can now mostly generate the code I wanted
on both ARM and x86, although the patch is still not in a submittable state.
* VFP index patch. Uncommitted GCC patch of mine from last year; added
Thumb-2 bits and corrected some things in the testcase. Committed upstream.
== This week ==
* Really get January travel stuff nailed.
* Upstream patch review is probably going to start getting
slow/suspended this week. Will probably do some study stuff on larger
projects.
* Continue to look at GCC issues.
== Linaro GDB ==
* LP:615972 Different output of 'info register' with and without a corefile.
Studied the gcore implementation in GDB. Two patches are under review
upstream. One has been approved by Dan, and the other is still being reviewed.
Evaluated the two approaches for NEON registers in the corefile. Resumed the
discussion on kernel support for dumping NEON registers. This needs a
decision from the kernel side, but there has been no progress on it.
* LP:685494 Revised the patch per Pedro's suggestion. Waiting for someone
to approve it.
* LP:685702 Got it approved for the FSF GDB 7.2 branch. Committed to both
the FSF 7.2 branch and the Linaro tree.
* LP:616003 gdb.mi/mi-var-display.exp failure
GDB always assumes $fp is r11, even when the code is in Thumb mode. The
current GDB infrastructure can't handle mapping the same alias to two
different registers. Proposed a new gdbarch hook for this upstream, in
order to increase the flexibility of GDB. No reply yet.
* LP:616001 gdb.mi/mi-var-cmd.exp failure
Ulrich pointed out that it is caused by stack randomization. Confirmed this
by setting "kernel.randomize_va_space" to zero. Figured out why this
case passes on x86: the conditions for turning on stack randomization
are more restrictive on x86.
* LP:615980 Support displaced stepping on Thumb
Studied displaced stepping in GDB/ARM. Found a bug when GDB tries to
execute an ARM instruction in the copy area, which is in Thumb mode (the
copy area starts at "_start + 4", and it is compiled in Thumb mode in Ubuntu).
The fix is a one-line patch, which doesn't update the status register when
writing PC in displaced stepping.
Wrote a test case for ARM displaced stepping. Wrote code directly in ARM
asm for the first time, which was very helpful for learning the ARM
instructions.
Read the ARM ARM and decoded 16-bit Thumb instructions in GDB for displaced
stepping. This doesn't work so far because the breakpoint instruction placed
after the instructions in the copy area is still hard-coded to the ARM
breakpoint insn.
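For anyone wanting to repeat the "kernel.randomize_va_space" check from
the LP:616001 item, here is a small Python sketch that reads and
interprets the setting. The file path is the standard procfs location for
that sysctl; the function names and the wording of the descriptions are
my own, so treat this as a convenience script rather than anything
official.

```python
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and vDSO randomized",
    2: "as 1, plus heap (brk) randomized",
}


def describe_aslr(raw):
    """Interpret the text read from /proc/sys/kernel/randomize_va_space."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, "unknown mode %d" % value)


def read_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Read the current setting; returns None where the file is absent."""
    try:
        with open(path) as f:
            return describe_aslr(f.read())
    except OSError:
        return None
```

A result of "disabled" corresponds to the zero setting used above to
confirm that the test failure was caused by stack randomization.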
== Misc ==
* Linaro GCC optimization meeting.
== This Week ==
* LP:615980 Support displaced stepping on Thumb
Send my fix and test case upstream for review.
Make displaced stepping work on 16-bit instructions.
* Ping other GDB patches.
--
Yao (齐尧)
Mark Mitchell wrote:
> > If Profile Guiding could spot that a particular callsite to say strlen()
> > was often associated with strings
> > of at least 'n' characters we could call a different implementation.
>
> I don't believe this is possible with current profile-guided optimization,
> but certainly it could be done.
It looks to me like a case of value profiling; see tree-profile.c for
the various "stringops" optimizations. Unless I misunderstand David's
idea here or am missing something else, it seems that this kind of
optimization should fit in the existing infrastructure without too much
effort.
Ciao!
Steven
Does anyone have any experience of what can be profiled in the
profile-guided optimisations?
One of the problems with some of the string routines is that you can write
pretty neat fast routines that work well for long strings, but most of
the calls actually pass short strings, and the overhead of the fast
routine means that in most cases you are slower than you would have been
with a simple routine.
If Profile Guiding could spot that a particular callsite to say strlen() was
often associated with strings of at least 'n' characters we could call a
different implementation.
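
To make the question concrete, here is a toy Python model of what such a
per-callsite profile might record and how it could drive the choice. It
says nothing about how GCC's profiling actually works; the class, the
method names and the 50% threshold are all assumptions made up for the
example.

```python
from collections import Counter


class CallSiteProfile:
    """Histogram of string lengths seen at one call site (illustrative)."""

    def __init__(self):
        self.lengths = Counter()

    def record(self, s):
        """Note the length of one argument seen during a training run."""
        self.lengths[len(s)] += 1

    def prefers_long_variant(self, n, fraction=0.5):
        """True when more than `fraction` of calls saw at least n characters."""
        total = sum(self.lengths.values())
        if total == 0:
            return False  # no data: stay with the simple routine
        long_calls = sum(c for length, c in self.lengths.items() if length >= n)
        return long_calls > fraction * total
```

With a profile like this, a callsite that mostly sees short strings would
keep the simple routine, and only callsites dominated by long strings
would be switched to the fast one.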
Dave
* Linaro GCC
lp:686381: C++ link failure on ARM
Reproduced the bug and posted my findings to the bug report - user error.
Changed the way the Linaro GCC version numbers are handled. Hopefully
the new system should be less distasteful to Matthias. Updated the GCC
release procedure document to match.
Organised and chaired a meeting to discuss GCC optimization
opportunities for ARM. It was well attended, and I think we had some
useful discussion. Spent quite some time preparing beforehand, and
writing it up afterwards. Next step is to come up with some actual plans
to implement something. I imagine we can discuss this at the sprint in
Dallas next month. See
https://wiki.linaro.org/AndrewStubbs/Sandbox/GCCoptimizations
* Upstream GCC
My upstream patch to fix ARM smlabb has been approved and committed to
GCC 4.6 (mainline). Only another three patches need approval now!
Continued testing upstream GCC 4.6 with both cross and native builds. It
appears to be in a buildable state now, with no extra patches required.
I've updated the Linaro GCC 4.6 branch with the buildable state.
* Other
Updated my ESTA, and added my security details to the airline bookings.
------
Future availability
20th Dec .. 3rd Jan - Vacation/Holiday
4th Jan .. 8th Jan - Business as usual
9th Jan .. 14th Jan - Linaro Sprint, Dallas
15th Jan .. 21st Jan - CodeSourcery/Mentor Annual Meeting, Scottsdale
24th Jan onwards - normal service restored!