RAG:
Red:
Amber:
Green: qemu-linaro RC0 prerelease uploaded
Current Milestones:
| Planned | Estimate | Actual |
first qemu-linaro release | 2011-01-11 | 2011-01-11 | |
Historical Milestones:
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2010-01-25 | 2010-01-25 | handed off |
* maintain-beagle-models:
+ went through diffs between qemu-linaro and qemu-meego to
confirm we hadn't dropped any patches by mistake
+ tested qemu-linaro tree on ubuntu netbook image: boots OK
+ investigated and fixed qemu bugs caused by new x-loader
https://bugs.launchpad.net/qemu-linaro/+bug/704484
and new u-boot:
https://bugs.launchpad.net/qemu-linaro/+bug/703094
+ made merge requests to meego for a CRIS compile failure
and the x-loader bugfix
http://meego.gitorious.org/qemu-maemo/qemu/merge_requests/4http://meego.gitorious.org/qemu-maemo/qemu/merge_requests/5
+ Went through the process of doing a qemu-linaro release
with a "RC0" prerelease as a pipecleaning exercise and
to provide a tarball to slangasek for doing packaging. Download:
https://launchpad.net/qemu-linaro/+milestone/2011.02
Release process writeup:
https://wiki.linaro.org/WorkingGroups/ToolChain/QemuReleaseProcess
* merge-correctness-fixes
+ the usual upstream mailing list monitoring and code review
+ tested and sent meego VQ(R)DMULH.s16 fix upstream:
http://patchwork.ozlabs.org/patch/80725/
+ working on a patch to fix decoding of the preload and hint space
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
SPEC
Tried to track down what was going on with lbm; it doesn't seem to
be repeatable on canis1; I'd previously seen it fail at O1 and work at
O0 and tried to chop down the flags between the two; but after adding
all the flags back in on top of -O0 it still worked and then I tried
-O1 again and it worked. Going to try on another machine, but it
might be uninitialised data somewhere.
Panda
Our panda arrived; it's now happily nestling near our Beagles and
running the 0126 headless snapshot (with 0127 hwpack). It seems fine
except
for rather slow USB and SD IO. Tip: Panda's do absolutely nothing (no
LEDs, no serial console activity) unless you put an SD card
in with the firmware on.
Libffi
Wrote the changes for armhf. Tested on arm, armhf, i386, ppc and
s390x - all happy. (Not too unsuspectingly variadic calls just work
on everything other than armhf without the api change)
Mailed Python CType list asking how much of a pain the API change
will be and any hints on what might be affected.
Awaiting sign off for submission of code.
Optimised library routines
Looked at benchmarking 'git'; I'd seen previous discussions where it
had been pointed out that it spends a lot of time in library routines;
and indeed it does spend useful
amounts in memchr, memcpy and friends on a simple git diff v2.6.36
v.2.6.37 > /dev/null of the current kernel tree produces a useful
~25second run.
One interesting observation is that the variation in the times
reported by 'time' - i.e. user, system and real, the variation in
user+system is much less than either user or
system individually and is quite stable (within ~0.7% over 10 runs).
I've just tried preloading my memchr routine in and it does get a
consistent 1-1.2% improvement which does look above the nice.
Also asked on libc-help list for suggestions as to other benchmarks
people actually trust to reflect useful performance increases in core
routines as opposed to totally
artificial ones.
Dave
== GCC ==
* Determined root cause of #Bug 598462 (corrupted profile info
with -O[23] -fprofile-use), identified work-around (using
-fprofile-correction), and verified on GCC 4.5 and 4.6
== GDB ==
* Worked on re-implementation of fix for #615978 (Failure to
software single-step into signal handler)
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== GCC ==
* Internal ARM tasks that kept me busy for some time this week.
* Reworked the divmodsi4 patch based on comments. Still some test
failures that need to be investigated.
* Started working on the performance improvement plan for GCC 4.5.
* Number of patch reviews upstream.
* Looking at A9 memset issue. .
* Got an internal bug report about the 4.4 tools from someone using it
which had to do with a missing backport. This is now lp:709329.
Backported and testing.
* Backported a missing upstream patch into GCC 4.5 for bswap patterns.
* Wrote up on how to cross-test with qemu and updated that on the
Linaro wiki. Comments are welcome.
Plans:
* Find out the default number of iterations needed for steady state
with EEMBC.
* Finish testing fix for lp:709329 and get that committed.
* Finish off divmodsi4 patch and get it's testing completed and
finished.
* Submit backport for bswapsi at Os fix (PR44392) for gcc-linaro
* Finish the plan for GCC performance improvements based on what we
discussed at the
sprint.
* Away 2-1/2 days next week for an internal engineering conference.
This week
=========
- On other IBM duties until today. Now finished!
- Looked at the neon failures that Peter reported. I'm testing patches
for the ICE and the "must be a constant" error now.
- In the background, I've been trying to pin down the chromium build
failure. I can only reproduce it when running under dpkg-buildpackage:
if I run the link line manually, it works. This is reminiscent of a
problem that Dave saw elsewhere: the linker segfaulted only when run
through the normal build system.
Next week
=========
- Bernd Schmidt has committed our combined patch for #695302.
Will backport to our sources.
- Submit the neon fixes above.
- More on STT_GNU_IFUNC. There are some more tests I want to write.
Richard
Some news from the qemu mailing list that I think might be
of interest to gcc folks here:
Christophe Lyon from ST has kindly released a large
set of test cases of Neon intrinsics:
http://gitorious.org/arm-neon-tests/arm-neon-tests
(the tests themselves are more aimed at testing qemu,
so they just produce output to be compared against a
reference generated from running on hardware).
However they don't currently compile with gcc (but
are ok with armcc). From the README:
# The tests currently fail to build with GCC/ARM:
# - no support for Neon_Overflow/fpsrc register
# - ICE when compiling ref_vldX.c, ref_vldX_lane.c, ref_vstX_lane.c
# - fails to compile vst1_lane.c
# - missing include files: dspfns.h, armdsp.h
Maybe it's worth somebody having a look at this,
at least enough to find out whether the ICEs are
things we already know about or have perhaps
already fixed in linaro gcc?
thanks
-- PMM
Hi,
I am working on implementation of interleave_high/low and
extract_even/odd for NEON. The pairs of high/low (even/odd) are
"magically" united into single vzip (vuzp) instruction in the back
end, so there is no need in special support from the tree level. There
are still some test failures that I need to solve.
Ira
Hi,
I've written up a wiki page finally on the cross-testing with qemu
here. It's taken me slightly longer than expected as I was playing
around with moin-moin syntax. These are based on qemu-0.13.0 which is
what I tried when I wrote this up for some testing of a patch that I
was playing with.
https://wiki.linaro.org/WorkingGroups/ToolChain/CrossTestingQemu
Comments are welcome .
Ramana
Hi,
* I looked into the perf utility with regard to ARMv7 and raw event support
* https://wiki.linaro.org/KenWerner/Sandbox/perf
* testsuite fixes for the OpenCL GDB
* started to setup the pandaboard (currently the headless snapshot hangs
shortly after I got the bash prompt - I'm not sure what's going on here)
* On Friday I'll attend a class.
Regards
Ken
Hello,
* Submitted to mainline the patch to model Doloop for ARM
(http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html) .
The ARM back-end part was reviews by Richard Earnshaw.
* Looking into EEMBC/DENbench for opportunities applying Modulo-Scheduling.
Based on partial profiling information there are SMSed hot loops. My next
step is to generate complete profiling information (including gcov info)
and execute the benchmarks on ARM machine.
Thanks,
Revital
Hi Vijay,
On Sat, Jan 22, 2011 at 9:59 AM, Vijay Kilari <vijay.kilari(a)gmail.com> wrote:
> Hello Dave,
>
> Thanks for this info.
>
> I have few more queries after looking at the results of memset on A9 & A8.
> I agree that externel bus speed matters in comparision across platforms.
>
> 1) Why memset is performance is good on A8 than A9?. any justification?
I've CC'd the linaro-toolchain list who have been working on this
topic and may be able to provide you with more information.
Cheers
---Dave
Hi there. I've had a think and done a write-up on the causes behind
the 2011.01 release failure and what should be changed. See:
https://wiki.linaro.org/WorkingGroups/ToolChain/Incidents/2011.01-X86_64
The changes involve being more explicit in the release process
document and changing the continuous build to give earlier warning.
Comments are appreciated.
-- Michael
Hello,
I have a patch for ARM that I want to test and I'm not sure what's the
procedure of testing in Linaro nor to where
it should be committed. (GCC trunk? it's currently under bug fixes mode
only).
I appreciate help with that.
Thanks,
Revital
Hello,
could you please provide some comments about the state of "-Os"
(optimising for size) in the gcc 4.5.x versions of Linaro's tool
chain?
It appears there are a number of issues with recent versions of GCC
that get triggered when optimising for size, for example
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45052http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44392
Some other projects like the Linux Foundation driven Poky (resp.
Yocto project) capitulated and stopped using -Os, see for example
here:
http://thread.gmane.org/gmane.linux.embedded.poky/2311/focus=2565
On the other hand, I can see that Linaro even adds improvements for
"-Os", see for example here:
http://thread.gmane.org/gmane.linux.linaro.toolchain/367
So I wonder what the state of these problems with "-Os" is in the
Linaro tool chain? Have these issues been solved, and is "-Os"
reliably working with the Linaro tool chain?
Thanks in advance.
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd(a)denx.de
The optimum committee has no members.
- Norman Augustine
I assume we're moving the toolchain calls over to the
new confcall numbers -- but just to check, are we doing
so for this Wednesday's status call? The calendar
entry still has the old numbers...
-- PMM
==GCC==
Progress:
* Got invited to the Toolchain WG and was at the sprint in Dallas.
* Worked on fixing inconsistencies in the A9 scheduler. Now fixed up
upstream and submitted merge request for that.
* Introduced to other members of the Toolchain WG and had
conversations with everyone about what was going on within the
Toolchain WG.
* Spent some time trying to set up a Pandaboard that I borrowed
(Thanks Wookey) but the Panda didn't work. Turns out both the SD cards
were broken and or the versions of the boot loader / firmware were
broken.
* Worked through the patch list with Andrew trying to understand the
patches that exist in the backend for the delta between gcc-linaro and
upstream.
* Did some patch review upstream to help ease the backlog.
* Flushed some patches out from my patch queue upstream.
* Fixed the scheduler issue upstream and submitted a merge request for
it in linaro gcc-4.5 since it really fixes the A9 scheduler.
* Some ARM internal tasks with respect to transition into Linaro.
* Worked on the divmodsi4 issue and testing my patch with trunk.
Pushed my branch into launchpad if someone wants to see the code. Does
the right thing for the case we want but won't help with the SPEC2k
case really because that's a case with Os. But a good size improvement
anyhow and something that will help with divmod cases with 2 variables
rather than where a constant is used.
* Set up a natty chroot on one of our ve2 boxes which is now running a
bootstrap for all languages we are interested in and collecting test
results from trunk upstream.
* Worked through the new starter guide and still have an issue with
accessing wiki.linaro.org
Absences:
February 1-3:ARM Internal engineering conference.
== GCC ==
* Analyzed root cause of #685352 (libplymouth2_0.8.2-2ubuntu6 and
later give ragged splash and text rendering), opened mainline
bugzilla PR rtl-optimization/47299
* Submitted backport merge request for the #685352 fix
* Successfully completed profiled-bootstrap run on ARM
* Investigated profiled pyhton package build problems
== GDB ==
* Forward-ported GDB patch to support ARM hardware watchpoints
* Determined root cause of #615978 (Failure to software single-step
into signal handler), started discussion of proposed fix
* Miscellaneous Launchpad bug maintenance
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
=A9 Qemu=
I've spent most of the week looking at QEmu emulation of SMP A9. The
model (of a realview-pbx-a9) doesn't have any working block IO;
I spent some time looking at trying to get SD working and got part
way, but I fell back to using NFS root.
It seems to work OK for basic CPU emulation, and SMP 'works' in the
sense that the guest sees multiple CPUs;
however QEmu is restricted to only using one host CPU core for
multiple guest CPUs, so it's of limited
help in debugging SMP code.
Video doesn't seem to work either.
To get to that point it does need a bunch of patches to QEmu, most of
which Peter Maydell already knows of;
I've put some notes here :
https://wiki.linaro.org/Internal/People/DaveGilbert/QEMUA9SMP
Note that the realview-pbx doesn't currently have a Linaro hardware
pack; I used a kernel from ARMs website
and a 2.6.37 I built myself.
=SPEC=
SPEC ref got quite a far way through on Canis (with half-duplex
ether), however 'lbm' failed when running
in ref mode (while having worked in test and train) giving a different
output; it takes quite a long time to fail.
= Perf =
I sent an updated version of my patch for perf's thumb annotation upstream.
and apparently we have a Panda on the way.
Dave
RAG:
Red:
Amber:
Green:
Current Milestones:
| Planned | Estimate | Actual |
finish qemu-cont-integration | 2010-01-25 | 2010-01-25 | handed off |
first qemu-linaro release | 2011-01-11 | 2011-01-11 | |
Historical Milestones:
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
* submitted meego patch upstream to print instruction
boundaries in TCG op-level debug dumps:
http://patchwork.ozlabs.org/patch/79298/
* reviewed a couple of patches by Christophe Lyon from ST
which fix some problems with VMULL and friends
(together with a qemu-meego patch in the same area)
* lots of rebasing work to get a qemu-linaro tree into
shape. Mostly putting together my exploded set of
ARM patches from the meego tree with a rebased set
of omap patches. I have something which I think is about
right but I need to test it.
* patches submitted to qemu upstream:
+ add -version option to qemu-user:
http://patchwork.ozlabs.org/patch/79635/
+ fix writing default vector address in PL190:
http://patchwork.ozlabs.org/patch/79712/
+ print instruction boundaries in TCG op-level debug dumps:
http://patchwork.ozlabs.org/patch/79298/
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
I had a few spare cycles this afternoon on my A9 board so I got it to
build and run 72 variants of CoreMark with the 2011.01 compiler. The
results are here:
https://wiki.linaro.org/MichaelHope/Sandbox/CoreMark1
I haven't looked deeply into it and the usual disclaimer about
benchmarks applies, but there's some interesting stuff there. The ARM
to Thumb-2 drop is more than expected. Tuning for Cortex-A9 generally
makes things worse.
-- Michael
Hi,
One of the things that came up in yesterday's chat was about ARM vs
Thumb2. If some folks are interested in a high level overview that
doesn't go into too many details with respect to the ISA that I know a
compiler writer will be interested in you could read this presentation
unless you've seen it. At the minute the canonical way of figuring out
more about Thumb2 is really to read the ARM-ARM.
https://wiki.ubuntu.com/Specs/M/ARMGeneralArchitectureOverview?action=Attac…
This was a presentation that Dave Rusling gave at the UDS in Belgium
last year, it roughly covers the architecture at a high level and
talks about Thumb2 vs ARM , why Thumb2 , the evolution of the ARM
architecture etc. It's not a lot of detail but quite a high level overview.
HTH
cheers
Ramana
Hi,
* finished SLP for reduction patch. The loop in DenBench that needs
this feature also requires support of load permutation. I am
considering to implement that too. I looked for other occasions that
need this feature, but only found loops that are not vectorizable. So,
I am not sure I'll proceed in this direction.
* looked into extract_even/odd and interleave_high/low implementation
on ARM as a backup plan for the case we don't have special load/store
support on time for the next release. Even though NEON VZIP and VUZP
instructions can perform both even and odd (and high and low)
computations simultaneously, I don't see how we can express that at
the tree level.
* looking into ffmpeg
* non-Linaro issues
Ira
Hi ,
So rather than manually backporting a patch from upstream "trunk" I
decided to try using bzr for this yesterday. Even though this was a
slow process and thanks to some help from Michael on IRC last night I
think I got it right finally. However I can't get bzr visualize to
show me the revision history correctly with the merge from the
upstream repository.
So what I did was to do the following
bzr branch 4.5 new-branch
cd new-branch
bzr merge -c XXX lp:gcc where XXX is the bzr revision number of the
svn commit that I'm tracking upstream.
gcc/Changelog merge fails - there is a conflict. Expected because you
are merging trunk's gcc/Changelog into a version of the Changelog file
in our tree
Thus I reverted changes that were brought in by copying in a backup of
gcc/Changelog that I had made before the merge
bzr resolved gcc/Changelog
edit Changelog.linaro
Add changelog entry
bzr commit
bzr push lp:gcc private repository
Create merge request upstream.
Does this look sensible to people or do folks follow other recipes?
cheers
Ramana
Hi there. I've cancelled Monday's meeting as the CodeSourcery people
are at their annual meeting and others are travelling back from the
sprint. Please come to the Wednesday stand up call if you are
available.
-- Michael
Got a complete run of SPEC Train on Orion board - all working.
Kicked off a SPEC ref run - canis1 died.
Gathered a full set of 'perf record's for all of SPEC on silverbell;
and had a quick look through them;
there aren't too many surprises; a few things that might be worth a
look at though.
(Not as much using libc functions as I hoped).
There are some odd bits - chunks of samples landing apparently outside libraries
that aren't obvious what's going on.
Sent tentative patch for Thumb perf annotate issue (bug 677547) to
lkml for comments.
Started on libffi variadic fixing.
Caught the qemu pbx-a9 testing from PM; got qemu built and getting a
handful of lines of output
both from a kernel from arm.com's site and a linaro-2.6.37 that I built for it.
Dave
RAG:
Red:
Amber:
Green: issues I wanted to nail at this sprint all handled;
a couple of blueprints handed off to other people
Milestones:
| Planned | Estimate | Actual |
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2010-01-25 | 2010-01-25 | |
At the Linaro/Ubuntu sprint/rally in Dallas this week
* qemu-continuous-integration
** spoke to Paul Larson (in Linaro Validation team) and
handed this blueprint off to him (with a clarification
of the requirements from qemu's point of view)
* maintain-beagle-models
** we've agreed to start doing official "Linaro Qemu"
releases which are essentially going to be the meego
tree plus point fixes for things. These will be every
month with the rest of the toolchain group releases.
Ubuntu will also take these releases to replace the
current use of the qemu-kvm tree to provide ARM models
* verify-a9-pbx-support
** this blueprint has been handed off to David Gilbert
* merge-correctness-fixes
** wrote and posted patches:
*** to restore IT bits after unexpected exceptions
(including fix of Linux usermode bug where it wasn't
clearing the IT bits when entering a signal handler)
*** to include opcode hex in disassembly of ARM insns
*** fixing a compile failure for a previous change when
the host linux system didn't have linux/fiemap.h
** I have managed to split the huge "lots of ARM TCG fixes"
commit in the meego tree up into 65 more self-contained
commits. These now need reordering and possibly some
may be recombined.
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
2011: Holiday 21 Jan, 22 Apr - 2 May.
Hi there. Unfortunately the Linaro GCC 4.5 2011.01-0 release has a
failure in the x86_64 compiler causing it to fail during the initial
build. We're working on triaging the problem at the moment.
ARM and i386 targets are not affected. I'll send a new announcement
when the problem has been fixed and the replacement 2011.01-1 release
is available.
-- Michael
The Linaro Toolchain Working Group is pleased to announce the release
of both Linaro GCC 4.4 and Linaro GCC 4.5.
The Linaro Toolchain Working Group is pleased to announce the release of
both Linaro GCC 4.4 and Linaro GCC 4.5.
Linaro GCC 4.4 is the sixth release in the 4.4 series. Based off the
latest GCC 4.4.5, it is a maintenance release that fixes one problem
found through use.
Linaro GCC 4.5 is the sixth release in the 4.5 series. Based off the
official GCC 4.5.2 release, it includes many ARM-focused performance
improvements and bug fixes.
Interesting changes include:
* Improved optimization of multiple load instructions, and
multiply-and-accumulate.
* -fshrink-wrap optimization for better use of function prologues and
epilogues.
* plus, various other bug fixes, and minor improvements.
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.5-2011.01-0
and
https://launchpad.net/gcc-linaro/+milestone/4.4-2011.01-0
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
Hi,
* Continued with testing and implementation of reduction support in SLP
* Found a major problem in vectorization of if-converted data
accesses. Looked into other ways to solve the problem.
* Spent some time on non-Linaro vectorization plans
* Unsuccessfully tried to make the board work
Ira
== Last Week ==
* Displaced stepping support for 32-bit Thumb insns are done, unless
new bugs are found.
Decode writeback for str/ldr in Thumb 32-bit. Fix bugs in supporting
LDR/STR/LDRH/STRH in both ARM and Thumb.
Test GDB to handle various 32-bit Thumb insns in displaced stepping.
Code cleanup and write some new test cases.
* One day New Year holiday on Jan. 3rd.
== This Week ==
* Linaro Sprint.
* Get patches for Linaro GDB bugs approved on upstreams as many as
possible.
--
Yao Qi
Hello,
While testing SMS on Crotex-A9 I see that the latency of load instruction
is 1
cycle when compiling with -mcpu=cortex-a9 -mthumb -mtune=cortex-a9 -O3.
Below is a snippet from the SMS dump file showing the DDG, created for the
loop in foo function, which depicts the edge between the load of input[i]
(insn 181) and the mult instruction (insn 184).
[181 -(T,1,0)-> 184] is the true dependence edge created between the
two insns; with latency of 1.
On Crotex-A8 the latency of the load is 3 as expected.
I've read in crotex-a9.md file that loads should have a latency of 4 cycles
so I just wanted to check if I should have used other combination of flags
for Crotex-A9 or the load latency should indeed be of 1 cycle here.
Thanks,
Revital
int foo (int max, signed short *input, int y)
{
int i, accum;
for (i = 0; i < max; i++) {
accum += (signed int) input[i] * (signed int) input[i+y];
}
return accum;
}
The snippet from the DDG:
Node num: 2
(insn 181 178 184 13 (set (reg:SI 216 [ D.2019 ])
(zero_extend:SI (mem:HI (plus:SI (reg:SI 319 [ ivtmp.34 ])
(reg:SI 345)) [2 MEM[base: D.2076_257, index:
D.2079_226, offset: 0B]+0 S2 A16]))) tmp.c:7 714
{*thumb2_zero_extendhisi2_v6}
(nil))
OUT ARCS: [181 -(A,0,1)-> 176] [181 -(T,1,0)-> 184]
IN ARCS: [184 -(A,0,1)-> 181] [176 -(T,1,0)-> 181]
Node num: 3
(insn 184 181 234 13 (set (reg/v:SI 209 [ accum ])
(plus:SI (mult:SI (sign_extend:SI (subreg/s/u:HI (reg:SI 212
[ D.2013 ]) 0))
(sign_extend:SI (subreg/s/u:HI (reg:SI 216 [ D.2019 ]) 0)))
(reg/v:SI 209 [ accum ]))) tmp.c:7 64 {maddhisi4}
(expr_list:REG_DEAD (reg:SI 216 [ D.2019 ])
(expr_list:REG_DEAD (reg:SI 212 [ D.2013 ])
(nil))))
== Last Week ==
* Worked to improve libunwind test suite results using new ARM-specific
unwinding. Managed to reduce the number of failures, but some tests
still fail.
* Tested latest ltrace upstream tree and determined that It Is Good.
Hopefully, this marks the cusp of a new release that includes my work.
== This Week ==
* Finish up work with libunwind. Update wiki brain dump.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
== Last week ==
* Some discussion about libffi variadic function support. The current
proposed design is an additional API function to prepare a new call
interface (CIF) structure with the argument number settings for each
variadic call site. This IMHO, is slightly less flexible that doing a
call-time placement style interface, but should be faster and easier to
implement.
* Collecting a few cases of ARM/Thumb-2 integration in the GCC ARM backend.
* Prepared and traveled to Dallas for the Linaro sprint.
== This week ==
* The Linaro sprint.
== Linaro GCC ==
* Continued hacking on (cleaning up, testing) NEON
element/structure load/store (etc.) intrinsics-improvement patch. This
now passes all the auto-generated neon.exp tests (admittedly only a
very weak test for correctness), and the small number of hand-written
tests I threw at it, so hopefully we can declare victory soon. (This
patch provides huge improvements to generated code for the "fancy" NEON
load/store intrinsics and vtbl/vtbx lookup instructions, and hopefully
also provides part of the solution for allowing the vectorizer to use
the fancy loads/stores also.)
== Misc ==
* Public holiday Monday.
== GCC ==
Pulled down new commits from upstream GCC. My test build failed due to a
new cross-build configure problem. Found the problem in GCC Bugzilla,
and rolled back to the revision before the problem one. Those sources
built ok, so I've pushed the changes to Linaro GCC 4.6 branch. It it now
updated to 31st December.
Discussed cross-build problems with Alexander Sack on IRC.
Merged, tested and pushed all the outstanding Launchpad merge requests
into GCC 4.4 and 4.5.
Merged, tested and pushed all new CS SG++ patches into Linaro GCC 4.5.
Merged, tested and pushed GCC 4.5.2 from upstream into Linaro 4.5.
Spun releases for both Linaro GCC 4.4 and 4.5.
Brought the Linaro patch tracker up to date.
== Other ==
Catch up with lots of email following my two week holiday.
Updated some of the CS patch tracker.
Travel to Dallas for the Linaro sprint.
-------
Next week (10th-14th Jan):
Linaro sprint in Dallas
Following week (17th-21st Jan):
CS annual meeting in Phoenix
Got h264ref working in SPEC; this was another signed-char issue (very
difficult one to find - it didn't crash, it just came out with subtly
the wrong result);
compiling that and sphinx3 with -fsigned-char and they seem happy.
With Richard's fix for gromacs that leaves just the zeusmp binary
that's too large to run on silverberry; it seems to startup on canis1
(that has more RAM).
So that should be a full set; I just need to get the fortran stuff
going on canis1.
Kicked off discussion on libffi-discuss about variadic calls; people
seem OK with the idea of adding it (although unsure exactly how many
things really use it
- even though there are examples in Python documentation). Also
kicked off some of the internal paperwork to contribute code for it.
Dave
== This week ==
* Away Monday, and a fair bit of time on non-Linaro duties.
* Looked at Dave's gromacs bug (693502). Turned out to be a reload
inheritance problem. Tested a patch. Spent some time coming up with
a brute-force testcase that I can submit with the patch.
* Found a bug in the x86 and x86_64 ifunc support that would affect
ARM too if we weren't careful. Came up with an example testcase
and filed the bug upstream:
http://sourceware.org/bugzilla/show_bug.cgi?id=12366
It has been fixed by H.J. Lu. I'm wondering about taking a
slightly different approach for ARM.
* More ifunc work.
* Tried again to reproduce the chrome failure, making sure to use
DEB_BUILD_HARDENING=1. (I hadn't realised first time round that,
in reaction to this bug, debian/rules specifically excluded armel
from the automatic DEB_BUILD_HARDENING=1 setting.) The build
takes a couple of days on my BeagleBoard.
Heh, and just as I wrote that, the build failed with the reported
link error. Neat.
== Next week ==
* Finish testing the patch for 693502 and submit it upstream.
Backport the patch to our tree once accepted.
* Look at the chromium problem.
* More ifunc.
Richard
RAG:
Red:
Amber:
Green: qemu git pull request accepted, patches seem to be flowing
into qemu upstream more freely now
Milestones:
| Planned | Estimate | Actual |
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2010-01-25 | 2010-01-25 | |
Bonus extended holiday edition:
This report includes a number of things that happened over
the Christmas holidays as well as this week (which is
a short one, only 3 days).
Progress:
* merge-correctness-fixes:
** my git pull request for various ARM qemu patches was merged!
** a number of other patches were merged to qemu master:
+ implement correct NaN propagation rules
+ rename softfloat float*_is_nan() functions
+ fix UMAAL (Aurelien's patch, reviewed by me)
+ VQSHL (reg) patchset
** diagnosed the segfault in
https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/604872
and wrote a patchset which fixes it (and related problems):
http://patchwork.ozlabs.org/patch/77887/ (n/7)
** wrote and posted a patchset which implements save/restore
for the versatile platform (so I didn't have to wait 10
minutes for the test case to reach the segfault)
http://patchwork.ozlabs.org/patch/76529/
** finished and posted a patchset implementing flushing of
denormals to zero on input:
http://patchwork.ozlabs.org/patch/77798/ (n/3), now committed
** reviewed/tested Aurelien's SMMLA/SMMLS patch (now committed)
** wrote and posted patches which implement the FS_IOC_FIEMAP
ioctl (http://patchwork.ozlabs.org/patch/77725/) and
the file_sync_range{,2} syscalls
(http://patchwork.ozlabs.org/patch/77723/) -- these are used
by apt, and so linaro-media-create was generating a lot of
warnings from qemu about their lack of implementation
** posted patch to clean up NaN handling in linux-user NWFPE
emulation (follow-on from earlier NaN cleanups):
http://patchwork.ozlabs.org/patch/77795/
* maintain-beagle-models:
** implemented minimal ARM cp14 debug registers
and a random register in the TWL4030 emulation (both needed
to get recent Linaro kernels to boot on the beagle model)
Submitted merge request upstream:
http://meego.gitorious.org/qemu-maemo/qemu/merge_requests/3
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Meetings: toolchain standup, pdsw doughnuts
Absences:
2011: Dallas Linaro sprint 9-15 Jan. Holiday 21 Jan, 22 Apr - 2 May.
Hi,
* implemented reduction support in SLP, I'll check if it helps
DenBench next week
* helping Sebastian Pop with if-conversion for vectorization
improvements (BTW, Sebastian's goal is to vectorize kernels from
ffmpeg)
* fixed GCC PR47139
Ira
Hi All,
Thanks for attending the call. I think we had some interesting discussions.
I've posted the minutes from the call on the same page as before:
https://wiki.linaro.org/AndrewStubbs/Sandbox/GCCoptimizations
I'll try to get the audio posted somewhere for anybody that's interested.
Andrew
You may have noticed that I have created a new BZR/Launchpad branch for
Linaro GCC 4.6:
lp:gcc-linaro/4.6
https://code.launchpad.net/~linaro-toolchain-dev/gcc-linaro/4.6
Up until now, this has not been buildable due to unfixed bugs. However,
upstream GCC have now straightened out the problems, so I have pushed a
buildable version into the branch.
I shall attempt to keep this branch as up-to-date as I can (at least, I
will once the holiday season and January travel are over), but I'll only
push updates if they build for me, so hopefully the branch should remain
fairly stable, at least for our purposes.
Note that so far I've only tested build-ability. Right now I'm not
making any promises about the quality of the compiler.
At some point, we'll want to use this branch to hold our own patches
(both those that will never go upstream, and those that are queued for
GCC 4.7), so it will diverge from upstream 4.6 a bit. For the moment,
it's merely a mirror.
Andrew
== Last Week ==
* Got a new ARM-specific unwind test case working, so its integration
with libunwind stands a marginally better chance of working (once it's
finally finished).
* Sent a ping to binutils mailing list about a patch to improve readelf
that I had sent in at the beginning of December. Still no response.
* Holidays and Vacation
== This Week ==
* Try to finish libunwind integration, as my time with Linaro is nearly
over. Update documentation on wiki to reflect current status.
* Ping ltrace list about a new release. It seemed so close, then the
list went abruptly silent.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
== Linaro GCC ==
* Continued looking at element/structure load/store intrinsics
improvements. Some good initial results: it looks like the plan of
using "extra-wide" vectors for returning struct results works fine (at
least to a first approximation). Sent off WIP patch (internally to CS
only, so far).
Incidentally this looks like it'll be a good stepping-stone for the
"RTL half" (vs. the "tree half") of the representation of
element/structure loads/stores (e.g. vld2/vst2) also. The return type
(for loads) and argument type (for stores) of the RTL patterns for such
instructions is changed from the current wide-integer representation
(OImode, etc.) to a suitable wide vector instead (e.g. V16QImode). This
change might help lead to a more meaningful mapping from an equivalent
tree form -- though we haven't quite got the whole picture yet, as the
middle-end won't want to know about the ARM-specific and
non-standard-named patterns for the element/structure loads/stores.
(One might imagine a new standard-named RTL expander taking care of that
though.)
== Vacation ==
* Vacation Dec 20th-Jan 4th.
== GCC issues ==
* PR44557, Thumb-1 ICE, looked at the ARM specific secondary reload
parts, as well as some general reload internals context. Concluded that
concerns on Thumb-2 about my submitted patch should be unneeded, as the
reload_in/out patterns should never be used for Thumb-2. Also looked a
bit on how we should upgrade ARM to use TARGET_SECONDARY_RELOAD.
* PR45416, ARM code regression. Started working on this again, cleaning
up patch to submit.
* Had some email discussion with Revital Eres on Swing Modulo Scheduling
(SMS) for ARM issues, mainly on how the doloop_end pattern should be
done on ARM.
* Submitted and committed an obvious small patch for a VFP testsuite case.
== This week ==
* Continue on GCC issues.
* Flying to Dallas on Sunday, prepare for trip.
Hi,
* continued with my attempts to vectorize Viterbi:
- finished implementation of conditional store sinking in cselim
pass (I did only limited testing).
- reconsidered the idea of safe load if-conversion if an adjacent
field of the same structure is accessed unconditionally - this may be
incorrect. Instead I tried the last, not yet committed, patch by
Sebastian Pop that implements if-conversion for such cases of not-safe
data accesses. His patch if-converts the loop in Viterbi, however, it
also makes the loop not vectorizable - additional work should be done
in the data-refs analysis and the vectorizer to make it work.
Sebastian is working on the first part, and I'll help him with the
vectorizer part if necessary.
* analyzed EEMBC DenBench, couldn't find any action items for now. But
vld/vst support of strided data accesses should be very useful for
these benchmarks.
* fixed GCC PR testsuite/47057
* looking into SLP of reduction as in PR 41881. I saw similar patterns
several times in DenBench, but I'm not sure that SLP of reduction is
enough to vectorize all of these cases.
Happy New Year,
Ira
== Last Week ==
* Continue with libunwind. Wrote a new unit test for ARM-specific
unwinding code to help debug that new code's problems. Almost got it
working, which I hope means its integration with libunwind may be
nearing completion.
== This Week ==
* Try to finish ARM-specific improvements to libunwind. Famous Last Words.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
== GCC related ==
* Launchpad #693686, GCC ARM segfault ICE when building Chromium in V8.
Spent some time reproducing; this ICE seems to be in the maverick
gcc-4.5, at the vectorizer phase. As the ICE happens in
tree-vect-stmts.c:supportable_widening_operation(), I'm suspecting
(without further verification yet) this might be due to vmovn not
backported? (Linaro 4.5 does has this ported I think)
* PR44557, Thumb-1 ICE. Looking further after seeing Richard Earnshaw's
comment on my patch. It would be nice if we could upgrade the entire
secondary reload bits, looking into this.
== This week ==
* Look into more GCC issues.
* Get some backports done.
== Linaro GDB ==
* LP:615972
Get patch approved upstreams. Committed to FSF tree. Propose merge
request to Linaro GDB tree.
* LP:616003 gdb.mi/mi-var-display.exp failure
Discussed in upstreams on how to handle fp in ARM/Thumb mode. Finally
work out a one-line patch. Approved and committed to FSF tree. Propose
merge request to Linaro GDB tree.
Draft another patch to clean up ARM register alias. Pending on upstreams.
* LP:616000 Handle -fstack-protector prologue code
Revise patch per Joel's comments. Approved, and committed to FSF tree.
Draft two patches to handle -fstack-protector prologue code on i386.
Sent them out for review. Due to lack of knowledge on i386 prologue
generate, not very confident on one of these patches.
* LP:615980 Support displaced stepping on Thumb
Get my test case to arm displaced stepping approved, and committed to
FSF tree.
A patch about supporting displace ARM insn in Thumb area is pending
upstreams. Tried the 2nd approach since the 1st approach is not
acceptable to upstreams reviewers. Without this patch, ARM displaced
stepping doesn't work on Linaro.
Support another three PC-related 16-bit Thumb insns (adr, ldr, and
cbz), and add test cases for them accordingly.
Spend some time splitting my big patch into three relatively small
patches in order to make them easier to be reviewed. Patches on
supporting Thumb 16-bit displaced stepping are sent out upstreams for
review.
== This Week ==
* Work from Mon. to Wed.
** Backport some approved upstreams patches to Linaro GDB
** Anything I should do for my pending patches.
* Vacation on Thu. and Fri. 3rd Jan. is China public holiday. Back to
work on 4th. Jan.
--
Yao (齐尧)
Khem Raj <raj.khem(a)gmail.com> wrote:
> The bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46883 files
> against GCC trunk also happens with linaro gcc 4.5
> My guess is that there is a backported patch from trunk into linaro
> 4.5 tree thats causing this ICE
>
> This ICE does not happen on upstream gcc-4.5 branch
Thanks for the bug report!
> I havent figured out the commit yet.
It looks like the regression was introduced by Bernd Schmidt's
patch to improve zero-/sign-extensions (PR 42172), which we
did indeed backport to Linaro GCC 4.5. (I've updated the
PR 46883 bugzilla with more details.)
> Should you need a bug in linaro
> bug tracker I will be happy to file one
Yes, please do so; this makes it easier to track the problem
on the Linaro side. Thanks!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== GCC ==
* Checked in mainline fix for #617384 and submitted backport merge
requests (.debug_line is wrong with -fpic)
* Submitted backport merge requests for the fix for #662324
(Pointer type information lost in 4.5 debuginfo)
* Checked in mainline fix for #693425 and submitted backport merge
request (SPU back-end incompatible with extension elimination pass)
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi,
I was on vacation on Sunday and starting from Tuesday stayed home with
a sick child, so I only had a couple of days to work.
* vectorization of Viterbi:
- continued implementing conditional store sinking in cselim pass
- made if-conversion to work on loads of structure fields if other
field from the same structure is accessed unconditionally
* fixed GCC PR 47001.
Ira
Continued looking at SPEC 2006.
The two ICEs I mentioned last week are gone on the Natty version of the
compiler, however the 4 programs that run and give the wrong
results still happen with the Natty version and the latest version from bzr.
The 4 failures are:
h264ref - still fails on bzr 99447 with -O2 or -O0
sphinx3 - still fails on bzr 99447 with -O2 or -O0
gromacs - still fails on bzr 99447 with -O2 but works with -O1; I've
followed this through and detailed it in bug 693502; it looks to me like
a post-increment gone wrong (it's split so it's not
actually a post increment and the original rather than post inc'd value gets
used)
zeusmp - this fails to load the binary; it's got a >1GB bss section.
Interestingly it gets further on my beagle with less memory but a bit of
swap,
even though I think it's not really using all of the BSS
in the config I'm using.
I'm hoping to leave a 'ref' run going over the new year.
The canis1 Orion board I was also running Spec on last weekend died during
the run and hasn't come back.
perf
We now have silverberry using the -proposed kernel which has the fixed
PERF_EVENT config, and perf seems to work fine.
libffi
I've started building the page
https://wiki.linaro.org/WorkingGroups/ToolChain/FFIusers listing things
that use FFI; (generated by a bit of apt wrangling).
There are basically 3 sets:
a) Apps that just use ffi for something specific
b) Languages that then let the users of those languages have varying
degrees of freedom in themselves
c) Haskell - While some of the packages are actually probably ffi
users, I think a lot of these are false dependencies; almost every haskell
user seems
to gain a dependency on libffi directly.
I'm back on the 4th January.
Dave
Hi,
I continued looking into EEMBC benchmarks:
- telecom fft is not vectorized because unknown number of iterations.
It has both non-constant step and its loop bound may overflow. I
think, the solution here could be loop versioning, but since
versioning increases code size, this kind of optimization can be less
beneficial.
- telecom viterbi (vectorization potential gain is 4x) requires
conditional store sinking and load hoisting to enable if-conversion. I
worked on implementation of store sinking this week.
Ira
Ulrich Weigand/Germany/IBM wrote on 12/20/2010 06:01:21 PM:
> Mark Mitchell <mark(a)codesourcery.com> wrote:
> > On 12/20/2010 8:35 AM, Ulrich Weigand wrote:
> > > Now, I guess there's two ways forward: either the outcome of the
ongoing
> > > discussions on gcc-patches is that it is in fact not a good idea to
> > > generate such sets, and the EE pass is subsequently rewritten to
avoid
> > > them; or else, if those instructions are considered valid, I'll have
to
> > > extend the SPU move expander to handle them. Thoughts?
> >
> > I haven't participated in the upstream discussion -- I'm way behind on
> > that list :-( :-( -- but I think such sets should be considered valid.
>
> OK, I'll have a look at fixing the SPU back-end then.
I've now fixed this problem in the back-end upstream:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01694.html
I've also created a back-port to Linaro GCC 4.5 and proposed the
branch for merge; you can find the details at:
https://bugs.launchpad.net/gcc-linaro/4.5/+bug/693425
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== Last Week ==
* Continued working on ARM unwinding in libunwind. Produced a draft
write-up of my progress in the event that I don't finish this work
before being swapped out of Linaro.
* (Re-)submitted patches to fix ltrace test suite. Hopefully, these will
be the last changes before the new release.
== This Week ==
* Continue working on libunwind.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
Hi,
We would like to build Android with the Linaro tool chain. Do any of you
know what kind of work will be needed to adapt Linaro gcc to Android?
Regards,
Patrik
== GCC related ==
* CS Issue #10201 / PR46883, unrecognizable insn ICE when compiling
Samba. Fixed this by changing the predicates of two split patterns.
Patch reviewed in CS internally and upstream, committed upstream, will
backport to SG++ and Linaro soon.
* LP:641397/PR46888: bitfield insert optimization. Andrew Pinski found a
testcase that escapes the CSE patch gets handled by combine, and also
found another bug with REG_EQUIV notes. Only looked at this minimally
last week, will really work on this later.
* LP:687406/PR46865, -save-temps creating different code. Backported and
bzr-pushed the upstream fix by Jakub Jelinek.
* PR45416, ARM code regression. Mostly can generate what I wanted by
now, under ARM and x86, although patch is still not in a submittable state.
* VFP index patch. Uncommitted GCC patch of mine from last year; added
Thumb-2 bits and corrected some things in the testcase. Committed upstream.
== This week ==
* Really get January travel stuff nailed.
* Upstream patch review is probably going to start getting
slow/suspended this week. Will probably do some study stuff on larger
projects.
* Continue to look at GCC issues.
== Linaro GDB ==
* LP:615972 Different output of 'info register' w/ and wo/ corefile.
Understands gcore impl in gdb. Two patches are reviewed in upstreams.
One is approved by Dan, and the other is still be reviewed.
Evaluate the two approaches for NEON registers in corefile. Resume the
discussion on kernel support for dumping NEON registers. Need a
decision with kernel side, but no progress on it.
* LP:685494 Revise patch per Pedro's suggestion. Waiting for someone
to approve it.
* LP:685702 Get it approved for FSF GDB 7.2 branch. Committed to both
FSF 7.2 branch and Linaro tree.
* LP:616003 gdb.mi/mi-var-display.exp failure
GDB always assumes $fp is r11, even code is in thumb mode. Current GDB
infrastructure can't handle mapping the same alias to two different
registers. Proposed a new gdbarch took for this in upstreams, in order
to increase the flexibility of GDB. No reply yet.
* LP:616001 gdb.mi/mi-var-cmd.exp failure
Ulrich pointed out it is caused by stack randomization. Confirmed this
by setting "kernel.randomize_va_space" to zero. Figure out why this
case passes on x86, because it is more restricted to turn on stack
randomization on x86.
* LP:615980 Support displaced stepping on Thumb
Understands displaced stepping in GDB/ARM. Find a bug when GDB tries to
execute ARM instruction in copy area, which is in Thumb mode (copy area
starts from "_start + 4", and it is compiled in Thumb mode in Ubuntu).
The fix is an one-line patch, which doesn't update status register when
writing PC in displaced stepping.
Write a test case for arm displaced stepping. Write code in ARM asm
directly for the first time, which is very helpful to remember ARM asm
instructions.
Read ARM ARM and decode 16-bit thumb instructions in GDB for displaced
stepping. It doesn't work so far because breakpoint instruction after
instructions in copy area is still hard-coded to ARM breakpoint insn.
== Misc ==
* Linaro GCC optimization meeting.
== This Week ==
* LP:615980 Support displaced stepping on Thumb
Send my fix and test case to upstreams for review.
Make displaced stepping work on 16-bit instruction.
* Ping other GDB patches.
--
Yao (齐尧)
Mark Mitchell wrote:
> > If Profile Guiding could spot that a particular callsite to say strlen()
> > was often associated with strings
> > of at least 'n' characters we could call a different implementation.
>
> I don't believe this is possible current profile-guided optimization,
> but certainly it could be done.
It looks to me like a case of value profiling, see tree-profile.c, for
the various "stringops" optimizations. Unless I misunderstand David's
idea here or missing something else, it seems that this kind of
optimization should fit in the existing infrastructure without too much
effort.
Ciao!
Steven
Does anyone have any experience of what can be profiled in the profiled
guided optimisations?
One of the problems with some of the string routines is that you can write
pretty neat fast routines that
work well for long strings - but most of the calls actually pass short
strings and the overhead of the
fast routine means that for most cases you are slower than you would have
been with a simple routine.
If Profile Guiding could spot that a particular callsite to say strlen() was
often associated with strings
of at least 'n' characters we could call a different implementation.
Dave
* Linaro GCC
lp:686381: C++ link failure on ARM
Reproduced the bug and posted my findings to the bug report - user error.
Changed the way the Linaro GCC version numbers are handled. Hopefully
the new system should be less distasteful to Matthias. Updated the GCC
release procedure document to match.
Organised and chaired a meeting to discuss GCC optimization
opportunities for ARM. It was well attended, and I think we had some
useful discussion. Spend quite some time preparing beforehand, and
writing it up afterwards. Next step is to come up with some actual plans
to implement something. I imagine we can discuss this at the sprint in
Dallas next month. See
https://wiki.linaro.org/AndrewStubbs/Sandbox/GCCoptimizations
* Upstream GCC
My upstream patch to fix ARM smlabb has been approved and committed to
GCC 4.6 (mainline). Only another three patches need approval now!
Continued testing upstream GCC 4.6 with both cross and native builds. It
appears to be in a buildable state now, with no extra patches required.
I've updated the Linaro GCC 4.6 branch with the buildable state.
* Other
Updated my ESTA, and added my security details to the airline bookings.
------
Future availability
20th Dec .. 3rd Jan - Vacation/Holiday
4th Jan .. 8th Jan - Business as usual
9th Jan .. 14th Jan - Linaro Sprint, Dallas
15th Jan .. 21st Jan - CodeSourcery/Mentor Annual Meeting, Scottsdale
24th Jan onwards - normal service restored!
Got SPEC2006 building on Silverbell (VExpress) and Canis1 (Orion). There
are still some issues;
The builds are still going (6 hours so far on a 1GHz A9 for a build and
'test' case), and the Silverbell one has hit an ICE on one of the tests that
looks like 635409,
and also looks like it needs some help getting Perl to work. The build on
Canis has only just started,
but hasn't got Fortran installed.
(The SPEC2006 tools build also failed in the Perl testsuite on sprintf.t and
sprintf2.t which seem to test integer
overflow cases in sprintf % fields)
Added a few of the kernel string/memory routines and bionic routines into my
string/memory graphs and
also ran the tests on the Orion board (similar to other A9 performance - no
surprise).
Wrote up a draft of an email to libffi-dev describing the varargs state; and
as I was doing it realised that
one of the ways didn't quite work and was more messy.
Using rdepends to find all packages using ffi, need to figure out if any
actually care about varargs.
Dave
== GCC ==
* Completed first successful bootstrap and regression test run
of GCC mainline on my IGEPv2 board.
* Worked on implementing fix for #617384
(.debug_line is wrong with -fpic)
* Worked on backporting fix for #662324
(Pointer type information lost in 4.5 debuginfo)
* Analyzed root cause of PR target/46883
(GCC ICE with error: unrecognizable insn)
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi there. I've cancelled the weekly and standup calls for the next
two weeks. The next scheduled call is the standup call on Wednesday
the 5th of January. Please attend if you can as it's our last one
before the sprint.
See you then!
-- Michael
Hi there. The sprint is just around the corner and it's a good time
to think about how we can make best use of the week. I've put some
topics up at:
https://wiki.linaro.org/Events/2011-01-LinaroSprint/ToolChainWG
Please feel free to add to it. Have a think about anything that's
easier to do while everyone is in the same room - things like
discussions, kicking off some work, a bit of pair programming on a
problem, or anything that overlaps with another group or Ubuntu.
-- Michael
RAG:
Red:
Amber:
Green:
Milestones:
| Planned | Estimate | Actual |
finish virtio-system | 2010-08-27 | postponed | |
get valgrind into linaro PPA | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
Progress:
* merge-correctness-fixes:
** Submitted patchset upstream to fix NaN propagation to
follow ARM ARM rules rather than x87 semantics:
http://patchwork.ozlabs.org/patch/75742/http://patchwork.ozlabs.org/patch/75743/
* maintain-beagle-models:
** Finished implementation of the OMAP NAND prefetch/postwrite
engine including its DMA support. Patches submitted to the
qemu-maemo upstream tree and merged by Juha:
http://meego.gitorious.org/qemu-maemo/qemu/merge_requests/1
** Fixed the (cosmetic) bug
https://bugs.launchpad.net/qemu-maemo/+bug/622408
where we were complaining about "Unknown CMD52" when Linux probed
for the presence of SDIO cards. Fix merged into qemu-maemo:
http://meego.gitorious.org/qemu-maemo/qemu/merge_requests/2
* qemu-continuous-integration:
** Discussion with Loic about setting up jobs on his Hudson
instance for testing qemu against snapshots/hwpacks.
* packageselection-arm-n-more-stable-vm-solution-for-arm
** Discussion about Ubuntu moving to using a qemu-maemo based
qemu for ARM purposes. The Ubuntu blueprint is
https://blueprints.launchpad.net/ubuntu/+spec/packageselection-arm-n-more-s…
We need to come to agreement about what parts of this are
going to be done by variously Linaro toolchain, Linaro
foundations and Ubuntu.
** I'm going through doing another rebase-and-package of
the Linaro qemu, and finishing off writing up the notes
on the process:
https://wiki.linaro.org/WorkingGroups/ToolChain/QemuReleaseProcess
Meetings: toolchain, pdsw-tools, pdsw-tools xmas lunch :-)
Issues:
* a number of qemu patches in progress are logjammed behind
the outstanding git pull request
* the dbgsym debug packages for linaro kernels seem to have
vanished:
https://bugs.launchpad.net/linaro-images/+bug/691192
Absences: (complete to end of 2010)
Fri 17 Dec - Tue 4 Jan inclusive.
2011: Dallas Linaro sprint 9-15 Jan. Holiday 22 Apr - 2 May.
Hi,
* I've spent some time for testing the patches that allow the GCC trunk
to bootstrap again on ARM and posted the results to gcc-testresults
* finally tested and posted the patch that optimizes the __sync_*
builtins (#681138) on gcc-patches
* investigated on the state of the crash utility on ARM (or rather its
prerequisites like kexec)
https://wiki.linaro.org/KenWerner/Sandbox/crash-utility
* I'm on holiday now :)
Regards
Ken
Hi,
On Wed, Dec 15, 2010 at 1:44 AM, Michael Hope <michael.hope(a)linaro.org> wrote:
> On Wed, Dec 15, 2010 at 1:05 PM, Steve Langasek
> <steve.langasek(a)linaro.org> wrote:
>> Hi Michael,
>>
>> On Wed, Dec 15, 2010 at 09:29:38AM +1300, Michael Hope wrote:
>>> Hi Steve. I'd like to hand the rest of this over to you if that's OK.
>>
>> Yep, we can take it from here. To be clear, is this an additional change
>> above and beyond what Matthias reports is currently in Ubuntu gcc
>> (http://lists.linaro.org/pipermail/linaro-toolchain/2010-November/000441.html),
>> and if so, in what version of Linaro GCC is it going to become effective?
>> Do we have documentation of what the relevant failure modes caused by this
>> change *look* like, so that we can at least be triaging them appropriately
>> until there's some documentation on how to fix the resulting bugs?
>
> There will be many failures in many packages. The problem is when you
> use conditional suffixes on instructions: previously the compiler
> would insert an implicit instruction before that; now we have to be
> explicit.
>
> The failures are easy to diagnose and fix. The build will fail with a
> message from the assembler along the lines of 'xxx instruction outside
> an IT block'. The fix is to find the inline assembly code, insert the
> appropriate IT instruction, and re-build. The assembler will validate
> the IT instruction against the following conditional instructions so
> the change is quite safe.
Did someone manage to find out which versions of binutils can silently
accept the IT instructions when assembling for ARM?
This affects what advice we should give on how to avoid breaking
upstream with our additions. The safest approach is #ifdefs, but it
will be better for maintenance if we can avoid this, since it will
render the code very messy.
Cheers
---Dave
Hi there. Some of the tr-* blueprints had work items in them and this
was interfering with the tools that the PM guys use. I've created new
engineering blueprints, pulled the work items across into them, and
added the new engineering blueprint as a dependency of the old TR.
Sorry for the blueprint spam. In most cases the new blueprint has the
same name and subject as the TR one, such as the TR:
https://blueprints.launchpad.net/linaro/+spec/tr-toolchain-4.5-in-distros
which is backed by the engineering blueprint:
https://blueprints.launchpad.net/gcc-linaro/+spec/4.5-in-distros
-- Michael
Hi Richard,
Recapping on this earlier conversation:
http://lists.linaro.org/pipermail/linaro-toolchain/2010-July/000030.htmlhttp://lists.linaro.org/pipermail/linaro-toolchain/2010-July/000035.html
Is it worth another attempt to make a case to upstream for supporting
passing -mimplicit-it=thumb by default to gas?
According to my understanding of this issue, my argument would go as follows:
* gcc currently estimates the size of asm blocks, rather than
determining the size accurately.
* gcc cannot guarantee the right answer for asm block size when asm
blocks contain directives etc., however use of directives in asm
blocks is widespread
* gcc cannot guarantee the right answer for asm block size in
Thumb-2. gcc conservatively overestimates the size by assuming that
each statement of the asm block expands to 4 bytes.
* All of Ubuntu lucid and maverick has been built with
-mimplicit-it=thumb passwd by default, with no known build or runtime
failures arising from this (so size issues aside, we have confidence
that the resulting code generation is sound)
* -mimplicit-it=thumb -mthumb makes the asm block size estimation
unsafe: the asm block can exceed the estimated size even in the
absence of directives, which may lead to fixup range errors during
assembly.
* Following the principles already established for Thumb-2 in
general the estimation can be made safe (or, as safe as the
established Thumb-2 behaviour) by raising the assumed maximum
statement expansion size for asm blocks to 6 bytes, since
-mimplicit-it will add as most a single (16-bit) IT instruction to
each statement.
* The vast majority of all asm blocks are small (< 20 instructions,
say), so the overall overestimate in sizes will generally be modest
for any given compilation unit.
* -mimplicit-it is already _required_ by the Linux kernel and
possible other projects.
...so...
* With -mimplicit-it=thumb and a 6-byte asm block statement
expansion size estimate, we have toolchain behaviour which is as
reliable, and as correct, as it is in upstream at present.
* Layout of data in the compiler output will be more optimal in some
cases, and less optimal in other cases, compared with the the current
Thumb-2 behaviour, due to differing asm block size estimates. The
exact behaviour will depend on the distribution of conditional
instructions within asm blocks.
* Taken over a whole compilation unit, the total code size
overestimate (and therefore the impact on object layout) will normally
be modest, due to the small typical size of asm blocks.
* Behaviour for -marm will not be impacted at all.
If gcc currently estimated asm block code size accurately, then I
could understand upstream's objection; but as it stands it seems to me
we wouldn't be making anything worse in practice with the proposed
change; and there is no compatibility impact (other than positive
impact).
Of course, I may have some wrong assumptions here, or there may be
some background I'm not aware of...
Comments?
Cheers
---Dave
== Linaro GCC ==
* Finish testing for big-endian/quad-word patch on mainline, and
send upstream. Not yet reviewed by an ARM maintainer, but Joseph
suggested tweaking DejaGnu's target-supports to better reflect the new
capabilities of the vectorizer in big-endian mode. I've not looked into
that yet.
* Started looking at improving element/structure load/store intrinsics.
Made it so that the structs used for loads/stores are created in the
backend so that the types can be used directly by the builtins, but
discovered that the front-end/middle-end would not play along with that
plan as they are. Thought about ways to fix that.
* Some time spent on other CodeSourcery stuff.
The Linaro Toolchain Working Group is pleased to announce the release
of both Linaro GCC 4.4 and Linaro GCC 4.5.
Linaro GCC 4.5 is the fifth release in the 4.5 series. Based off the
latest GCC 4.5.1+svn167157, it includes many ARM-focused performance
improvements and bug fixes.
Linaro GCC 4.4 is the fifth release in the 4.4 series. Based off the
latest GCC 4.4.5, it is a maintenance release that fixes one problem
found through use.
Interesting changes include:
* A new performance focused extension elimination pass
* Speed and size improvements when loading constants
* Performance improvements on compound conditionals
* A range of correctness improvements
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.5-2010.12-0
and
https://launchpad.net/gcc-linaro/+milestone/4.4-2010.12-0
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
No changes have been committed to Linaro GDB 7.2 this month.
-- Michael
* Linaro GCC 4.4/4.5
Merged the latest CS patches and Linaro merge requests into Linaro GCC
(4.4 and 4.5). Ran regression tests. Yao's patch failed so I backed it
out, and made the release tarballs. Uploaded the releases to Michael
Hope for release.
lp:686381: luatex fails to build with gcc-4.5
Fired off a test build to reproduce the problem. Will come back to this
next week.
* GCC 4.6/4.7
Posted my various queued patches up to gcc-patches(a)gcc.gnu.org for review.
Looked at the state of the GCC 4.6 upstream build. There are currently
two problems:
1. libquadmath must be disabled in a cross-build for the bootstrap phases.
2. libstdc++ doesn't build. There is a patch for it on the mail list,
but it's not applied yet.
Once gcc 4.6 builds cleanly, I shall update the Launchpad 4.6 branch,
and declare that the baseline for our development. We'll then have
somewhere to commit and track patches awaiting GCC stage 1 development.
* Other
Caught up with email following my holiday.
Yet again, my IGEPv2 board suffered a corrupt file system. I've now
upgraded the kernel and configured it to use an NFS root. The board is
now somewhat less mobile, but should work more reliably.
Continued organizing the a brain-storming session for GCC optimization
improvements.
Organised flights and hotel for both the Linaro Sprint and CodeSourcery
annual meeting in January.
== Last Week ==
* Spent time tracking down a couple of regressions that appeared in the
new ltrace release-candidate tree. Submitted a bunch of patches to fix
the issues that were discovered during that process; most have been applied.
* Finished writing fairly generic code for handling ARM-specific unwind
tables, from lookup through decoding and dispatch. It uses a few
definitions specific to libunwind, but those probably could be
eliminated with more work.
== This Week ==
* Integrate new ARM-specific bits into libunwind framework.
* Rewrite part of a portability patch for ltrace and hope that those
changes reflect the very last effort that will be required for that
particular task.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
On Fri, Dec 3, 2010 at 9:06 AM, Yao Qi <yao.qi(a)linaro.org> wrote:
> Hi, Kernel WG,
> Can recent kernel handle NEON registers in corefiles?
>
> Seems we've had plan for this in "Ensure full NEON debug support" in
> https://wiki.linaro.org/WorkingGroups/KernelConsolidation/Specs/BSPInvestig…
> Any progress on this piece of work? We want to handle NEON registers in
> corefiles from GDB, which required kernel dump them in corefile first.
Hmmm, actually that bullet may have ended up in the wrong place ...
since it's not a BSP-specific feature.
Anyway, looking at the kernel code, it looks like the VFP/NEON state
is not dumped into the core file. If it makes you feel better, the
state of the obsolete FPE extension registers is dumped, if used :/
My guess is that it shouldn't be hard to dump the VFP/NEON state, but
GDB and the kernel need to agree on the format.
Rather that trying to hack the existing register dump format in a
compatible way, I suggest it's simplest if the kernel creates an extra
section in the dump containing something like:
.long format_version /* reserved for future expansion - must be 0 */
.long FPSID
.long FPSCR
.long MVFR0 /* or 0 if not present in the hardware */
.long MVFR1 /* or 0 if not present in the hardware */
.long d0
.long d1
/* ... d2-d14 ... */
.long d15
If 32 D-registers in the hardware [
.long d16
.long d17
/* ... d18-d30 ... */
.long d31
]
I believe we don't need any extra flags to indicate whether the MVFRx
fields are valid, since 0 in these registers indicates the
VFPv2/legacy behaviour anyway. Note that some VFPv2 implementations
(such as ARM1176) do provide these registers, and where the hardware
has them, the kernel can fill them in when doing the coredump.
We _should_ be prepared to ignore these fields (or interpret them
differently) if a vendor-specific VFP subarchitecture is specified (by
(FPSID & 0x4000) == 0x4000)
The number of D-registers can be deduced from the FPSID and MVFRx
registers, so we don't need to record it explicitly.
When MVFRx are not present, there are 16 D-registers.
When MVFRx are present, and (MVFR0 & 0xF) >= 2, there are 32 (or more)
D-registers
This is just a sketch -- the ARM ARM is the authoritative reference on
the meanings of these bitfields.
Any views on this?
Cheers
---Dave
>
> _______________________________________________
> linaro-dev mailing list
> linaro-dev(a)lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-dev
>
== GCC ==
* Tracked down root cause of GCC mainline bootstrap failure
on ARM (PR 46040 - "__DTOR_LIST__ undeclared").
== Miscellaneous ==
* Set up IGEP v2 board (w/ local disk, network, ...) as
native GCC / GDB build environment.
* Gave talk on Linaro at the "2010 Linux Community Event" at
Siemens Munich (w/ Arnd Bergmann).
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
You have been invited to the following event.
Title: TWG GCC Optimization
Discuss ideas for improving GCC optimization for ARM. Open to anybody who
wants to contribute.
https://wiki.linaro.org/AndrewStubbs/Sandbox/GCCoptimizations
International: +44 1452 567 588
UK: 0844 493 3801
Brazil: 08008912092
China: 108007121533
India: 0008001006354
Taiwan: 00801126472
United States: 18666161738
Conference code 2634417169#
When: Wed 2010-12-15 9am – 10am London
Where: Conference call code 263 441 7169
Calendar: linaro-toolchain(a)lists.linaro.org
Who:
* andrew.stubbs(a)linaro.org - creator
* Ken Werner - optional
* stevenb.gcc(a)gmail.com - optional
* Michael Hope - optional
* David Gilbert - optional
* Peter Maydell - optional
* ulrich.weigand(a)de.ibm.com - optional
* Yao Qi - optional
* Julian Brown - optional
* paul(a)codesourcery.com - optional
* Ira Rosen - optional
* richard.earnshaw(a)arm.com - optional
* Chung-Lin Tang - optional
* mark(a)codesourcery.com - optional
* linaro-toolchain(a)lists.linaro.org - optional
* Richard Sandiford - optional
* Marcin Juszkiewicz - optional
* zwelch(a)codesourcery.com - optional
Your attendance is optional.
Event details:
https://www.google.com/calendar/event?action=VIEW&eid=aXJlc2Z2bWZmZzQ5cGQ4b…
Invitation from Google Calendar: https://www.google.com/calendar/
You are receiving this courtesy email at the account
linaro-toolchain(a)lists.linaro.org because you are an attendee of this event.
To stop receiving future notifications for this event, decline this event.
Alternatively you can sign up for a Google account at
https://www.google.com/calendar/ and control your notification settings for
your entire calendar.
== GCC related ==
* PR44557, Thumb-1 ICE: found two small needed corrections in the ARM
backend to fix this. Sent patch upstream.
* LP:641397/PR46888: bitfield insert optimization. Posted patch along
with Andrew Stubbs' CSE patch upstream. The two-patch situation seemed
to stir some discussion :) It seems both patches were deemed okay,
though it would be better if there was some testcase that passed through
unhandled by Andrew's CSE patch, but processed by my combine fix later
in the pass pipeline. Both patches queued at GCC bugzilla page, to be
handled in the next stage1.
* LP:687406/PR46865, -save-temps creating different code. Analyzed
problem, though slow to send fix upstream. Had some discussion on the
list on how the fix should be like.
* PR46667, section type conflicts. Tested and sent mail to gcc-patches.
Jan Hubicka picked it up and pinged for an approval again. Hope this get
resolved soon.
* PR45416, ARM code regression. ARM considerations are mostly okay, main
issue remaining is how to solve the x86 regression. The current expand
code does a full DImode shift just to obtain a single bit, might be
point of improvement to solve this.
* LP:685534, ftbfs with gcc-linaro 4.5 on amd64. Found to be another
erroneous inline asm case. Fixed and updated on LP.
== This week ==
* More upstream and Linaro GCC issues.
* Start dealing with January travel.
* Think more about larger (Linaro) GCC optimization projects.
== Linaro GDB ==
* LP:616000 Handle -fstack-protector prologue code
Understand how frame affect expression validation in GDB. Improve i386
prologue parsing to handle 'and/add' sequence. Revise i386 prologue
parsing for stack protector. Patch is not submitted since still lack of
i386 prologue knowledge, and not very confident on that patch.
Understand prologue-value used in ARM prologue parsing and relationship
of symbol and frame in GDB. Make ARM prologue parsing understands stack
protector code by identify the code sequence. Improve the patch by
supporting ARM mode and ARMv5T, in order to make this patch accepted by
upstreams.
* LP:615972 gdb.base/gcore.exp failure.
The failure is about "corefile restored general registers", which is not
related to NEON register support in corefile. It is caused by
inconsistent register types and names between tdesc and arm. The cause
of this failure is found, but upstreams reviewer doesn't agree on one of
my fix. As he suggested, arm-core.xml is modified to add "type=XXX",
but get some errors when regenerate arm-with-iwmmxt.c. Filed GDB PR 12308.
== GCC ==
* register rename improvements (LP:633243)
Finally, got both middle-end part and ARM part approved. Committed to
upstreams mainline. Some benchmarks in EEMBC shows 0.1%~0.2% code size
reduction.
== This Week ==
* Ping my GDB patches.
* Fix GDB PR 12308, which blocks my fix for LP:615972.
* Backport my approved patches to Linaro GDB if any. Fix other GDB bugs.
* Pass one GCC patch in my queue to review, if I have extra time.
== Vacation ==
Take vacation on Dec 30th and 31st. Travel to ChengDu, and back on 3rd
Jan (It is public holiday in China). Back to work on 4th Jan.
--
Yao (齐尧)
== This week ==
* Away Monday and Tuesday.
* Very little the rest of the week due to other IBM commitments.
I've just finished the main part of that work, so all being well,
it should only need a bit of nannying next week. Most of the week
should be Linaro.
* Started trying to reproduce #641126, but realised that I'd need
to set myself up for general Ubuntu cross package building first.
Started to look at what's involved.
== Next week ==
* Get stuck into #641126.
* More STT_GNU_IFUNC and vectors.
Richard
RAG:
Red:
Amber:
Green:
Milestones:
| Planned | Estimate | Actual |
finish virtio-system | 2010-08-27 | postponed | |
get valgrind into linaro PPA | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
Progress:
* merge-correctness-fixes:
** I have sent out an updated ARM fixes pull request, all
of whose components have been Reviewed-by: Nathan Froyd;
I expect this to be merged shortly.
** vqshl(reg) patch posted to list; I have an update which
also addresses vqshl{,u}(imm) which I'll send out as v2
once the first part has been reviewed.
** reviewed and retransmitted Wolfgang's semihosting commandline
patches since he is having trouble sending unmangled mail to
the list
** went through the monster qemu-maemo commit "Lots of ARM TCG changes"
http://meego.gitorious.org/qemu-maemo/qemu/commit/3f17d4e1cb
identifying what fixes it includes
** started looking at a VRSQRTS patch. This uncovered a number
of qemu issues: NaN propagation is wrong, flush-to-zero handling
is only flushing output denormals, not input denormals, and we
don't handle the Neon "standard FPSCR value" but always use the
real FPSCR. I have some preliminary patches for at least some
of this, but since they affect a number of the same bits of
code that are touched by existing not-yet-committed patches I'm
waiting for those to be committed first.
* verify-emulation:
** wrote a README for risu and made it public at
http://git.linaro.org/gitweb?p=people/pmaydell/risu.git;a=summary
* maintain-beagle-models:
** the ubuntu maverick netbook image doesn't boot on qemu because
it uses the OMAP NAND prefetch/DMA, which isn't modelled
https://bugs.launchpad.net/qemu-maemo/+bug/645311
I've started on this and am perhaps halfway through (basic
prefetch code implemented, but DMA and debugging still to go)
* other:
** took part in an OBS mini-sprint where we were walked through
how the OBS buildsystem works and can be used to do test rebuilds
of Meego with new versions of the toolchain.
Meetings: toolchain, pdsw-tools
Plans:
* finish omap NAND prefetch engine work
* make sure ARM changes get committed to qemu...
Absences: (complete to end of 2010)
Fri 17 Dec - Tue 4 Jan inclusive.
2011: Dallas Linaro sprint 9-15 Jan. Holiday 22 Apr - 2 May.
Hi,
* created custom kernel deb packages from the linaro-linux tree in order to
* test the various ftrace tracers and profilers available on ARM
* results at: https://wiki.linaro.org/KenWerner/Sandbox/ftrace
* started to look into crash (kexec, kdump) but wasn't able to generate a
kernel dump yet
Regards
Ken
Dear All
Our team in Samsung collected some performance metrics for the following 3 GCC cross compilers
1.. Gentoo Complier(part of Chrome OS Build Environment)
2.. GCC 4.4.1 (Code Sourcery).
3.. Linaro (gcc-linaro-4.5-2010.11-1)
Flags used to Build Linaro Tool chain
used Michael Hope Script .Just modified "GCCFLAGS = --with-mode=thumb --with-arch=armv7-a --with-float=softfp --with-fpu=neon --with-fpu=vfpv3-d16"
a.. Using the above three tool chains we compiled the kernel of Chrome OS and did Coremark Performance test.(With same optimisation flag mentioned in the attachment)
b.. Test Environment for all the three are the same.
My Questions
1.. Is there any build options that I am missing while I am building the Cross Compiler?
2.. Else is this performance degradation is a know issue and is the tool chain group working on it?.(If so whom to contact?)
Any Pointers from you would be of great help to me.
If you need any further details also do ping me
Regards
Prashanth S
Hi All,
As we discussed on Monday, I think it might be helpful to get a number
of knowledgeable people together on a call to discuss GCC optimization
opportunities.
So, I'd like to get some idea of who would like to attend, and we'll try
to find a slot we can all make. I'm on vacation next week, so I expect
it'll be in two or three week's time.
Before we get there, I'd like to have a list of ideas to discuss. Partly
so that we don't forget anything, and partly so that people can have a
think about them before the day.
I'm really looking for bigger picture stuff, rather than individual poor
code generation bugs.
So here's a few to kick off:
* Costs tuning.
- GCC 4.6 has a new costs model, but are we making full use of it?
- What about optimizing for size?
- Do the optimizers take any notice? [1]
* Instruction set coverage.
- Are there any ARM/Thumb2 instructions that we are not taking
advantage of? [2]
- Do we test that we use the instructions we do have? [3]
* Constant pools
- it might be a very handy space optimization to have small
functions share one constant pool, but the way the passes work one
function at a time makes this hard. (LP:625233)
* NEON
- There's already a lot of work going on here, and I don't want it
to hog all our time, but it might be worth touching on.
What else? I'm not the most experienced person with GCC internals, and
I'm relatively new to the ARM specific parts of those, so somebody else
must be able to come up with something far more exciting!
So, please, get brain-storming!
Andrew
[1] We discovered recently that combine is happy to take two insns and
combine them into a pattern that matches a splitter that then explodes
into three insns (partly due to being no longer able to generate
pseudo-registers).
[2] For example, I just wrote a patch to add addw and subw support (not
yet submitted).
[3] LP:643479 is an example of a case where we don't.
Mostly more working with libffi; swapping some ideas back and forwards with
Marcus Shawcroft and it looks like we have
a good way forward.
Got an armhf chroot going, libffi built.
Got a testcase failing as expected.
Trying to look at other processors ABIs to understand why varargs works for
anyone else.
Cut through one layer of red tape; can now do the next level of comparison
in the string routine work.
Started looking at SPEC; hit problems with network stability on VExpress
(turns out to be bug 673820)
long long weekend; short weeks=2;
Back in on Tuesday.
Dave
Hi,
Those of you use silverbell may be glad to know it's back up.
Be a little careful, if you shovel large amounts of stuff over it's network
the network tends to disappear.
(Not sure if this is hardware or driver)
Dave
Hi,
As mentioned on the standup, I just got an armhf chroot going, thanks to
markos for pointing me at using multistrap
I put the following in a armhfmultistrap.conf and did
multistrap -f armhfmultistrap.conf
Once that's done, chroot in and then do
dpkg --configure -a
it's pretty sparse in there, but it's enough to get going.
Dave
==============================================
[General]
arch=armhf
directory=/discs/more/armhf
cleanup=true
noauth=true
unpack=true
explicitsuite=false
aptsources=unstable unreleased
bootstrap=unstable unreleased
[unstable]
packages=
source=http://ftp.de.debian.org/debian-ports/
keyring=debian-archive-keyring
suite=unstable
omitdebsrc=true
[unreleased]
packages=
source=http://ftp.de.debian.org/debian-ports/
keyring=debian-archive-keyring
suite=unreleased
omitdebsrc=true
Hi. As part of my work on qemu I've written a simplistic random instruction
sequence generator and test harness. To quote the README:
risu is a tool intended to assist in testing the implementation of
models of the ARM architecture such as qemu and valgrind. In particular
it restricts itself to considering the parts of the architecture
visible from Linux userspace, so it can be used to test programs
which only implement userspace, like valgrind and qemu's linux-user
mode.
I don't particularly expect this tool to be of much general interest outside
people developing either qemu or valgrind or similar models, but I have
in any case made it publicly available now:
http://git.linaro.org/gitweb?p=people/pmaydell/risu.git;a=tree
-- PMM
Hi all,
I'd be interested in people's views on the following idea-- feel free
to ignore if it doesn't interest you.
For power-management purposes, it's useful to be able to turn off
functional blocks on the SoC.
For on-SoC peripherals, this can be managed through the driver
framework in the kernel, but for functional blocks of the CPU itself
which are used by instruction set extensions, such as NEON or other
media accelerators, it would be interesting if processes could adapt
to these units appearing and disappearing at runtime. This would mean
that user processes would need to select dynamically between different
implementations of accelerated functionality at runtime.
This allows for more active power management of such functional
blocks: if the CPU is not fully loaded, you can turn them off -- the
kernel can spot when there is significant idle time and do this. If
the CPU becomes fully loaded, applications which have soft-realtime
constraints can notice this and switch to their accelerated code
(which will cause the kernel to switch the functional unit(s) on).
Or, the kernel can react to increasing CPU load by speculatively turn
it on instead. This is analogous to the behaviour of other power
governors in the system. Non-aware applications will still work
seamlessly -- these may simply run accelerated code if the hardware
supports it, causing the kernel to turn the affected functional
block(s) on.
In order for this to work, some dynamic status information would need
to be visible to each user process, and polled each time a function
with a dynamically switchable choice of implementations gets called.
You probably don't need to worry about race conditions either-- if the
process accidentally tries to use a turned-off feature, you will take
a fault which gives the kernel the chance to turn the feature back on.
Generally, this should be a rare occurrence.
The dynamic feature status information should ideally be per-CPU
global, though we could have a separate copy per thread, at the cost
of more memory. It can't be system-global, since different CPUs may
have a different set of functional blocks active at any one time --
for this reason, the information can't be stored in an existing
mapping such as the vectors page. Conversely, existing mechanisms
such sysfs probably involve too much overhead to be polled every time
you call copy_pixmap() or whatever.
Alternatively, each thread could register a userspace buffer (a single
word is probably adequate) into which the CPU pokes the hardware
status flags each time it returns to userspace, if the hardware status
has changed or if the thread has been migrated.
Either of the above approaches could be prototyped as an mmap'able
driver, though this may not be the best approach in the long run.
Does anyone have a view on whether this is a worthwhile idea, or what
the best approach would be?
Cheers
---Dave
== Linaro GCC ==
* Worked on quad-word/big-endian fixes patch. Sent off a version
on Tuesday which worked OK, but which made some awkward changes to the
middle-end. Tried to re-think those parts, but without much luck: came
to the conclusion that spending more time trying to fix
element-ordering-dependent operations on quad-word vectors in
big-endian mode was probably not worth the effort (since we plan to be
changing things in that area anyway). Wrote a much-simplified patch
which simply disables those patterns, and ported it to mainline.
* Then, spent some time trying to set up big-endian testing with a
mainline build, since the lack of such an option is partly why we got
into this mess to start with. My current plan (as well as testing the
above patch) is to create an upstreamable patch to easily enable
big-endian (Linux) multilibs, in the hope that it'll generally make
big-endian testing easier. (Of course people will still need test
harness configurations which will allow running big & little-endian
code, which most won't have.)
* Also, ping lp675347 (volatile bitfields vs. QT atomics), and do some
some extra checks suggested by DJ Delorie, which seemed to work out
fine. Backported patch for lp629671 to Linaro 4.4 branch, and ran tests
(also fine).
* Continued discussion of internal representations for fancy vector
loads/stores in GIMPLE/RTL on linaro-toolchain.
== Last Week ==
* Continued implementing support for ARM unwind tables in libunwind.
* Sent patches upstream to improve binutils's readelf, adding support
for all remaining unwind table instructions (i.e. VFP/NEON and WMMX).
When used on ARMv7a, provides meaningful output for previously
'unsupported' opcodes that get used in some libraries (e.g. glibc).
== This Week ==
* Continue working on libunwind.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
== GDB ==
* Posted updated implementation of #661253 (Improve
backtrace by using ARM exception tables) to gdb-patches,
which includes several changes requested by reviewers
* Posted updated patch to further improve backtrace
(in the absence of debug info) to gdb-patches
* Commented on a couple of GDB LP bugs
== Miscellaneous ==
* Started setting up IGEP v2 board
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Since it came up in the toolchain meeting this morning, some links to
issues people are having doing ARM scratchbox-style builds because of
generic linux-user issues:
http://bugs.meego.com/show_bug.cgi?id=10529 # linux-user's mmap
implementation isn't very smart
https://bugs.launchpad.net/qemu/+bug/668799 # qemu locking issues
which can cause build failures (sometimes)
-- PMM
https://blueprints.launchpad.net/ubuntu/+spec/other-linaro-n-cross-compilers:
- wrote patches for creating backports PPA
- each component [1] generates versioned -source binary package
(eglibc-2.12.1-source etc)
- a-c-t-b [2] got "PPA" boolean variable in rules to have one source package
for archive and for backports
- I built a-c-t-b with all components backports from natty in lucid pbuilder
Bugs:
- 684625 - libc6 is compiled for armv5 instead of armv7a
- confirmed, wrote fix, will sent for review and merge
- 683832 - gcc fails to cross compile Qt
- confirmed in maverick for cross gcc 4.4/4.5
- need to check with fixed (bug 684625) toolchain
- FTFBS of armel-cross-toolchain-base 1.53/natty
- issue is lack of LTO plugin built in gcc/stage2
- have first patches for it, need to test
1. component = eglibc, gcc-4.4/4.5, binutils, linux
2. a-c-t-b = armel-cross-toolchain-base
Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
== GCC related ==
* PR44557, Thumb-1 ICE: originally thought a fix of constraint will
work, however after simplifying the testcase, received another ICE in
postreload, due to a load of IP, which is not permitted in Thumb-1.
Looking at some reload internals as part of fixing this.
* PR45416, ARM code generation regression. First fix from last week hit
an assert FAIL in the alias-oracle due to ARRAY_REFs not being handled
there. Also, further found some expand code quality regressions due to
this change. Turned to a more conservative fix by adding the related TER
substitution to expr.c:do_store_flag(), which produced more focused
results. However, 32-bit x86 slightly regressed in the same flag storing
code (did not use the 'testl' insn after the change). Still WIP.
* PR46667, submitted a section type conflict bug fix upstream, see
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00137.html , which
supposedly fixes the upstream ARM-Linux C++ build. Jan Hubicka later
gave another fix, so still in discussion.
* PR45886, this PR is call for backporting the __ARM_PCS* preprocessor
symbols to gcc-4_5-branch. Submitted a mail to ask for approval, no
response.
== libffi VFP hard-float ==
* PR46508, libffi VFP assembly error. I missed this earlier due to using
a compiler configured with --with-fpu=vfp. Submitted assembly fix to add
the needed FPU directives. Committed to upstream trunk.
== This week ==
* Hope to wrap up the above in-progress PRs, as well as continue to look
at other PRs of interest.
* LP #685534 popped up on Sunday, and manifests on upstream trunk too.
Add this to queue.
* Think about GCC performance opportunities (Linaro)
== Linaro GCC ==
* Reproduce regression of my ldm/stm backport on 4.5, which is caused by
the other two merged patches in ifcvt.c. Fix them. Propose merge
request again. Learn how to sync/merge changes from one branch to the
other branch.
* Fix VFP_D0_D7 handling in predicate vfp_register_operand. Approved
and committed upstreams.
* Test new regrename improvement patch on x86_64, and measure
effectiveness of it on ARM. Code size of bash-3.2 is reduced 0.2% with
option "-march=armv7a -mthumb -O2 -frename-registers". Eric B. is
almost OK with this patch except some wording in comments.
== Linaro GDB ==
As discussed in UDS, I'll move to GDB work for gdb correctness.
http://ex.seabright.co.nz/helpers/planner#tr-toolchain-gdb-correctness
In this month, I'll focus on GDB testsuite failures fixing.
* Analyze LP:615978, failures in gdb.base/annota3.exp.
Signal is not delivered to child while software single-stepping. The
same as LP:649121.
* Fix failure in gdb.xml/tdesc-regs.exp. LP:685494
It is caused by a target triplet matching error, when target is set to
"armv7l-linux-gnueabi". Target triplet matching in test cases should be
changed. Patch is being reviewed in upstreams.
* LP:616000 failures caused by -fstack-protector.
Homework to understand frame-related code in GDB. Got some big picture
of usage of some key data structures inside GDB on frame. Compared with
prologue with and without stack-protector, find some difference there.
Still no clue on how to educate GDB to identify whether stack-protector
is turned on or off.
* Fix one failure in printcmds.exp. LP:685702
This test case on 7.2 branch is a little bit out of date, compared with
GDB trunk. Backport one patch on trunk to 7.2 branch can fix this
problem. Backport patch is being reviewed in upstreams.
* Neon registers in kernel dump file. LP:615972
Ask Linaro kernel WG to see how to move forward on this. Discussion is
still ongoing.
== This Week ==
* Report the rest of GDB testsuite failures.
* Pick up some of them, and fix.
* Pass gcc patches in my queue one by one to gcc-patches to review.
--
Yao (齐尧)
RAG:
Red:
Amber:
Green:
Milestones:
| Planned | Estimate | Actual |
finish virtio-system | 2010-08-27 | postponed | |
get valgrind into linaro PPA | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
Progress:
* merge-correctness-fixes:
** Nathan Froyd (CodeSourcery) has reviewed a lot of
my ARM patches. Most were OK, one or two needed tweaking
We seem to have come to agreement on how best to treat
the API between qemu and the softfloat library, and I
have a V2 patchset ready to mail as soon as Nathan has
commented on the final patch.
** posted a patch to rename a very misleading _is_nan()
function
** identified list of correctness patches in meego and
samsung trees and issues noted within ARM
** qemu: posted patch to remove an unused function
** started looking at the first patch in the meego tree,
which fixes VQSHL. I have already discovered a bug in
this insn not covered by the meego patch...
Meetings: toolchain, PD update, ARM 20th birthday party
Plans:
- qemu consolidation
Issues:
* Locking in qemu is definitely insufficient, especially
(but not exclusively) when running multi-threaded
programs in linux-user mode.
https://bugs.launchpad.net/qemu/+bug/668799
has an example problem and some discussion; I'm hoping
some other qemu developers have an opinion, but the
nicest approach IMHO would involve fairly invasive
changes to how qemu implements interrupting a cpu
which is executing TCG code.
Not sure where this should sit in the priority list.
Things of note:
- there has been some discussion of broadening the "KVM Forum"
conference to include other virtualisation related topics including
Xen and also the TCG aspects of Qemu. Still all up in the air but
possibly colocated with LinuxCon in Vancouver in August. See:
http://www.linux-kvm.org/page/KVM_Forum_2011
Absences: (complete to end of 2010)
Fri 17 Dec - Tue 4 Jan inclusive.
2011: Dallas Linaro sprint 9-15 Jan. Holiday 22 Apr - 2 May.
== This week ==
* Looked at a generic bug in GAS's handling of ifuncs. Sent a patch upstream:
http://sourceware.org/ml/binutils/2010-11/msg00495.html
Alan quite reasonably wanted me to test on a variety of targets. For want
of anything better, I wrote a script to test Alan's list of 118 targets.
Tests went OK, patch committed upstream.
* Wrote more IFUNC tests. Found another problem (as yet unresolved).
* Looked at vector stuff, but nothing tangible yet.
(I also had to spend some time on other IBM things, sorry.)
== Next week ==
* Away Monday and Tuesday.
* More STT_GNU_IFUNC and vectors.
Richard
* Benchmarking of simple package builds with various string routine
versions; not finding enough difference in the noise to make any large
conclusions
* Looking at the string routine behaviour with perf to see where the time
is going
- getting hit by the Linaro kernels on silverbell missing Perf
enablement in the config
- Useful amount of time does seem to be spent outside the main 'fast
aligned' chunks of code
- pushing/popping registers does seem to be pretty expensive
* Started looking at libffi and hard float
- Started writing a spec
https://wiki.linaro.org/WorkingGroups/ToolChain/Specs/LibFFI-variadic
- It's going to need an API change to libffi, although the change
shouldn't break any existing code on existing platforms where they work.
* Helping with the image testing
Dave