== Last Week ==
* Reached the point with understanding libunwind where I can begin
writing patches for parsing unwind information out of .ARM.exidx and
.ARM.extab ELF sections.
== This Week ==
* Begin writing support for ARM-specific unwind information to libunwind.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
== Linaro GCC ==
* Continued looking at big-endian/quad-vector patch: attempted to
figure out the proper semantics for vec_extract in big endian mode
(about 1 day). Put on hold temporarily to work on lp675347, QT failing
to build due to constraint failure in inline asm statements used for
atomic operations: found the patch which introduced the failure, and
suggested a workaround to the OP. Came up with a plausible-looking
patch, and started testing it, after spending some time trying to
figure out why ARM Linux mainline doesn't build at present. Patch sent
upstream.
Hi Richard,
As per the discussion at this morning's call: I've reread the TRM and I
agree with you about the LSLS being the same speed as the TST (1 cycle).
However, as we agreed, the uxtb does look like 2 cycles vs. the AND's 1 cycle.
On the space v perf theme, one thing that would be interesting to know is
whether there are any icache/issue stage limitations;
i.e. if I have a stream of 32-bit Thumb-2 instructions that are all listed
as 1 cycle and are all in i-cache, can they be fetched
and issued fast enough, or is there a performance advantage to short
instructions?
Dave
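For reference, a minimal C sketch of the two instruction pairs being compared. It assumes the usual mapping (a mask with 0xff becoming AND, a narrowing cast becoming UXTB, a single-bit test becoming TST, and a shift into the sign bit becoming LSLS); which instruction a given compiler actually emits depends on its version and flags. The function names are made up for illustration.

```c
#include <stdint.h>

/* Low byte via a mask: typically an AND with #255. */
uint32_t low_byte_mask(uint32_t x)
{
    return x & 0xffu;
}

/* Low byte via a narrowing conversion: typically a UXTB. */
uint32_t low_byte_cast(uint32_t x)
{
    return (uint8_t)x;
}

/* Test bit n (0..31) with a mask: the TST form. */
int bit_set_tst(uint32_t x, unsigned n)
{
    return (x & (UINT32_C(1) << n)) != 0;
}

/* Test bit n (0..31) by shifting it into the sign position: the LSLS
 * form, where the result is read from the N flag. */
int bit_set_lsls(uint32_t x, unsigned n)
{
    return ((x << (31u - n)) & UINT32_C(0x80000000)) != 0;
}
```

Both members of each pair return the same value for every input, so the choice between them is purely one of encoding size and cycle count.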
LP:663939 - Thumb2 constants
* Continued testing, found a few bugs. Tidied a few bits up.
* Wrote some new testcases to go with the patch.
LP:618684 - ICE
* Began looking at this one. So far I can't reproduce it. I have a
debuggable native toolchain building, but it's been delayed by hardware
issues.
In the course of testing I discovered that the ARM FSF config wasn't
testing the right thing, so I began work on a new, more appropriate FSF
build/test config for Linaro work.
I also found that the SD card rootfs in my IGEPv2 board was corrupted. I've
restored it from backup, and now it's working once more.
== Linaro and upstream GCC ==
* Linaro launchpad issues:
- LP #672833, x86-64 varargs regression: after testing, pushed a bzr branch
for merging.
- LP #634738, inefficient low bit extraction: some discussion with Yao.
- LP #618684, ICE when building ziproxy: looked into it and quickly found
it is no longer reproducible on Linaro 4.5 trunk.
* Worked on some GCC bugzilla PRs:
- PR44557, ICE in Thumb-1 secondary reload: this should be fixed by a
change of the scratch operand constraint of "reload_inhi" from "r" to
"l". Interesting to note that this was from the
merged-arm-thumb-backend-branch merge, from about 10 years ago.
- PR46508: libffi fails to build on VFP asm instructions, seems to need
a '.fpu vfp' directive. Probably missed earlier because my toolchain was
configured with --with-fpu=vfp.
- PR45416: 4.6 code generation regression on ARM, after expand from SSA
changes. Looking at this currently.
== This week ==
* Look at Linaro issues with higher priority.
* Continue working on GCC PRs.
== Linaro GCC ==
* Merge ldm/stm patch to Linaro 4.5 tree.
Found two regressions in pass ce3 at the last minute, just before
proposing the merge request. Reverted the one ldm/stm patch that concerns
ifcvt. Completed the testcase in the branch.
* Try Richard E.'s "TST to LSLS transformation" patch on cortex-a9 with
FFMPEG. No speed improvements.
* Various Linaro GCC Bug fixing.
** LP:634738
Follow the fix to GCC PR40697, and create a new patch, which emits
extzv or shift rather than loading constants in some cases. Tested on
FSF GCC trunk with no regressions. However, found a regression by
inspection in pr44999.c, in which ubfx (4 bytes) is generated rather than
uxth (2 bytes). uxth is produced by the combiner from ashift and lshiftrt.
While reading arm.c, found that constant handling in thumb2 should be
improved to some extent.
** LP:633243
Re-implement regrename improvement, as Eric B. suggested in
gcc-patches. Spend some time on understanding API in GCC related to
hard-reg. Tested on x86_64-linux. No regression.
** LP:638935
Update my tree to FSF trunk, and find RTL seq for fldm/fstm peephole
disappears due to fix to PR45722. Extend arm-ldmstm.ml to support vfp.
Peephole and RTL patterns for vfp are done. Will revise
arm.c:{load,store}_multiple_sequence to accept vfp data.
Fix a bug in ldm/stm peephole when starting offset is negative.
== This week ==
* LP:634738: Figure out how uxth is produced by combiner.
* LP:633243: Test it on ARM.
* LP:638935: Revise {load,store}_multiple_sequence to accept vfp data.
--
Yao (齐尧)
Re my recent email "Upstream GCC feature freeze", I think we're agreed
that we need to create a branch that tracks GCC 4.6 development, but has
our own performance improvements included. The question is where to host it?
Option 1: Launchpad/bzr
Pros:
* We need no permission to do it
* The branch will naturally evolve into our 4.6 release series in time.
* The 3-way merge works well (if slowly)
* We can include patches that we have no intention of posting upstream
ever
* Our patch tracker will Just Work.
* Merge requests will be available.
Cons:
* Bzr ;)
* It's hidden away from the view of most GCC developers
Option 2: GCC SVN branch
Pros:
* We can work in the open, submitting patches via gcc-patches, as usual
* The final merge to GCC trunk (come stage 1) will be eased, a little
Cons:
* We can't really apply anything we want just for ourselves
* we may end up maintaining an LP branch shadowing the svn branch
* When we do want to do 4.6 in LP, we'll have to backport all our
patches from 4.7, and this may no longer be straightforward.
* Write permissions not clear.
* Although I think you can just go ahead and do it?
OK, so I'm sure I've missed some big ones. Please discuss! ;)
I think the big question here is, when will we start wanting to make
(unstable/experimental) Linaro GCC 4.6 releases? If we want to do it
early, then we'll have no choice but to have an LP branch to release from.
Andrew
Like everyone from Toolchain WG, I will share my activities from last week:
1. cross compilers for archive
- discussed with doko about dropping update-alternatives use
- wrote gcc-defaults-armel-cross 1.4 which does proper symlinks for cross
compilers
- wrote gcc-4.5-armel-cross 1.41 which removes update-alternatives support
- wrote gcc-4.4-armel-cross 1.37 which removes update-alternatives support
- wrote armel-cross-toolchain-base 1.53 which has all updates which I had
- sent all of them to Steve for review
Status of changes:
- default version of armel cross compiler will be 4.5 like it is in Natty
- both 4.4 and 4.5 will be provided as it is for native
- any traces of update-alternatives use should be removed
Needs to be done:
- adding conflicts on older cross compilers to gcc-defaults-armel-cross
Order of upload to archive:
- armel-cross-toolchain-base
- gcc-4.5-armel-cross
- gcc-4.4-armel-cross
- gcc-defaults-armel-cross
2. Checked whether a few old bugs still apply:
- Bugs #646729, #637454, #671455 are done with armel-cross-toolchain-base 1.52
(landed in maverick-proposed)
Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
Short week.
Finally got external hard drive for my beagle - makes it sanely possible to
natively build things.
Got eglibc cross built (thanks to Wookey for pointing me in the right
direction with the magic incantation of dpkg-buildpackage -aarmel
--target=binary) and it is easy to rebuild. I have a version with the neon
version of my memset built into it - it doesn't seem to make a noticeable
difference to my ghostscript benchmark though.
Pandas aren't likely to turn up until mid December; arranging to borrow
an A9 is turning out to be difficult, but it looks like we should be able to
get access to the one in the London datacentre - although it has a disc
problem at the moment.
I did manage to get a colleague to try my tests on his own Toshiba AC-10
(Tegra-2 - no Neon); the graphs had approximately the same shape as my
previous Panda tests. Memchr looked pretty good on there.
Also trying to look at the sign off I need for various libc access.
Dave
I mainly worked on the atomic memory operations blueprint/item:
* posted an updated patch for #643171 on the libc-ports ml after running the
glibc testsuite natively on the vexpress
* continued to learn about the ARM instructions involved :)
* started to write some gcc testcases that scan the asm output of the __sync
builtins (mainly to detect differences between the gcc versions - not sure how
useful those tests would be for upstream as the sequences may easily change)
Ken
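A minimal sketch of the kind of source such a test could compile, using two of GCC's documented __sync builtins. The wrapper names are made up for illustration. The assumption is that a test would build this with -S and scan the output for the expected exclusive-access and barrier sequence, which varies between GCC versions and target options.

```c
#include <stdint.h>

/* Atomically add delta to *counter and return the value it held before. */
int32_t counter_fetch_add(int32_t *counter, int32_t delta)
{
    return __sync_fetch_and_add(counter, delta);
}

/* Atomically store desired into *slot if it currently equals expected.
 * Returns nonzero when the swap happened. */
int slot_try_swap(int32_t *slot, int32_t expected, int32_t desired)
{
    return __sync_bool_compare_and_swap(slot, expected, desired);
}
```

Keeping each builtin in its own small function makes the generated assembly easy to match per function.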
RAG:
Red:
Amber:
Green:
Milestones:
                             | Planned    | Estimate   | Actual     |
finish virtio-system         | 2010-08-27 | postponed  |            |
get valgrind into linaro PPA | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches   | 2010-10-01 | 2010-10-22 | 2010-10-18 |
Progress:
* Most of this week spent at the Meego conference in Dublin.
This seemed to be a rather apps-developer centric conf,
with not much of interest on the low-level side. There were
a few useful talks/conversations, though.
* Intel were giving away Atom-based netbooks to all attendees;
that's a lot of developers who are going to be testing and
optimising their apps for Atom devices rather than ARM...
* qemu: looked at https://bugs.launchpad.net/bugs/668799 ;
we don't seem to be taking the right lock before we manipulate
the graph of translation blocks. I have a fix which stops the
reported segfault, but the code has a number of "XXX not thread
safe" and "FIXME: not SMP safe" comments and generally doesn't
seem to have a coherent locking design :-(
* qemu: sent some minor patches upstream:
+ enable iwmmxt coprocessors in user mode
+ remove some unused functions from target-arm and target-sparc
+ fix a failure to build bug in a makefile
* qemu: some review of a patch to fix semihosting SYS_GET_CMDLINE
Plans
- qemu consolidation
- post-toolchain-review, sort out some milestones for
this report
Absences: (complete to end of 2010)
Thu/Fri 25-26 Nov; Fri 17 Dec - Tue 4 Jan inclusive.
(Dallas Linaro sprint 9-15 Jan.)
== This week ==
Started looking at STT_GNU_IFUNC support in BFD. There were a couple
of janitorial changes I needed to make in order to prepare elf32-arm.c
for the main patch. I tested those separately and submitted them upstream:
http://sourceware.org/ml/binutils/2010-11/msg00330.html
http://sourceware.org/ml/binutils/2010-11/msg00331.html
I've now finished a prototype implementation of the STT_GNU_IFUNC
support itself. It wasn't as mechanical as I'd originally assumed,
which was nice.
Tests that I've run by hand seem to be doing the right thing.
I've now started writing tests for the testsuite (meaning:
I've completed 1 test so far).
== Next week ==
* Add more tests, including Thumb coverage.
* Start on the libc changes.
Richard
Doing an allmodconfig build on the kernel, I get the following:
CC arch/arm/kernel/asm-offsets.s
In file included from
/home/rob/proj/git/linux-2.6-dt/include/linux/kernel.h:12,
from
/home/rob/proj/git/linux-2.6-dt/include/linux/sched.h:54,
from
/home/rob/proj/git/linux-2.6-dt/arch/arm/kernel/asm-offsets.c:13:
/usr/lib/gcc/arm-linux-gnueabi/4.4.5/include/stdarg.h:40: internal
compiler error: Segmentation fault
It occurs on Maverick 4.4, 4.5 and CodeSourcery 2009Q1 cross toolchains.
It's confirmed by Codesourcery here:
http://www.codesourcery.com/archives/arm-gnu/msg03719.html
What's the status on this issue? I didn't see anything in Linaro gcc
bugs that looks related.
Rob
The STT_GNU_IFUNC blueprint:
https://wiki.linaro.org/WorkingGroups/ToolChain/Specs/Binutils-STT_GNU_IFUNC
says "the ARM EABI will be updated to support STT_GNU_IFUNC's requirements".
I suppose the most obvious thing that needs to be defined is the relocation
number for R_ARM_IRELATIVE. What's the best way of handling that?
The main options seem to be:
1. Reserve a relocation number with ARM first (129?).
2. Go ahead and implement it without having the EABI updated.
See whether the results are good before deciding whether
to bless it in the EABI.
3. Since STT_GNU_IFUNC is GNU-specific, treat R_ARM_IRELATIVE
as GNU-specific too, and pinch one of the R_ARM_PRIVATE relocs.
I'm pretty sure (3)'s not the way to go, but I was aiming for
completeness. :-)
Richard
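For readers new to the feature: an STT_GNU_IFUNC symbol names a resolver function, and the dynamic loader calls that resolver once to obtain the address of the implementation to use. The sketch below imitates that idea in portable C with a cached function pointer. It does not use the ELF mechanism itself, and every name in it (including the feature probe) is hypothetical. On toolchains with IFUNC support, GCC's ifunc function attribute expresses the same thing without the explicit pointer.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical feature probe; a real resolver would inspect the
 * hardware capabilities reported by the system. */
static int cpu_has_fast_simd(void)
{
    return 0;
}

/* Two stand-in implementations; both just defer to memset here. */
static void *memset_plain(void *s, int c, size_t n)
{
    return memset(s, c, n);
}

static void *memset_simd(void *s, int c, size_t n)
{
    return memset(s, c, n);
}

/* The resolver: picks an implementation and returns its address. */
static memset_fn resolve_memset(void)
{
    return cpu_has_fast_simd() ? memset_simd : memset_plain;
}

/* Cached result of the resolver, filled in on first use. */
static memset_fn chosen_memset;

void *my_memset(void *s, int c, size_t n)
{
    if (chosen_memset == NULL)
        chosen_memset = resolve_memset();
    return chosen_memset(s, c, n);
}
```

With real IFUNC support the resolver call and the cached pointer are handled by the loader through a relocation, which is why a relocation number is needed.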
Hi,
On 17 November 2010 05:35, Michael Hope <michael.hope(a)linaro.org> wrote:
> 1. How easy is it to frequently merge in SVN? It used to be terrible
> as you had to manually track the merges. These days can you do a 'svn
> merge trunk' and have it just work?
I asked Mike Meissner to answer this question. Mike is very experienced in
GCC and GCC SVN branch management. I am attaching his reply.
Ira
I sent this recently to ppc64-toolchain(a)linux.ibm.com on how to use
svnmerge to manage branches:
This script (also ~meissner/meissner/bin.sh/svnmerge) is what I use to
update svn directories, such as ibm-gcc-4_5-branch. I think I originally
got it from Ben E. and it may be in the contrib directory.
Typically the way I start a branch, such as my normal power7-meissner
branch, I do the following:
$ export TRUNK="svn+ssh://@gcc.gnu.org/svn/gcc/trunk"
$ export BNAME="power7-meissner"
$ export BRANCH="svn+ssh://@gcc.gnu.org/svn/gcc/branches/ibm/$BNAME"
$ export SRC="$HOME/fsf-src"
$ svn delete -m"delete old branch" $BRANCH
$ svn copy -m"Clone new branch" $TRUNK $BRANCH
$ cd $SRC
$ svn co $BRANCH
$ cd $BNAME
$ svnmerge init
$ svn update # this is sometimes needed
$ svn commit -m'Create svnmerge init info'
$ export REV="xxxx" # substitute subversion id for xxxx
$ echo "power7-meissner branch, based on $REV." > gcc/REVISION
$ touch gcc/ChangeLog.power7
$ <edit gcc/ChangeLog.power7 to create initial contents>
$ svn add gcc/ChangeLog.power7 gcc/REVISION
$ svn commit -m'Add REVISION to branch'
In particular, creating gcc/REVISION allows you to tell what subversion
revision the source is based against. You can find the information via:
$ svn propget svnmerge-integrated
but it is a lot easier if you have a compiler tree to do gcc -v. After you
do a propget, you will need to do a svn update.
In this case, I use gcc/ChangeLog.power7 to hold the ChangeLog entries
local to the branch. That way I can see a summary of the changes, but not
pollute the normal ChangeLog files.
To do merges, you need to make sure that all local changes are checked into
the branch. Then do:
$ cd $SRC/$BNAME
$ svnmerge merge
$ <edit gcc/REVISION and ChangeLog.power7 to indicate merge>
$ <test merged files, if satisfied, check them in>
$ export REV="xxxx" # substitute subversion id for xxxx
$ svn update # just in case
$ svn commit -m"Update to subversion id $REV"
To create a patch file, first make sure the files are checked in, then do:
$ cd $SRC/$BNAME
$ export PATCHFILE="$HOME/patches/mypatch.patch01"
$ <make ChangeLog entries in $PATCHFILE>
$ svn diff --old $TRUNK --new . -r $REV >> $PATCHFILE
$ <delete ChangeLog.power7, REVISION, property changes from $PATCHFILE>
$ submit patch
To see if there are changes to be merged in:
$ svnmerge avail
For example on the ibm-gcc-4_5-branch, the following changes are available
to be merged in: 164657-166510 when I originally wrote this message on the
9th of November, and Peter has subsequently updated the merge.
I put the following in ~/.subversion/config to provide my own diff command:
### Set diff-cmd to the absolute path of your 'diff' program.
### This will override the compile-time default, which is to use
### Subversion's internal diff implementation.
diff-cmd = /home/meissner/bin.sh/svndiff
Every so often, I find svnmerge misses, for example in deleting
directories.
It is helpful to do a diff from the mainline every so often to make sure
you are not missing newly created files or still are keeping older files or
just missed a change.
I'll include svndiff for the smarter svndiff command and mrm-changelog.el
that looks for the ChangeLog.<name> files I use in different branches. Feel
free to contact me to clarify some stuff.
--
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner(a)linux.vnet.ibm.com fax +1 (978) 399-6899
(See attached file: svnmerge)(See attached file: svndiff)(See attached
file: mrm-changelog.el)
Hi there,
There's a recording of this morning's public plan review available on
the wiki at:
https://wiki.linaro.org/Releases/1105/PublicPlanReview
Also included is a copy of the slides and supporting documents. Might
be interesting for those who missed it.
-- Michael
A heads up. I'd like to have a brainstorming session on potential
Thumb-2 performance improvements in GCC. Think about what you'd like
in such a session, and what preparation should be done, and we can
discuss the discussion (heh) on Monday.
-- Michael
Hi there,
I noticed that there's a QEMU users forum at:
http://adt.cs.upb.de/quf/
and that the abstract submission phase is still open, and closes
November 28th. It would be great to see some participation there and
help identify other key people interested in using and improving QEMU.
--
Christian Robottom Reis | [+55] 16 9112 6430 | http://launchpad.net/~kiko
Linaro Engineering VP | [ +1] 612 216 4935 | http://async.com.br/~kiko
Zach Welch --
== Last Week ==
* Continued working on libunwind support. Trying to figure out why my
signal frame detection doesn't work as expected.
* Kept pace with the ltrace tree, testing recent patches on ARM.
== This Week ==
* Continue to work on libunwind signal frame detection.
Julian Brown --
== Linaro GCC ==
* Looked at issues #663198 (double-precision register expected) --
which was already fixed on the linaro branch, but the bug was reported
against a version just prior to that, and #667490 -- which involved a
possible problem with the NEON "load 0.0" patch. Experimented for a
while with the latter, but could not find anything wrong.
Followed up upstream, and requested a stand-alone test case.
* Worked on a proper solution to the VMOVN-in-big-endian-mode problem,
discovering that several other quadword-register operations were
similarly broken in the process. WIP patch sent to linaro-toolchain for
discussion, but it needs a little more work before it can be applied.
Peter Maydell --
Progress:
* qemu: more cleanup of signal handler VFP patchset;
I think I just need to add iwmmxt support and it's good
* qemu: VCVT: found yet another bug, did final patchset
cleanup: submitted to upstream list [8 patch series]
* qemu: submitted a trivial patch to fix a problem where
__get_user/__put_user macros had an unnecessary local var
which could clash with a var being used by the macro user
* set up a tree on git.linaro.org which we can use for
a branch to make pull requests for ARM qemu fixes
* did a rough estimate of time to do an Eagle qemu model
(6 months + testing/bug fixing time)
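The macro problem mentioned above is a general C hazard and can be sketched as follows. These are not the qemu macros; the names are invented for illustration, and the safer form shown is just one way of avoiding the clash (dropping the local altogether).

```c
/* Hazardous shape (not used below): the macro's own local is called
 * "size". Expanding PUT_VALUE_CLASHING(size, p) yields
 * "int size = (size);", which reads the new, uninitialised local
 * instead of the caller's variable. */
#define PUT_VALUE_CLASHING(val, ptr)  \
    do {                              \
        int size = (val);             \
        *(ptr) = size;                \
    } while (0)

/* Safer shape: no local at all, so there is nothing to clash with. */
#define PUT_VALUE(val, ptr) ((void)(*(ptr) = (val)))

int store_size(int *dst)
{
    int size = 42;          /* same name the hazardous macro uses internally */
    PUT_VALUE(size, dst);
    return *dst;
}
```

When a temporary is genuinely needed, giving it a name that callers are unlikely to use is the usual alternative.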
Issues:
* lost some time to a problem where Linux VMs stopped being
able to talk to the LDAP server; however I have a workaround
and IT are investigating
Meetings:
* toolchain, toolchain standup, pdsw-tools, PD doughnuts
Plans
- attend Meego conference in Dublin (Nov 15-18 inc travel)
http://conference2010.meego.com/
- start on qemu consolidation by upstreaming various ARMv7
correctness fixes
Andrew Stubbs --
== GCC 4.5 ==
* Continued working on LP:663939.
* I still have not worked out how best to fix the constant
propagation problem that has been thwarting my optimization patch,
however I think I understand it better now.
* I have started on adding replicated pattern support to the
constant splitting. Initial results were good, but I discovered that I
had to rearrange the code somewhat to get the cost estimation and
negative/inverted constant support working correctly. So far, I have
it successfully using 16-bit replication pattern constants for
set/add/subtract operations. Other operations appear broken at the
moment, but it's almost certainly just a few tweaks required.
* TODO: Add support for 32-bit replicated pattern constants. Adjust
some of the other two-instruction constant generation techniques to
let them fall through to this new code, where it would be beneficial.
* Pushed the latest set of GCC patches into Linaro GCC 4.5.
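As background for the replicated-pattern work, the Thumb-2 modified-immediate encoding can represent an 8-bit value XY repeated as 0x00XY00XY, 0xXY00XY00 or 0xXYXYXYXY. The sketch below classifies a 32-bit constant against those three forms. It is an illustration only, not code from the patch; the names are hypothetical, and how these forms map onto the "16-bit" and "32-bit" terms used in the report is not stated there.

```c
#include <stdint.h>

enum replicated_kind {
    REP_NONE,       /* not a replicated byte pattern */
    REP_00XY00XY,
    REP_XY00XY00,
    REP_XYXYXYXY
};

/* Classify v against the three replicated forms. XY must be nonzero,
 * so 0 itself is reported as REP_NONE. */
enum replicated_kind classify_replicated(uint32_t v)
{
    uint32_t lo = v & 0xffu;         /* candidate XY for the low-byte forms */
    uint32_t hi = (v >> 8) & 0xffu;  /* candidate XY for 0xXY00XY00 */

    if (lo != 0 && v == lo * UINT32_C(0x01010101))
        return REP_XYXYXYXY;
    if (lo != 0 && v == lo * UINT32_C(0x00010001))
        return REP_00XY00XY;
    if (hi != 0 && v == hi * UINT32_C(0x01000100))
        return REP_XY00XY00;
    return REP_NONE;
}
```

A constant splitter could use a check like this to decide whether part of a constant is loadable in a single instruction before falling back to other two-instruction sequences.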
Chung-Lin Tang --
== Linaro GCC ==
* Linaro #672833, one batch of my backports of Bernd's postreload
patches exposed some varargs regressions for x86-64, was reverted by
Michael. Tested the compiler and found it was fixed on mainline
rev.162384. Backporting this revision plus the postreload patches
fixed the regressions; x86-64 bootstrap also verified okay. There is,
however, another fix (PR45027) that was needed on trunk; whether it is
also needed on a 4.5 compiler requires a bit more clarification.
* Linaro #641397, CS issue #6753: bitfield optimization. Patch tested
without regressions, posted for CS internal review, should soon push
for Linaro merge.
* Started looking further at some GCC DF, IRA internals.
== This week ==
* Look at more Linaro issues.
* Maybe start looking at some GCC bugzilla PRs.
* There is a local ARM technical event in Hsinchu on Thursday, might
go and look around.
Yao Qi --
== Linaro GCC ==
* Mainline patch backport to Linaro 4.5.
** Patch "Fix an if statement in arm_rtx_costs_1". Verified on Linaro
4.5. 0.1% smaller on size, and 0.2% faster on speed. Merged to Linaro
4.5 by Andrew S.
** Try Nathan F's ifcvt-cond-move patch on cortex-a8 with -O2/O3. No
improvements on speed/size for EEMBC.
** Bernd's ldm/stm patch. Analyzed the cause of the regression on Linaro
4.5. Suspected something wrong in the IRA rtl dump, and spent some time
understanding the IRA rtl dump log. Thanks to Chung-Lin, I realized that
the IRA dump is correct, and looked back at the ARM RTL patterns for
ldm/stm. Compared the RTL patterns in 4.5 and 4.6 and found some
differences. Regenerated ldmstm.md for Linaro 4.5 after updating
arm-ldmstm.ml a little bit. The regressions go away!
No speed improvement, but code is smaller by 0.2% in EEMBC. Still
prefer to merge to Linaro 4.5.
ocaml is an interesting language, but not easy to learn and read in vim.
* Some discussion on Linaro development process.
* My regrename improvement patch (re. LP:633243). Communicated with
Eric Botcazou back and forth, but the current patch is still too
target-dependent for him, as a middle-end maintainer. It still needs some
improvements.
* Build FSF GCC trunk. CLoog requirement in configure is wrong, revert
configure to previous version, and then pass the version checking during
gcc configure.
Hi there. Could everyone in the toolchain working group start sending
their activity reports to this list please? Put [ACTIVITY] at the
start of the subject line so that they can be filtered.
Ta,
-- Michael
Hi there. Attached are the status reports from the Toolchain WG
members for last week.
-- Michael
Ken Werner --
Hi Michael,
* got access to the internal wiki/calendar/email :)
* continued to setup the borrowed vexpress board
* upgraded to the Linaro 10.11 release
* encountered various issues until I found that the /etc/hosts is empty
(#674090)
* learned that the SD card issue is a known problem (#632798)
* the network interface sometimes dies if stressed (Matt was able to
reproduce this)
* the disabled CONFIG_SWAP is being tracked as #672656
* sometimes the entire system hangs (when under heavy load?)
* David noticed that /proc/cpuinfo lacks neon support (but his string
benchmark/testcase ran fine)
* wondering why the kernel reports only about 800 BogoMIPS while it's around
2k on the panda board
* started to work on the atomic memory operations item
* identified the relevant GCC patches
* still looking for a good way to verify the GCC support
* posted a patch on the glibc-ports ml with regard to #643171
David A Gilbert --
I managed to get to try Ken Werner's Versatile Express board with an A9MP
tile; the shape of the graphs matches that from the Panda, but the raw
performance is down by a factor of about 3 - I'm guessing it's clocked
lower for some reason.
It confirms however that the Neon behaviour I was seeing with memset is
not Panda/OMAP4 specific; no one has replied to my post to
linaro-toolchain. It's a difficult situation in that my fastest memset on
Beagle is with Neon, and my fastest on A9 is without Neon - what would you
select on?
I've just finished writing memchr tests and my first crack at a faster
version; I realised I could use the same trick that I had used for strlen
and it works nicely - it seems to be about 50% faster than the libc
version; I've not tested against any other versions yet.
Paul McKenney hasn't replied yet about the OSSC stuff, but apparently
he's out travelling and back next week; so I'll catch him then.
I tried preloading my faster memset into ghostscript, but found it was
blatantly ignoring it - I think the memset is being called from somewhere
inside libc; I managed to get xdeb to cross build me a libc but haven't
yet got my changes into it.
My order for a USB hard drive for my beagle seems to have been delayed by
the supplier; I'm pushing this but it's starting to be a bit of a pain.
Richard Sandiford --
== Last Week ==
* Pinged my GAS fix for Thumb PLT branches to locally-defined symbols.
Committed it to binutils trunk and 2.21 branch after approval. This
fixes the libgcc.so build failure that I was seeing with GOLD.
* Worked on a patch to fix GOLD's handling of non-function references
to weak undefined symbols. This ended up touching every backend
(i386, x86_64, ARM, Power and SPARC) and was quite invasive, so it
took a while in the end. Committed to binutils trunk after approval.
* Ran more tests, both with -marm and -mthumb. I'm getting identical
GCC test results (including gfortran and objc) for GOLD and BFD ld, so
I think we're at the stage where GOLD is a viable replacement for the
BFD linker.
== Next Week ==
* I'll start looking at the IFUNC support.
* I'll take another look at launchpad bug 665598.
Ira Rosen --
Here is this week's report:
1. BeagleBoard installed, now "playing" with it
2. Continued to work on auto-detection of vector size
3. Looked into mixed vector sizes
4. Learning about vld and vst instructions
It looks like I won't be able to participate in Wed calls, since I am alone
with the kids on Wednesday evenings.
Hi all,
I've hit a probable assembler bug trying to build a Thumb-2 kernel:
Trying to assemble the attached file, I get:
arch/arm/kernel/relocate_kernel.S: Assembler messages:
arch/arm/kernel/relocate_kernel.S:10: Error: invalid offset, value too
big (0xFFFFFFFFFFFFFFFC)
arch/arm/kernel/relocate_kernel.S:11: Error: invalid offset, value too
big (0xFFFFFFFFFFFFFFFC)
arch/arm/kernel/relocate_kernel.S:58: Error: invalid offset, value too
big (0xFFFFFFFFFFFFFFFC)
arch/arm/kernel/relocate_kernel.S:59: Error: invalid offset, value too
big (0xFFFFFFFFFFFFFFFC)
The code appears correct and reasonable, except that there should be a
.align directive before the data words at the end of the file (but
adding this doesn't fix the error)
Assembling in ARM (i.e., without -mthumb), or deleting the .globl
lines associated with the affected target symbols, the problem goes
away.
I believe this may already be tracked by CodeSourcery as issue #8775 (?)
Has anyone hit this issue before? Is it fixed upstream?
Any help much appreciated.
Cheers
---Dave
Hi,
I've been looking at some basic libc routine optimisation and have a
curious problem with memset and wondered if
anyone can offer some insights.
Some graphs and links to code are on
https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialMemset
I've written a simple memset in both a with and without Neon variety and
tested them on a Beagle(C4) and a Panda board and I'm finding that the Neon
version is faster than the non-neon version (a bit) on the Beagle but a LOT
slower on the Panda - and I'd like to understand why it's slower than the
non-neon version - I'm guessing it's some form of cache interaction.
The graphs on that page are all generated by timing a loop that repeatedly
memsets the same area of memory; the X axis is the size of the memset.
Prior to the test loop the area is read into cache (I came to the
conclusion the A8 didn't write allocate?). There are two variants of the
graphs - absolute in MB/s on Y, and a relative set (below the absolute)
that are relative to the performance of the libc routines. (The ones below
those pairs are just older versions).
If you look at the top left graph on that page you can see that on the
Beagle (left) my Neon routine beats my Thumb routine a bit (both beating
libc). If you look on the top right you see the Panda performance with my
Thumb code being the fastest and generally following libc, but the Neon
code (red line) topping out at about 2.5GB/s which is substantially below
the peak of the libc and ARM code.
The core loop of the Neon code (see the bzr link for the full thing) is:
4:
subs r4,r4,#32
vst2.8 {d0,d1,d2,d3}, [ r3:256 ]!
bne 4b
while the core of the non-Neon version is:
4:
subs r4,r4,#16
stmia r3!,{r1,r5,r6,r7}
bne 4b
I've also tried vst1 and vstm in the neon loop and it still won't match the
non-Neon version.
All suggestions welcome, plus I'd appreciate if anyone can suggest which
particular limit it's hitting - does
anyone have figures for the theoretical bus and L1 and L2 write bandwidths
for a Panda (and Beagle) ?
Thanks in advance,
Dave
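The measurement method described in this message (read the area into cache, then time a loop that repeatedly memsets the same area and report MB/s) can be sketched roughly as below. This is not the benchmark behind the graphs: the function name is hypothetical, it uses the standard clock() for timing, and it relies on a GCC/Clang inline-asm barrier to stop the compiler from discarding the repeated stores.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Time `iterations` memsets of the same `size`-byte area and return the
 * throughput in MB/s (1 MB = 1e6 bytes), or 0.0 if the run was too short
 * for clock() to resolve. */
double memset_throughput_mbs(unsigned char *buf, size_t size,
                             unsigned iterations)
{
    volatile unsigned char sink = 0;
    clock_t start, end;
    double seconds;
    size_t j;
    unsigned i;

    /* Read the area once so it is in cache before timing starts. */
    for (j = 0; j < size; j++)
        sink ^= buf[j];
    (void)sink;

    start = clock();
    for (i = 0; i < iterations; i++) {
        memset(buf, (int)(i & 0xffu), size);
        /* GCC/Clang extension: compiler barrier so each memset is kept. */
        __asm__ volatile("" : : "r"(buf) : "memory");
    }
    end = clock();

    seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return ((double)size * (double)iterations) / seconds / 1e6;
}
```

Calling this for a range of sizes and plotting the result against size gives a curve of the same general kind as the absolute graphs. Swapping the memset call for the routine under test gives the comparison.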
Hi there. I've uploaded a draft of the slides and notes for next
week's public review at:
http://bazaar.launchpad.net/~linaro-toolchain-wg/+junk/publicreview1105/fil…
'Toolchain Public Review 11.05.odp' is a set of slides I'll talk to.
The first 15-20 minutes will go through these to describe our focus
and goals and how they tie together the blueprints and priorities.
The rest of the session will go through the current blueprints and
priorities. See:
Toolchain Blueprints (short).pdf
for the summary version and:
Toolchain Blueprints (long).pdf
for the long version. The long version is interesting if you can't
find a particular tool or technology. It may be small enough to be
called out as a single work item.
These are only a draft, but I realised I haven't shared the plans with
the rest of the group very well and Monday's meeting won't be the
best.
I'm on holiday tomorrow but feel free to send me any comments,
-- Michael
Hi,
I started to look into mixed vector sizes (in the same loop). My main reason
for this was to allow widening and narrowing instructions, that have
different vector sizes for src and dest, to work properly. My example was
widen_mult (int = short * short), I thought its implementation was not
optimal. But now that I have a working GCC mainline for ARM, I see that it
works just fine.
short ub[], uc[];
int c[];
for (i = 0; i < n; i++)
  c[i] = ub[i] * uc[i];
is compiled as:
.L11:
add r1, r1, #1
vldmia r4!, {d18-d19}
cmp r5, r1
vldmia ip!, {d16-d17}
vmull.s16 q10, d18, d16
vstr d20, [r3, #-32]
vstr d21, [r3, #-24]
vmull.s16 q8, d19, d17
vstr d16, [r3, #-16]
vstr d17, [r3, #-8]
add r3, r3, #32
bhi .L11
which looks good to me at least from the vmull point of view.
Does anyone have an example when mixed vector size instructions are not used
properly?
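For reference, a portable C sketch of the widening multiply loop above. The names widen_mult, vmull_s16_model and widen_mult_by8 are hypothetical. The second variant only mimics the shape of the generated loop (two 4-lane widening multiplies per 8 input elements); it is a scalar model and not what the vectorizer emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: int = short * short, as in the loop above. */
void widen_mult(int32_t *c, const int16_t *ub, const int16_t *uc, size_t n)
{
    assert(c != NULL && ub != NULL && uc != NULL);
    for (size_t i = 0; i < n; i++)
        c[i] = (int32_t)ub[i] * (int32_t)uc[i];
}

/* Model of one vmull.s16: four 16-bit lanes in, four 32-bit lanes out. */
void vmull_s16_model(int32_t out[4], const int16_t a[4], const int16_t b[4])
{
    for (int lane = 0; lane < 4; lane++)
        out[lane] = (int32_t)a[lane] * (int32_t)b[lane];
}

/* Same result, processed 8 elements per iteration like the .L11 loop. */
void widen_mult_by8(int32_t *c, const int16_t *ub, const int16_t *uc, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        vmull_s16_model(c + i, ub + i, uc + i);             /* low halves  */
        vmull_s16_model(c + i + 4, ub + i + 4, uc + i + 4); /* high halves */
    }
    widen_mult(c + i, ub + i, uc + i, n - i);               /* scalar tail */
}
```

Both functions should produce identical output for any n, which makes the pair usable as a quick sanity check when comparing against vectorized code.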
Another reason for mixed sizes could be cases where only part of the loop
can be vectorized with the wider vectors. I don't know how common this is.
Are there any other reasons to implement mixed vector sizes? I understand
that this can be a useful feature, I am just not sure it's the most
important one.
Thanks,
Ira
I've been going through the ChangeLog for the release and am having
trouble justifying some of the changes brought in. In particular:
* -fstrict-volatile-bitfields, which is more appropriate for bare
metal/kernel code
* Cortex-M4 support
* C locale support in libstdc++-v3
The march/mcpu clean up is OK but marginal.
Our focus is time-based performance on the Cortex-A series, with an
implied preference for applications over kernel/bare metal code.  This is a
very narrow view, but every non-performance line of code we bring in can also
bring in a bug.
Any thoughts? For those who are looking at using our toolchain, is
earlier access to other toolchain improvements interesting?
-- Michael
Hi all,
As you may or may not know, upstream GCC has now entered 'stage 3' of
its development cycle. This will last until spring.
This means that they are only accepting bug fixes and documentation
improvements. New features and any performance improvements must wait
until GCC 4.6 branches, prior to release, and GCC 4.7 development opens.
During this process, our usual preferred work flow (upstream first) will
not work, so we'll have to do something else.
Here's my proposal:
* Create a new Launchpad branch for GCC 4.6.
* Synchronize this branch with upstream regularly
* once per week, perhaps.
* Try to get upstream approval for all new patches in the usual way
* on the understanding that they won't be applied until stage 1
* bug fixes are unaffected and may commit as usual.
* Commit all pending patches to our own 4.6 branch
* and backport them to our 4.5 branch, of course.
* Usual "no test regressions" policy applies to our own patches
* but beware regressions from merges from upstream.
* we may want to track the clean 4.6 test results for comparison
This is little different to what we do with the 4.5 release branch now.
Thoughts?
Andrew
The Linaro Toolchain Working Group is pleased to announce the latest
release of Linaro GCC 4.5.
Linaro GCC 4.5 is the fourth release in the 4.5 series. Based off the
latest GCC 4.5.1+svn164911, it includes many ARM-focused performance
improvements and bug fixes.
Interesting changes include:
* Various NEON related fixes
* Performance improvements
* A clean up of some of the testsuite test cases
* An updated version of the __sync multicore primitives
* Improvements in data packing when optimising for size
* C locale support in libstdc++-v3
This release adds the new option -fstrict-volatile-bitfields and
enables it by default on ARM. See doc/invoke.texi for more
information.
The source tarball is available from:
https://launchpad.net/gcc-linaro/+milestone/4.5-2010.11-0
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
Note that there were no changes to the 4.4 series.
-- Michael
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.2.
Linaro GDB 7.2 2010.11-0 is the second release in the 7.2 series.
Based off the latest GDB 7.2, it includes a number of ARM-focused bug
fixes and enhancements.
This release concentrates on the GDB test suite and tidies up a number
of failures.
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.2-2010.11-0
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
-- Michael
Hi,
It looks like it's enough to implement
targetm.vectorize.autovectorize_vector_sizes for NEON in order to enable initial
auto-detection of vector size. With the attached patch and
-mvectorize-with-neon-quad flag, the vectorizer first tries to vectorize
for 128 bit, and if this fails, it tries to vectorize for 64 bit. For
example, in the attached testcase the number of iterations is too small for 128
bit (first 2 iterations have to be peeled in order to align the array
accesses), but is sufficient for 64 bit (the accesses are aligned here).
I'd appreciate your comments on the patch, and I also have a few questions:
1. Why is the default vector size 64?
2. Where should NEON vectorization tests go? I found NEON tests with
intrinsics at gcc.target/arm, is that the right place?
3. According to gcc.dg/vect/vect.exp the only flag that is used for NEON
(in addition to target independent flags) is -ffast-math. Is that enough?
Thanks,
Ira
ChangeLog:
* config/arm/arm.c (arm_autovectorize_vector_sizes): New
function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Define.
Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c (revision 166032)
+++ config/arm/arm.c (working copy)
@@ -246,6 +246,7 @@ static bool arm_builtin_support_vector_misalignmen
const_tree type,
int misalignment,
bool is_packed);
+static unsigned int arm_autovectorize_vector_sizes (void);
/* Table of machine attributes. */
@@ -391,6 +392,9 @@ static const struct default_options arm_option_opt
#define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p
#undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
#define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
+ arm_autovectorize_vector_sizes
#undef TARGET_MACHINE_DEPENDENT_REORG
#define TARGET_MACHINE_DEPENDENT_REORG arm_reorg
@@ -23223,6 +23227,12 @@ arm_expand_sync (enum machine_mode mode,
}
}
+static unsigned int
+arm_autovectorize_vector_sizes (void)
+{
+ return TARGET_NEON_VECTORIZE_QUAD ? 16 | 8 : 0;
+}
+
static bool
arm_vector_alignment_reachable (const_tree type, bool is_packed)
{
test:
#define N 5
unsigned int ub[N+2] = {1,1,6,39,12,18,14};
unsigned int uc[N+2] = {2,3,4,11,6,7,1};
void main1 ()
{
int i;
unsigned int udiff = 2;
unsigned int umax = 10;
for (i = 0; i < N; i++)
{
/* Summation. */
udiff += (ub[i+2] - uc[i]);
/* Maximum. */
umax = umax < uc[i+2] ? uc[i+2] : umax;
}
}
Hi there. Just a reminder that today's call is the first at the new
time of 0900 UTC which is 9am in the UK, 10am in Germany, 11am in
Israel, and 5pm in China.
I've updated the meetings page at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings
with the new details.
-- Michael
Hi,
I am backporting some patches from FSF mainline which may improve the
speed of Linaro GCC 4.5 on Thumb-2.
The first one is by Richard E., "Improve optimization to transform
TST into LSLS":
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02518.html
After applying it to the Linaro 4.5 tree, the EEMBC speed numbers get worse,
while code size is reduced to some extent. The code difference looks like
this:
6801 ldr r1, [r0, #0]
f831 3013 ldrh.w r3, [r1, r3, lsl #1]
-f413 6f00 tst.w r3, #2048 ; 0x800
-f43f af41 beq.w cc <t_run_test+0xcc>
+0518 lsls r0, r3, #20
+f57f af44 bpl.w cc <t_run_test+0xcc>
4610 mov r0, r2
After reading the Cortex-A8 TRM, I can't find the exact cycle timing of lsls.
With Chung-Lin's help, we came to suspect that lsls is slower than tst, but
we don't have any evidence to prove it. If anyone is familiar with the ARM
microarchitecture, help is welcome. If our assumption is correct, we could
change this patch into an optimization applied only when optimizing for size.
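As a sanity check on the transformation itself (not on its timing), here is a small C model of the two sequences in the diff above. The function names are hypothetical. "tst.w r3, #0x800; beq" branches when bit 11 is clear; "lsls r0, r3, #20; bpl" shifts bit 11 into bit 31 and branches when the N flag is clear. Note that lsls also writes its destination register, which tst does not.

```c
#include <assert.h>
#include <stdint.h>

/* tst.w r3, #0x800 ; beq target : taken when bit 11 of r3 is clear. */
int branch_taken_tst(uint32_t r3)
{
    return (r3 & 0x800u) == 0;
}

/* lsls r0, r3, #20 ; bpl target : bit 11 lands in bit 31 (the N flag);
   taken when N is clear. */
int branch_taken_lsls(uint32_t r3)
{
    return ((r3 << 20) & 0x80000000u) == 0;
}

/* r3 comes from ldrh, so checking every 16-bit value is exhaustive here. */
void check_tst_lsls_equivalence(void)
{
    for (uint32_t x = 0; x < (1u << 16); x++)
        assert(branch_taken_tst(x) == branch_taken_lsls(x));
}
```

The model only shows that both sequences take the branch under the same condition; whether lsls costs more cycles than tst on a given core still has to come from the TRM or from measurement.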
The second patch is Bernd's "Fix an if statement in arm_rtx_costs_1"
http://gcc.gnu.org/ml/gcc-patches/2010-07/msg02096.html
With this patch applied, the EEMBC benchmark numbers are unchanged. Shall
we merge this patch into the Linaro 4.5 tree? I am inclined to merge it, but
if you have concerns about this patch, let us discuss them here.
--
Yao Qi
CodeSourcery
yao(a)codesourcery.com
(650) 331-3385 x739
Hi there. I plan to change the Toolchain WG meetings due to daylight
savings and to better cover the US.
The Monday meeting will be at 0900 UTC which is 9 am in the UK, 10 am
in central Europe, and 5 pm in Beijing.
The standup calls will be merged into one at 1800 UTC on Wednesday
which is 6 pm in the UK, 7 pm in central Europe, and 1 pm on the US
East Coast. I don't expect China to call in as it's a quite
unreasonable time.
I'll update the invites and wiki page to reflect this. We'll start
the new times next week, so Monday the 8th will be the first meeting
at the new time.
-- Michael
The goal and plan of the investigation have been described in [1].
In the plan, the task is divided into three parts: 1) patch backport,
2) regression fix, and 3) exploration and study of other ARM compilers.
This report follows the same structure.
1. Patch backport.
8 patches are listed in [1]. Backporting them to the Linaro 4.5 tree should
improve speed.
Action/Recommendation: Backport them if speed improves. These are patches
that I think *should* improve speed, but a "performance surprise" is not
impossible.
2. Regression fix.
So far (up to r99399), Linaro GCC 4.5 is slower than FSF GCC 4.5.0 on
some EEMBC benchmarks. The performance regressions were introduced by four
commits, r99324, r99330, r99369 and r99380; see details in [2].
Action/Recommendation: Figure out why the speed regressions were introduced,
and try to fix them.
My two cents here concern how to avoid speed regressions. I do believe that
regressions are sometimes unavoidable, but it is better if we can track them
and keep them manageable.
3. Exploration and study of other ARM compilers.
In this part, I didn't find any possible Thumb-2 specific improvements.
However, loop optimization and instruction scheduling should be improved
on ARM. (This statement may be true of all ports, or even all compilers.)
Some tickets have been opened for this part:
LP:660644 Missed optimization opportunities
LP:662692 Inner loop in autcor00 can be optimized better
LP:656957 LP:645267 Improve code generation on switch statement
LP:663793 Tune Swing Modulo Scheduling or Selective Scheduling for ARM
LP:656373 Try -fsched-pressure for ARM
I have to admit that instruction scheduling is quite hard, but if we can
do something here, that will be great. I've put it on the agenda of the
"performance-inside-gcc" session at UDS. Let us talk about it a little
there next week.
During this investigation, I also found that LTO or "whole-program
optimization" would be useful for some EEMBC benchmarks. (I didn't run LTO/WPO
at all; I concluded this from reading the benchmark sources.)
[1] Plan of CS304: Thumb2 tuning investigation.
http://lists.linaro.org/pipermail/linaro-toolchain/2010-October/000300.html
[2] https://wiki.linaro.org/YaoQi/Sandbox/Thumb2SpeedOptimization
--
Yao Qi
CodeSourcery
yao(a)codesourcery.com
(650) 331-3385 x739
Hi there.  I've updated the list of potential Summit sessions based on
yesterday's call.  Could people please check the Sessions table on
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-10-18
and flesh out the agenda for sessions that have your name against them.
The agenda should be five to ten discussion points, preferably of
things that are not well understood and could use input from the
group.
There's a good discussion on what to expect at a Summit here:
http://oubiwann.blogspot.com/2010/10/q-the-ubuntu-developer-summits.html
You can check the already-approved sessions here:
http://summit.ubuntu.com/uds-n/
Feel free to join in to any other sessions you might find interesting.
There will be quite a few people with diverse backgrounds there,
including ~80 people from Linaro, ~400 from Ubuntu, ~200 from the
community, and ~200 remote. The overlap between Toolchain and Ubuntu
interests might not be great, so I'll make sure a common work room is
available for idle time.
-- Michael
Some Linaro developers work with ARM devices older than the ARMv7-A
architecture. Other people experiment with the hard-float ABI. Each of them has
to rebuild the toolchain for their own use, and that means playing with
components to get them to build properly.
But no more: I made some patches, and armel-cross-toolchain-base since
version 1.53, plus newer source packages for gcc-4.[45]-armel-cross, now support
a "debian/flavour" file which allows setting some flags related to the toolchain
build.
So far supported things are:
- ARM architecture
- float ABI
- FPU mode
- Thumb mode
This feature is not merged into the regular Ubuntu packages yet, as this is
work in progress which needs to be cleaned up first.
http://people.linaro.org/~hrw/armel-cross-toolchain/ has all source packages
needed.
Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
I meant to send this to the "external" Linaro toolchain mailing list,
not the internal CS one. Apologies to those who receive it twice!
In a follow-up message, Joseph Myers pointed out a post he'd written
previously on the same subject:
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html
In further followups (at the risk of misrepresenting Joseph & Paul
Brook's opinions!), there seemed to be general agreement that a scheme
something like that outlined below, with "permuting" loads/stores and
some way of handling multiple in-register layouts for vectors seems
like it will be a necessary addition to the vectorizer, going forward.
Julian
Begin forwarded message:
Date: Thu, 7 Oct 2010 16:45:17 +0100
From: Julian Brown <julian(a)codesourcery.com>
To: Ira Rosen <IRAR(a)il.ibm.com>
Cc: Tejas Belagod <Tejas.Belagod(a)arm.com>, Linaro List
<gnu-linaro-tools(a)codesourcery.com>
Subject: [gnu-linaro-tools] NEON vectorization: use of specialized
load/store instructions
Hi,
We're having some system issues, so I thought I'd take the chance to
write down some things I've been thinking about re: utilising the NEON
load/store instructions more effectively. I've also attempted to
summarize the problems with big-endian mode. All unverified as of yet,
so please take with a pinch of salt :-). Comments appreciated. It's
been a while since I last thought about some of this stuff...
Cheers,
Julian
Use of specialized load instructions
====================================
To provide good support for NEON's element and structure load/store
instructions, GCC lacks support for a couple of key features:
1. A good way of representing a set of two, three or four vector
registers (either D- or Q-sized), possibly with non-unit stride.
2. A generalised mapping between memory locations and lane numbers.
To start with point 1: currently the element and structure load/store
instructions are only supported via intrinsics. These are specified to
load and store as if going via an array embedded in a union, i.e.:
typedef struct int8x8x2_t
{
int8x8_t val[2];
} int8x8x2_t;
__extension__ static __inline int8x8x2_t __attribute__
((__always_inline__)) vld2_s8 (const int8_t * __a)
{
union { int8x8x2_t __i; __builtin_neon_ti __o; } __rv;
__rv.__o = __builtin_neon_vld2v8qi ((const __builtin_neon_qi *) __a);
return __rv.__i;
}
Even for a trivial test program, e.g.:
#include <arm_neon.h>
int foo (int8_t *x)
{
int8x8x2_t result = vld2_s8 (x);
return vget_lane_s8 (result.val[0], 1);
}
We will generate code like so:
sub sp, sp, #32
vld2.8 {d16-d17}, [r0]
mov r3, sp
vstmia sp, {d16-d17}
add ip, sp, #16
ldmia r3, {r0, r1, r2, r3}
stmia ip, {r0, r1, r2, r3}
fldd d16, [sp, #16]
vmov.s8 r0, d16[1]
add sp, sp, #32
bx lr
I.e., rather than being used directly, the registers loaded by vld2
will always be spilled to the stack then reloaded. This obviously
reduces the usefulness of these intrinsics by a large factor. With some
planning, it'd be good to find a powerful enough solution to this
problem so that the same representation for multiple registers can be
used by the autovectorizer as well as the intrinsic-handling code.
(One difficulty is that the "foo.val[X]" interface should still be
available to user code. There's probably no need for "val" to literally
be an array, though other representations would require front-end
changes).
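For readers without the intrinsics to hand, the following is a portable scalar model of what the vld2_s8 example above computes. It is a sketch only: int8x8x2_model, vld2_s8_model and foo_model are hypothetical stand-ins for int8x8x2_t, vld2_s8 and foo, and the model says nothing about register allocation, which is the actual problem being discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for int8x8x2_t: two vectors of eight 8-bit lanes. */
typedef struct {
    int8_t val[2][8];
} int8x8x2_model;

/* vld2.8 {d16-d17}, [r0]: load 16 bytes and de-interleave them, so that
   even-indexed bytes go to val[0] and odd-indexed bytes to val[1]. */
int8x8x2_model vld2_s8_model(const int8_t *a)
{
    int8x8x2_model r;
    assert(a != NULL);
    for (int lane = 0; lane < 8; lane++) {
        r.val[0][lane] = a[2 * lane];
        r.val[1][lane] = a[2 * lane + 1];
    }
    return r;
}

/* Mirrors foo() above: lane 1 of the first vector, i.e. x[2]. */
int foo_model(const int8_t *x)
{
    int8x8x2_model result = vld2_s8_model(x);
    return result.val[0][1];
}
```

Seen this way, the whole function reduces to a single byte load of x[2], which makes the stack round trip in the generated code above look even more wasteful.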
Assuming it's hard for the register allocator to deal with
highly-constrained situations like requiring four consecutive
registers, one (ugly) possibility might be to run a pass before
register allocation, looking for "big" multi-register vectors and
pre-allocating them to hard registers. Even using a fixed allocation of
a single set of registers (e.g. make it so that all multi-reg
loads/stores larger than a Q register must use d0-d7, or whatever)
would probably give better code than what we produce at present, in
most cases.
Now, point 2. To start with, an aside: AIUI, there is currently an
assumption in the vectoriser code that increasing element numbers in
vector registers correspond to increasing addresses when those
registers are loaded from and stored to memory (as if the vector was a
short array, or alternatively as if a union of the vector register and
an array of element-types had the same numberings for lanes and array
indices corresponding to the same elements). Unfortunately that is only
true for NEON in little-endian mode: in big-endian mode, the story is
more complicated, for reasons I will try to explain.
To remain compliant with the soft-float variant of the ARM EABI, we
must pass vector register arguments in ARM registers (or the stack),
not vector registers. This means that we must be very careful with the
ordering of elements for values passed to functions. Consider the
trivial function:
int __attribute__((noinline)) qux (int16x8_t x)
{
x = vaddq_s16 (x, x);
return vgetq_lane_s16 (x, 1);
}
This is compiled by GCC to the following (slightly unimpressively):
vmov d18, r1, r0 @ v8hi
vmov d19, r3, r2
vmov d20, r1, r0 @ v8hi
vmov d21, r3, r2
vadd.i16 q8, q9, q10
vmov.s16 r0, d16[1]
bx lr
Which may then be called like, e.g.:
ldmia sp, {r0-r3}
blx qux
So: notice that we're careful that when vector values are transferred
from NEON registers to core registers, the same result will be
transferred to/from memory when we use ldm/stm (core registers) or
vldm/vstm (vector registers) -- i.e. we might use "vldm rX, {d18-d19}",
storing d18 and d19 in consecutive increasing addresses, or "ldmia rX,
{r0-r3}", again with consecutive registers in increasing memory
locations, and we get the same outcome. The fact that we can use the
multiple-register loads/stores is also important for spilling/reloading
between vector and core registers, which inevitably happens
occasionally.
Notice also that when we call the above function like so:
typedef union {
int16x8_t quadvec;
int16_t half[8];
} u;
int foo (int8_t *x)
{
u bar;
int i;
for (i = 0; i < 8; i++)
bar.half[i] = i;
return qux (bar.quadvec);
}
The value returned from "qux" is NOT 2 (1+1), as it would be if we were
accessing the value at index 1 in the superimposed array in the union
"u". The vgetq_lane_s16 call still interprets the array as if it had
been loaded in little-endian element order. But we don't get the result
we would have if the vector had been interpreted in purely big-endian
order either (i.e. 12, 6+6)! In fact from the perspective of the
element numbering used by vgetq_lane_s16, the vector elements we see
for each of the (equal) operands of the "vadd" instruction in the qux
function are:
             equiv. core register
lane number  (at function entry)   value
-----------  --------------------  -----
[0]          high part of r1       3
[1]          low part of r1        2
[2]          high part of r0       1
[3]          low part of r0        0
[4]          high part of r3       7
[5]          low part of r3        6
[6]          high part of r2       5
[7]          low part of r2        4
So the value returned will be 2+2, 4.
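The value column of the table can be reproduced with a small host-side model, given here as an unverified sketch in the same spirit as the rest of this note. It assumes a big-endian ldmia of half[0..7] into r0-r3, followed by the "vmov d18, r1, r0" and "vmov d19, r3, r2" transfers from qux, with lane 0 taken as the least significant 16 bits of each D register. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the eight 16-bit lanes NEON sees for half[0..7] under the
   assumptions stated above. */
void be_lanes_after_vmov(const uint16_t half[8], uint16_t lane[8])
{
    uint32_t r[4];
    assert(half != NULL && lane != NULL);

    /* Big-endian word load: the halfword at the lower address ends up in
       the upper 16 bits of the core register. */
    for (int i = 0; i < 4; i++)
        r[i] = ((uint32_t)half[2 * i] << 16) | half[2 * i + 1];

    /* vmov Dd, Ra, Rb: Ra supplies bits 31:0 and Rb bits 63:32. */
    uint64_t d18 = ((uint64_t)r[0] << 32) | r[1];   /* vmov d18, r1, r0 */
    uint64_t d19 = ((uint64_t)r[2] << 32) | r[3];   /* vmov d19, r3, r2 */

    for (int i = 0; i < 4; i++) {
        lane[i]     = (uint16_t)(d18 >> (16 * i));
        lane[4 + i] = (uint16_t)(d19 >> (16 * i));
    }
}

/* Model of qux: x + x element-wise, then return lane 1.  Only meant for
   small values where the 16-bit sum does not overflow. */
int qux_model(const uint16_t half[8])
{
    uint16_t lane[8];
    be_lanes_after_vmov(half, lane);
    return (int16_t)(lane[1] + lane[1]);
}
```

For half[i] = i this yields lanes 3, 2, 1, 0, 7, 6, 5, 4 and a return value of 4, matching the table.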
Now, coming back to the vectorizer. Current practice means that
increasing element numbers should correspond to increasing memory
locations: i.e., that "array ordering" is in effect, just as in the
call to vgetq_lane_s16 in the above example. This leads to an anomaly:
it means that when the vectorizer asks for a particular element, it
will generally get a different one. Most of the time we get away with
this, since the vectorizer mostly deals with "opaque" vectors which are
operated on element-wise: i.e. we only deal with data at the
granularity of whole vectors, so it doesn't matter which order the
elements are in. The ARM implementations of reduction operations
fortuitously calculate the results across all elements simultaneously,
so when one of those elements is extracted, we still get the right
answer.
One notable exception to this though is the movmisalign<mode> patterns:
these are implemented using the vld1 and vst1 instructions, which load
elements in "array" order (increasing elements from increasing memory
locations), even in big-endian mode. Since vectors loaded using those
instructions are "incompatible" with the above scheme, such misaligned
accesses are simply disabled in big-endian mode.
Of course, generally, sticking with the current non-solution in
big-endian mode is not sustainable (and is probably already broken in
various cases). So it might be worth thinking about whether supporting
big-endian mode properly, as well as handling the more complex load and
store element/structure instructions, can be done using some
generalised solution.
I'm thinking (without having much idea about how feasible such an idea
is) of something along the lines of a function (in the mathematical
sense) attached to each vector value manipulated by the vectorizer, to
map that value's element numberings to and from memory offsets. So then
the quad-word vector of 16-bit elements discussed above would look
like, in big-endian mode:
foo, {6, 4, 2, 0, 14, 12, 10, 8}
Whereas in little-endian mode (or in big-endian mode, for vectors
loaded using vld1), it would look like:
foo, {0, 2, 4, 6, 8, 10, 12, 14}
And then, perhaps more interestingly, a vector loaded using e.g. a
"multiple 3-element structures" load,
vld3.16 {d1, d2, d3}, [rN]
Might look like (in either endianness, assuming we can represent a
vector of such size in our hypothetical scheme):
foo, {0, 6, 12, 18, 2, 8, 14, 20, 4, 10, 16, 22}
Though it's not clear that such a scheme would be powerful enough to
represent the whole range of element/structure loads/stores available
(you'd probably need to be able to specify skipped or don't-care
elements to do that, at least).
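To make the three offset lists above concrete, here is a sketch that generates them. The function names and the flat "array of byte offsets indexed by lane" representation are assumptions made for illustration; nothing like this exists in the vectorizer today.

```c
#include <assert.h>
#include <stddef.h>

/* Quad-word vector of 16-bit elements as seen in big-endian mode: lanes are
   reversed within each 64-bit half, giving {6, 4, 2, 0, 14, 12, 10, 8}. */
void be_quad16_offset_map(unsigned out[8])
{
    assert(out != NULL);
    for (unsigned lane = 0; lane < 8; lane++) {
        unsigned group = lane & ~3u;          /* first lane of this D half */
        unsigned within = 3u - (lane & 3u);   /* reversed position in half */
        out[lane] = (group + within) * 2u;
    }
}

/* Structure load in the style of vldN: register r, lane l holds member r of
   structure l.  out has n_regs * lanes entries, register by register. */
void vldn_offset_map(unsigned n_regs, unsigned lanes, unsigned elem_bytes,
                     unsigned *out)
{
    assert(out != NULL && n_regs > 0 && lanes > 0 && elem_bytes > 0);
    for (unsigned r = 0; r < n_regs; r++)
        for (unsigned l = 0; l < lanes; l++)
            out[r * lanes + l] = (l * n_regs + r) * elem_bytes;
}
```

vldn_offset_map(3, 4, 2, out) gives the vld3.16 list, and vldn_offset_map(1, 8, 2, out) gives the plain array-order list used for little-endian mode and vld1.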
First of all, the goal of this work is to investigate speed
improvements on Linaro GCC 4.5. The final output of this work
is a list of all possible recommendations/actions to improve speed on
Linaro 4.5. Comments on this plan are welcome.
So far, we can improve speed in three ways:
1. Backport patches from FSF GCC 4.6. Note that we don't want to
backport the whole of 4.6.
2. Benchmark against FSF GCC 4.5.0. Fix performance regressions if there
are any in Linaro GCC 4.5. The output is the reason for each performance
regression or, going further, recommendations on how to fix it.
3. Study the code generated by other ARM compilers, and give
recommendations on how to improve GCC to do a better job.
I'll describe these three ways in detail in the following sections.
- Backport patches from FSF GCC 4.6
I went through the gcc-patches archive and selected several patches that are
helpful for code improvement.
1. ifcvt optimization. Target independent.
http://gcc.gnu.org/ml/gcc-patches/2010-04/msg00832.html
2. Redundant register move for sign extending. Thumb2.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43137
3. PR 45335 Use ldrd and strd to access two consecutive words.
Not yet approved.
http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00059.html
4. Fix an if statement in arm_rtx_costs_1.
http://gcc.gnu.org/ml/gcc-patches/2010-07/msg02096.html
5. Reduce code duplication for Thumb2 move patterns
http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00624.html
6. ARM ldm/stm peepholes
http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00512.html
7. PR44999 Replace "and r0, r0, #255" with uxtb in thumb2
http://gcc.gnu.org/ml/gcc-patches/2010-07/msg01700.html
8. Improve optimization to transform TST into LSLS
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02518.html
9. Fix bswap patterns for ARM / Thumb and Thumb2.
http://gcc.gnu.org/ml/gcc-patches/2010-01/msg01238.html
- Fix speed regression
I found speed regressions on EEMBC with Linaro 4.5, compared with FSF GCC
4.5.0, and I'll investigate why they happen in these cases.
The table below shows the speed difference between FSF GCC
4.5.0 and Linaro GCC 4.5 (revno:99398):
                  O2     O3
puwmod01          -5.5   -3.5
bitmnp01          -7.9   -0.7
routelookup       -6.4   -8.2
conven00data_1    -7.2   -5.8
conven00data_2    -8.1   -7.3
conven00data_3    -6.6   -5.5
viterb00data_1    -1.7   +5.9
viterb00data_2    -4.3   +2.6
viterb00data_3    -2.3   +1.8
viterb00data_4    -5.3   -0.3
- Study the code generated by other ARM compilers.
In this part, I'll study the binaries generated by other ARM compilers,
and try to teach GCC to do the same things. This piece of
work is quite open-ended, and it is hard to estimate how much we could get
out of it.
--
Yao Qi
CodeSourcery
yao(a)codesourcery.com
(650) 331-3385 x739
People here might want to have a look at this bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45979
Note that the heap randomization feature added to the kernel was part of
a Linaro security blueprint.
Nicolas
The Linaro Toolchain Working Group is pleased to announce the 2010.10
consolidation release including Linaro GCC 4.4, Linaro GCC 4.5, and
the first version of Linaro GDB 7.2.
Linaro GDB 7.2 2010.10-0 is the first release in the 7.2 series. Based
off the latest GDB 7.2, it includes a number of ARM-focused bug fixes
and enhancements.
Interesting changes include:
* Backtraces in Thumb-2 code are significantly improved
* Much better prologue and epilogue parsing
* Improved software watchpoint support
* Many test suite tidy-ups
Linaro GCC 4.5 is the third release in the 4.5 series. Based off the
latest GCC 4.5.1+svn, it includes many ARM-focused performance
improvements and bug fixes.
Linaro GCC 4.4 is the fourth release in the 4.4 series. Based off the
latest GCC 4.4.5, it fixes many of the issues found during building
Ubuntu over the last few months.
Interesting changes include:
* Linaro GCC 4.4 is now based off FSF GCC 4.4.5
* Cortex A8 and Cortex A9 scheduler NEON improvements
* Better code generation for constant addresses with inline assembly
* Better code for copying small constant strings
* Various correctness improvements
Downloads are available from the Linaro GCC and GDB pages on Launchpad:
https://launchpad.net/gcc-linaro
https://launchpad.net/gdb-linaro
-- Michael
Hi all,
I was wondering whether anyone knows of an ARM DCC (debug
communications channel) device driver.
The idea is to run gdbserver on /dev/dcc such that application
debugging does not hog a serial/ethernet port.
I'd modify OpenOCD to forward the DCC onto a TCP/IP port
to connect GDB to the gdbserver.
--
Øyvind Harboe
US toll free 1-866-980-3434 / International +47 51 63 25 00
http://www.zylin.com/zy1000.html
ARM7 ARM9 ARM11 XScale Cortex
JTAG debugger and flash programmer
(cc'ed to linaro-toolchain, bcc'ed to others who may be interested)
I'm considering adding a new Linaro Toolchain meeting to cover people
in the North/South American timezones. We've got quite a few people
in that area who are interested in the toolchain but can't make the
current 0900 UTC calls.
How about a weekly half-hour call on Wednesdays at 1800 UTC? Once
daylight savings drops out on the 7th of November, this would be 1000
Sacramento/PST, 1200 Houston/CST, and a reasonable evening time for
those in Europe who wish to join.
This will be a technical call and can cover topics such as status
updates, release plans, reported problems, and any input from
toolchain users.
Please send me an email if you are interested,
-- Michael
Hi Marcin. Would you consider passing
--enable-poison-system-directories to the cross compiler configure?
This makes the '-Wpoison-system-directories' option available which
warns you if the cross compiler picks up a library or header file from
/usr instead of the cross-build environment.
I'm talking with someone who's looking at using the Linaro compiler
and had a strange error due to picking up the host crtn.o.  Having
this warning would have tracked down the problem faster.
-- Michael
I believe that the libgcc.a in our toolchain contains Thumb-2 code. I
verified this by doing objdump on libgcc.a and I see combinations of
16 and 32 bit instructions. So does that mean that the toolchain is
only usable for ARM versions that support Thumb-2?
Thanks,
John
As discussed in the meeting yesterday, CodeSourcery has a few MinGW
patches that I had not merged into Linaro GCC.
I have now investigated these patches, and I'm fairly happy that most
are not necessary for Linaro. They're mainly about interworking with Cygwin.
The one exception is this one:
http://gcc.gnu.org/ml/gcc-patches/2010-04/msg01214.html
(and even that is primarily a GDB issue).
Andrew
I made a patch for ltrace that adds support for Thumb-2. There's not
much to it, but it allows me to trace applications built for Cortex-A8.
Without it, users will experience this bug:
https://bugs.launchpad.net/ubuntu/+source/ltrace/+bug/639796
Unfortunately, it appears that the upstream tree is not well-maintained.
I posted it to the mailing list for the project, but others' patches
have been ignored for many months. However, my post prompted another
contributor to offer to maintain the package.
I also posted this patch as the proposed solution for the above LP bug,
which should allow Linaro to benefit from the work without worrying
about upstream. In fact, a new version of the package appears to have
been released that includes my patch (0.5.3-2ubuntu6). Please give this
updated package a whirl and let me know if there is more work to be done.
Thoughts? Unless I hear feedback from others, I will assume that this
tool now works for Cortex-A[89] and move on to other tasks.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
(this is for current Toolchain WG members. Sorry if I got anyone
else's hopes up)
We'll soon be coming into some decent dual-core Cortex-A9 boards that
have 1 GB of RAM and a good set of USB ports. I've asked for four of
them with hard drives to go into the data centre for general use.
Would anyone also like one for their desk? Note that you're generally
better off using a data centre board as it's one less thing to
maintain.
-- Michael
Hi
I finally built armel cross compiler packages for Ubuntu 10.04 'Lucid' LTS.
They are available in an unsigned APT repository:
deb http://people.canonical.com/~hrw/ubuntu-lucid-armel-cross-compilers/ ./
They are built from Maverick packages:
- binutils-source
- eglibc-source
- gcc-4.4-source
- gcc-4.5-source
- linux-source-2.6.35
- armel-cross-toolchain-base
- gcc-4.4-armel-cross
- gcc-4.5-armel-cross
So they are not exactly the same versions as the compilers used in 10.04;
please keep this in mind when doing cross builds.
Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
Hi folks
apparently some tool calls "strip" instead of "$triplet-strip" when
cross-building; this is something we shall fix, but it is apparently
corrupting the binaries in some cases:
https://bugs.launchpad.net/ubuntu/+source/binutils/+bug/615765
It seems the ELF architecture isn't set properly, or so I'm told.
Which component is to blame here? Are we looking at a binutils bug or a
gcc bug, in that not enough data is set or read for the architecture
mismatch to be detected? What could we do about it?
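To make the mismatch concrete, here is a minimal sketch of the kind of check a wrapper could run before calling a strip binary: read the e_machine field from the ELF header and compare it with the architecture the tool was built for. The helper names (elf_machine, safe_to_strip) and the sample header are made up for illustration; this is not how binutils itself decides, only the ELF field layout and the EM_* values are taken from the ELF specification.

```python
import struct

# e_machine values from the ELF specification.
EM_386 = 3
EM_ARM = 40
EM_X86_64 = 62


def elf_machine(header: bytes) -> int:
    """Return the e_machine field from the first bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5 of e_ident): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return machine


def safe_to_strip(header: bytes, tool_machine: int) -> bool:
    """Hypothetical guard: True only if the binary matches the tool's target."""
    return elf_machine(header) == tool_machine


# Hypothetical 20-byte header of a little-endian 32-bit ARM executable:
# magic, EI_CLASS=1, EI_DATA=1, EI_VERSION=1, padding, e_type=2, e_machine=ARM.
arm_header = (b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
              + struct.pack("<HH", 2, EM_ARM))

print(safe_to_strip(arm_header, EM_X86_64))  # native x86_64 strip on an ARM binary
print(safe_to_strip(arm_header, EM_ARM))     # cross strip on an ARM binary
```

A check like this only tells us that the information needed to detect the mismatch is present in the file; whether strip should refuse to touch a foreign binary is the open question above.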
Thanks!
--
Loïc Minier
The Linaro Toolchain Working Group is pleased to announce
the availability of a "developer preview" of Valgrind
which includes the support for ARM and Thumb which has
recently been added by the Valgrind developers.
Our aim with this preview release is to advertise
Valgrind's improved ARM support and encourage people
to try it out and find bugs before the official 3.6.0
release. Please report bugs via upstream's BTS:
http://valgrind.org/support/bug_reports.html
or you can ask on linaro-toolchain(a)lists.linaro.org
if you have any problems.
This release is a snapshot of upstream subversion; it
should generally work but you may encounter bugs, especially
if you run it on hand-optimised assembly that uses obscure
instructions.
New (upstream) features in this snapshot include:
* Greatly improved support for ARM
* Support for the Thumb instruction set
* Support for NEON and VFPv3 instructions
Known issues:
* callgrind has difficulty identifying ARM function
call and return so may not produce useful results
Downloads are available from the Linaro Overlay PPA:
https://launchpad.net/~linaro-maintainers/+archive/overlay
...so if you're running Linaro on an ARM system you
should be able to just install it with
'apt-get install valgrind'.
-- Peter Maydell
To All Ye Linaro Toolchain Folk, (and OpenOCD developers too)
After a week of reading specifications and code, I am ready to start
doing some serious hacking on OpenOCD. The following outlines my present
plans and expectations, with the caveat that time can change everything.
Last week, I started testing my BeagleBoard with OpenOCD, so I have
begun trying to validate and improve the Cortex-A8 support. Indeed, I
have already committed a minor patch that fixed a bug in the trunk
caused by new command syntax required to distinguish physical memory
addresses from virtual ones. That bug had been preventing the BeagleBoard
support from working for several months, so this seems to show that
nobody has been using (or even testing) the latest code with that board.
It seems that much of the debug architecture can be shared between these
two cores, so features added and bugs fixed for A8 should help me
implement A9 faster. Indeed, A9 support may be more a matter of
refactoring the existing code than developing new code. In this respect,
the lists of tasks for A8 and A9 may end up proceeding in parallel.
Cortex-A8:
1) Add missing topology detection for determining location of AHB-AP
(for system memory access), APB-AP (for DAP and other CoreSight
components), and register address range for accessing the DAP.
2) Fix Halt After Reset functionality (using vector catch magic).
3) Expose missing VFP3/NEON registers (only when present).
4) Fix various memory and resource leaks.
Cortex-A9:
1) Basic bring-up to successful attachment with debugger.
2) Develop board scripts for common evaluation boards.
3) Work on advanced features:
- download and run algorithms out of memory,
- breakpoints/watchpoints,
- tracing and performance monitoring,
4) Ensure SMP support works out-of-the-box.
Finally, it would be good to produce a new release when all of these
changes have made it into the tree. Due to various factors, the project
has not achieved a regular release schedule, but these features would
help to justify the effort from the community.
P.S. I have cc'd the openocd-development list in the hope of generating
useful feedback, but it requires subscribing to post (last I checked).
Sorry for the bad netiquette.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
Hello,
I've now checked the Linaro branding changes in to the gdb-linaro Bazaar
repository.
I've created a Wiki page describing the Linaro GDB release process based on
that repository:
http://wiki.linaro.org/WorkingGroups/ToolChain/GDBReleaseProcess
(modeled after Andrew's GCCReleaseProcess page)
Review and comments are welcome!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi,
In case this is useful in its current (unfinished!) form: here are some
notes I made whilst looking at a couple of the items listed for CS308
here:
https://wiki.linaro.org/Internal/Contractors/CodeSourcery
Namely:
* automatic vector size selection (it's currently selected by command
line switch)
* also consider ARMv6 SIMD vectors (see CS309)
* mixed size vectors (using the most appropriate size in each case)
* ensure that all gcc vectorizer pattern names are implemented in the
machine description (those that can be).
I've not even started on looking at:
* loops with more than two basic blocks (caused by if statements
(anything else?))
* use of specialized load instructions
* Conversely, perhaps identify NEON capabilities not covered by GCC
patterns, and add them to gcc (e.g. vld2/vld3/vld4 insns)
* any other missed opportunities (identify common idioms and teach the
compiler to deal with them)
I'm not likely to have time to restart work on the vectorization study
for at least a couple of days, because of other CodeSourcery work. But
perhaps the attached will still be useful in the meantime.
Do you (Ira) have access to the ARM ISA docs detailing the NEON
instructions?
Cheers,
Julian
While trying out the u-boot-next branch I found a problem. First some
explanation. On most platforms, U-Boot is linked to the address at which
it will first start running. For example, when using NOR flash, U-Boot
will be linked to an address in flash. Very early in the boot
process, U-Boot copies itself to the top of RAM and jumps there.
This relocation has worked for years on powerpc and other arches. The
-next tree adds this for arm and it almost works.
The part that does not work is that some veneer routines do not get fixed up.
Here is an example. A routine called i2c_init calls __aeabi_idiv.
Here is the disassembly:
...
288: e59f0148 ldr r0, [pc, #328] ; 3d8 <i2c_init+0x1a4>
28c: e1a01083 lsl r1, r3, #1
290: ebfffffe bl 0 <__aeabi_idiv>
294: e2507006 subs r7, r0, #6
298: 4a000001 bmi 2a4 <i2c_init+0x70>
Later after this .o is linked with everything else and libgcc that morphs to:
8000b384: e59f0148 ldr r0, [pc, #328] ; 8000b4d4 <_end+0xfff97c98>
8000b388: e1a01083 lsl r1, r3, #1
8000b38c: eb00aa43 bl 80035ca0 <____aeabi_idiv_veneer>
8000b390: e2507006 subs r7, r0, #6
8000b394: 4a000001 bmi 8000b3a0 <i2c_init+0x70>
and the veneer version is at the end of text with other veneers:
80035ca0 <____aeabi_idiv_veneer>:
80035ca0: e51ff004 ldr pc, [pc, #-4] ; 80035ca4 <_end+0xfffc2468>
80035ca4: 80035999 .word 0x80035999
80035ca8 <____aeabi_llsl_veneer>:
80035ca8: e51ff004 ldr pc, [pc, #-4] ; 80035cac <_end+0xfffc2470>
80035cac: 80035c7d .word 0x80035c7d
80035cb0 <____aeabi_lasr_veneer>:
80035cb0: e51ff004 ldr pc, [pc, #-4] ; 80035cb4 <_end+0xfffc2478>
80035cb4: 80035c61 .word 0x80035c61
80035cb8 <____aeabi_llsr_veneer>:
80035cb8: e51ff004 ldr pc, [pc, #-4] ; 80035cbc <_end+0xfffc2480>
80035cbc: 80035c49 .word 0x80035c49
80035cc0 <____aeabi_uidivmod_veneer>:
80035cc0: e51ff004 ldr pc, [pc, #-4] ; 80035cc4 <_end+0xfffc2488>
80035cc4: 8003597d .word 0x8003597d
80035cc8 <____aeabi_uidiv_veneer>:
80035cc8: e51ff004 ldr pc, [pc, #-4] ; 80035ccc <_end+0xfffc2490>
80035ccc: 80035721 .word 0x80035721
80035cd0 <____aeabi_idivmod_veneer>:
80035cd0: e51ff004 ldr pc, [pc, #-4] ; 80035cd4 <_end+0xfffc2498>
80035cd4: 80035c2d .word 0x80035c2d
Then, if we look at 80035998, we see some Thumb code.
80035998 <__aeabi_idiv>:
80035998: 2900 cmp r1, #0
8003599a: f000 813e beq.w 80035c1a <.divsi3_nodiv0+0x27c>
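The addresses in the listings above can be cross-checked by hand. The following sketch decodes the BL instruction and the veneer literal using the standard ARM-state rules (the PC reads as the instruction address plus 8; bit 0 of an address loaded into the PC selects Thumb state). The function names are mine and exist only for this illustration.

```python
def arm_branch_target(insn_addr: int, insn: int) -> int:
    """Target of an ARM-state B/BL instruction (24-bit signed word offset)."""
    imm24 = insn & 0x00FFFFFF
    if imm24 & 0x00800000:  # sign-extend the 24-bit immediate
        imm24 -= 1 << 24
    # In ARM state the PC reads as the instruction address plus 8.
    return (insn_addr + 8 + (imm24 << 2)) & 0xFFFFFFFF


def veneer_literal_addr(veneer_addr: int) -> int:
    """Address read by 'ldr pc, [pc, #-4]' (0xe51ff004) placed at veneer_addr."""
    return veneer_addr + 8 - 4


# 8000b38c: eb00aa43  bl 80035ca0 <____aeabi_idiv_veneer>
bl_target = arm_branch_target(0x8000B38C, 0xEB00AA43)
print(hex(bl_target))  # 0x80035ca0

# The veneer loads the word that immediately follows it.
print(hex(veneer_literal_addr(bl_target)))  # 0x80035ca4

# 80035ca4: .word 0x80035999 ; bit 0 set means "enter Thumb state".
literal = 0x80035999
print(hex(literal & ~1), "thumb" if literal & 1 else "arm")  # 0x80035998 thumb
```

So the veneer is there purely to switch from ARM to Thumb state, and the absolute address in its literal word is the part that goes stale after relocation.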
When u-boot copies itself to ram it relocates the jump tables it knows
about and could relocate the addresses in the veneer routines if it
knew about them.
There are at least three possible ways to fix these:
1) u-boot has its own private libgcc and if I use it the problem goes away.
2) is there an option for the toolchain to use an arm libgcc instead of thumb?
3) is there a way to find the veneers at runtime and fix them up?
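For what it is worth, option 3 could be prototyped offline along the following lines. This is only a sketch under stated assumptions: it treats every word equal to 0xe51ff004 as a veneer, assumes a little-endian image, and patches the following literal if it points into the link-time image. The function name, parameters, and sample image are hypothetical, and a plain pattern match like this could misfire on data that happens to contain the same word, so it is not a substitute for proper relocation records.

```python
import struct

LDR_PC_PC_MINUS_4 = 0xE51FF004  # 'ldr pc, [pc, #-4]' as seen in the veneers


def fix_veneers(image: bytearray, link_base: int, reloc_offset: int) -> int:
    """Add reloc_offset to each veneer literal that points into the image.

    Returns the number of literals patched. Assumes a little-endian image
    that was linked to run at link_base.
    """
    link_end = link_base + len(image)
    patched = 0
    for off in range(0, len(image) - 7, 4):
        insn, literal = struct.unpack_from("<II", image, off)
        if insn != LDR_PC_PC_MINUS_4:
            continue
        # Ignore the Thumb bit when checking the range; keep it when patching.
        if link_base <= (literal & ~1) < link_end:
            new_literal = (literal + reloc_offset) & 0xFFFFFFFF
            struct.pack_into("<I", image, off + 4, new_literal)
            patched += 1
    return patched


# Hypothetical 16-byte image: two NOPs (mov r0, r0), then one veneer whose
# literal is a Thumb-mode address inside the image.
image = bytearray(struct.pack("<IIII",
                              0xE1A00000, 0xE1A00000,
                              LDR_PC_PC_MINUS_4, 0x80000001))
count = fix_veneers(image, link_base=0x80000000, reloc_offset=0x07F00000)
print(count, hex(struct.unpack_from("<I", image, 12)[0]))  # 1 0x87f00001
```

Because the relocation offset is even, the Thumb bit in the literal survives the fix-up unchanged.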
All input welcome.
Thanks,
John
Hello Michael,
I'm looking into "branding" changes needed for a Linaro GDB release. So
far I've made the following changes:
- Set default PKGVERSION to "Linaro GDB" instead of "GDB"
- Set default BUGURL to "http://bugs.launchpad.net/gdb-linaro/" instead of
"http://www.gnu.org/software/gdb/bugs/"
- Set version number according to Linaro version scheme
- Update release script to generate tarballs/directories named
"gdb-linaro-$VERSION" instead of "gdb-$VERSION".
As a result, the default GDB startup output now reads:
GNU gdb (Linaro GDB) 7.2-2010.10-0
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>.
Do you agree that this is the way we should go? Have I overlooked
anything?
Unless there are objections, I'm planning to check these changes in later
this week.
As a related question, the generated files in a standard GDB 7.2 release
seem to have been built on a relatively old system (RHEL 4 ?), which is
visible through the versions of tools like bison, flex, texinfo, and
gettext used to build those files. When building our Linaro GDB release
tarballs, should we:
- just use the tools as installed on a recent build system (say, Ubuntu
Lucid), or
- attempt to rebuild the release with the exact same set of tools used for
the GDB 7.2 release?
The second option has the advantage of reducing the amount of changes, e.g.
those visible in a full diff of the release tarballs. However, it has the
disadvantage that reconstructing that exact set of tools (including Red
Hat patches, it seems) is somewhat difficult, and can in addition lead to
somewhat outdated results ...
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
Hi all,
I was recently hired by CodeSourcery and have been assigned to Linaro
for the purpose of improving OpenOCD.
Specifically, I will be adding new support for Cortex-A9 SMP, though I
may also make a few improvements to its handling of Cortex-A8 in the
process. If you have experience using OpenOCD in these contexts, let me
know if you have any specific requests for features or fixes, and I will
try to fold them into my plans.
After this cross-posted introduction, I believe that most of my
correspondence will appear on the Toolchain mailing list, but I wanted
to make sure that everyone knows that they can find me there.
Cheers,
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
The Linaro Toolchain Working Group is pleased to announce the release
of both Linaro GCC 4.4 and Linaro GCC 4.5.
Linaro GCC 4.4 is the third release in the 4.4 series. Based off the
latest GCC 4.4.4, it pulls in the pre-4.4.5 changes made by the FSF
over the last six months.
Linaro GCC 4.5 is the second release in the 4.5 series. Based off the
latest GCC 4.5.1, it finishes the merge of many ARM-focused
performance improvements and bug fixes.
Interesting changes include:
* Improved performance on the Cortex-A9
* Backports of a range of performance improvements from mainline
* New inline versions of the GCC builtin sync primitives
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
Also available is an early release of optimised string routines for
the Cortex-A series, including a mix of NEON and Thumb-2 versions of
memcpy(), memset(), strcpy(), strcmp(), and strlen(). For more
information see:
https://launchpad.net/cortex-strings
Pre-built packages are available in the Linaro Toolchain PPA at:
https://launchpad.net/~linaro-toolchain-dev/+archive/ppa
-- Michael
Hi,
We are looking for possible improvements and optimizations to
Thumb-2 code size. Currently, I am running some benchmarks with the
compilation flags "-Os -march=armv7-a -mthumb", and hope to find
something interesting that we can improve. Besides that, do you have any
ideas on this topic? Or have you observed any Thumb-2 code whose size
we could probably improve?
Any thoughts on this are appreciated.
Yao
I think that it is easier to describe the situation in email than on IRC.
Currently there are 4 packages related to cross compilation support:
- armel-cross-toolchain-base (a-c-t-base in short)
- gcc-4.4-armel-cross
- gcc-4.5-armel-cross
- gcc-defaults-armel-cross
Each of them got into the archive, but they need to be updated to produce
installable packages.
Status of each package:
1. a-c-t-base is at 1.47 in archive and was built from gcc-4.5-source
4.5.1-6ubuntu1 version. This package is used to bootstrap armel cross
toolchain and generates:
- binutils-arm-linux-gnueabi (from binutils-source)
- libc6(-dev,-dbg)-armel-cross (from eglibc-source)
- linux-libc-dev-armel-cross (from linux-source-2.6.35)
- gcc-4.5-arm-linux-gnueabi-base, libgcc1(-dbg)-armel-cross (from
gcc-4.5-source)
libgcc1* packages have /usr/share/doc/ directories as symlinks to
/usr/share/doc/gcc-4.5-arm-linux-gnueabi-base/
I have a version which does not provide the gcc-4.5-arm-linux-gnueabi-base
package; libgcc(-dbg)-armel-cross depends on gcc-4.5-base and has its
/usr/share/doc/ directories pointing into the gcc-4.5-base one. I need to fix
this symlink by providing those files in the libgcc1 package instead.
2. gcc-4.4-armel-cross is at 1.36 in archive and was built with gcc-4.4-source
4.4.4-14ubuntu4 version. This package provides compilers,
libstdc++6-4.4-(dev,dbg,pic)-armel-cross, libmudflap0-4.4-dev-armel-cross
and gcc-4.4-arm-linux-gnueabi-base packages.
I have version 1.38 ready to upload, which fixes bugs #637454 and #640298.
3. gcc-4.5-armel-cross is at 1.35 in archive and was built with gcc-4.5-source
4.5.1-7ubuntu1 version. This package provides compilers and runtime
libraries. But it does not provide libgcc1(-dbg)-armel-cross and
gcc-4.5-arm-linux-gnueabi-base because they are in a-c-t-base source
package. All resulting packages have /usr/share/doc/ directories pointing
into the gcc-4.5-arm-linux-gnueabi-base one, which is a policy violation.
I have version 1.37 ready to upload, which fixes bugs #637454 and #640298 and
provides the gcc-4.5-arm-linux-gnueabi-base package, so the policy violation
is removed.
4. gcc-defaults-armel-cross is at 1.3 in archive and does not require any
changes.
The main problem is that the packages generated from gcc-4.5-source are split
across two source packages: armel-cross-toolchain-base
(libgcc1(-dbg)-armel-cross) and gcc-4.5-armel-cross (all the rest). This was
required to allow bootstrapping the cross compiler, but it causes problems when
one is built with a different version of gcc-4.5-source than the other: the
resulting packages are not installable (which is what we have in the archive
now). It is also something which Matthias does not like, and I understand that.
For now my only solution is to build both with one version of gcc-4.5-source.
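The constraint can be stated as a simple check. The sketch below is only an illustration of the rule "both source packages must be built from the same gcc-4.5-source version"; the helper name is hypothetical, and the version strings are the ones quoted in the status list above.

```python
def mismatched_sources(built_from: dict) -> list:
    """Return the distinct gcc-4.5-source versions if more than one is in use.

    built_from maps a cross-toolchain source package to the gcc-4.5-source
    version it was built against. An empty list means the set is consistent.
    """
    versions = sorted(set(built_from.values()))
    return versions if len(versions) > 1 else []


# State of the archive as described in the status list.
archive = {
    "armel-cross-toolchain-base": "4.5.1-6ubuntu1",
    "gcc-4.5-armel-cross": "4.5.1-7ubuntu1",
}
print(mismatched_sources(archive))  # ['4.5.1-6ubuntu1', '4.5.1-7ubuntu1']

# After rebuilding both against one version the list is empty.
rebuilt = {name: "4.5.1-7ubuntu1" for name in archive}
print(mismatched_sources(rebuilt))  # []
```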
What are your opinions?
http://marcin.juszkiewicz.com.pl/download/ubuntu/ is download link for
mentioned versions.
Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
xf. http://lists.linaro.org/pipermail/linaro-toolchain/2010-August/000069.html
> It is not upstreamable due to copyright issues, but we have a policy
> that we can keep such patches, if we wish.
I wrote this patch. If I am the copyright issue, then there is no issue.
I have a copyright assignment for all my GCC work to the FSF. That
assignment also covers the patch in the e-mail stored at
http://gcc.gnu.org/ml/gcc-patches/2008-12/msg00199.html. I consider
copyright to all my patches assigned to the FSF if I have submitted
the patches to gcc-patches(a)gcc.gnu.org, or attached them to a Problem
Report in GCC bugzilla, or both.
The only reason why this patch for GIMPLE PRE is not in the FSF GCC
already, is that I just never cared enough to pursue it. GCC is just a
hobby for me, and experimenting with ideas is fun. Doing all the
required testing for inclusion in the FSF GCC is not fun and it costs
time that I usually can't find. I am just too busy with other things
to clear off this and other pending patches/ideas from my TODO list
:-)
If you wish to submit this patch for the FSF GCC, please feel free to
do so. In fact, I'd encourage you to do so. Likewise for my patch for
e.g. http://gcc.gnu.org/PR20070, and for the GIMPLE hoisting pass.
Ciao!
Steven
Hi,
about the status of binutils testsuite Thumb coverage (CS204 in the
workplan), I have filed two Launchpad bugs:
#640263: Testsuite coverage: Thumb-2 VFP/NEON encodings
https://bugs.launchpad.net/binutils-linaro/+bug/640263
#640272: Testsuite coverage: Thumb relocations
https://bugs.launchpad.net/binutils-linaro/+bug/640272
To summarize: I currently do not see any testing of Thumb-2 VFP/NEON
encodings; Thumb mode relocations are also only barely tested in the ld
testsuite.
Also, please inform if there are any other areas of binutils Thumb
testing that may be of concern to Linaro.
Thanks,
Chung-Lin
* Goal
The goal of this work is to look for Thumb-2 code size improvements on FSF
GCC trunk.
* Methodology
** Build FSF GCC trunk with and without hardfp, run benchmarks including
eembc, spec2000, and dhrystone, and check the asm code to see if there are
any possible improvements in size.
** Get input and suggestion from ARM experts.
** Search open PRs in GCC bugzilla.
* Results
Each item has been tracked on Launchpad, and is listed with the following elements:
** Cause: cause of this problem is known or unknown
** Difficulty: estimation of implementation difficulty
** Recommendation: Yao's recommendation on that bug for next step
1. LP:633233 Push/pop low register rather than high register when
keeping stack alignment
As Richard E. pointed out, it was implemented in gcc-4.5 in 2009, but
Yao can still see r8 being used on FSF GCC trunk.
Cause: Might be a regression, if the problem does not appear on gcc-4.5.
Difficulty: Easy. A regression should not be hard to fix.
Recommendation: Fix it, if it is a regression.
2. LP:633243 Improve regrename to make use of low registers.
Got input from Bernd S. and Julian B. Initial implementation has been
suggested by Bernd S.
Cause: current regrename in gcc treats high and low registers equally.
Difficulty: Medium.
Recommendation: Implement it as Bernd suggested, and do benchmarking
to see how much size is improved.
3. LP:634682 Redundant uxth/sxth insn are generated
Cause: Unknown
Difficulty: Unknown
Recommendation: No recommendation so far.
4. LP:634696 Function is not inlined properly with -Os
In consumer/cjpeg/jmemmgr.c, GCC inlined out_of_memory() with -Os, which
increases code size.
Cause: Unknown.
Difficulty: Unknown
Recommendation: Educate GCC to inline carefully when -Os is turned on.
5. GCC PR40730 LP:634731 Redundant memory load
6. LP:634738 inefficient code to extract least bits from an integer value
GCC PR40697 is for thumb-1. The same problem is in thumb-2.
Cause: Unknown.
Difficulty: Medium.
Recommendation: Fix it in a similar way to GCC PR40697.
7. LP:634891 Replace load/store by memcpy more aggressively
Difficulty: Should be easy.
Recommendation: A fix for this problem might be to reduce the threshold
value when -Os is turned on.
8. LP:637220 allocate local variables with fewer instructions
GCC PR40657 is about this kind of problem, and was fixed. A similar
problem exists in gcc with hardfp.
Cause: Unknown.
Difficulty: Unknown.
Recommendation: No recommendation so far.
9. GCC PR 43721 Failure to optimize (a/b) and (a%b) into single
__aeabi_idivmod call
Difficulty: Medium or easy.
Recommendation: No.
10. LP:637814 Combine add/move to add
LP:637882 Combine ldr/mov to ldr
Possible improvements have been found. No idea how to fix it yet.
Cause: Unknown.
Difficulty: Unknown.
Recommendation: No.
11. LP:638014 Replace memset by memclr when 2nd parameter is zero
Difficulty: Easy.
Recommendation: No recommendation so far.
12. LP:625233 Merge constant pools for small functions
Cause: Unknown.
Difficulty: Medium.
Recommendation: No.
13. LP:638935 Replace multiple vldr by vldm
Some vldr insns accessing consecutive addresses can be replaced by a
single vldm. It is not about thumb2, but related to code size optimization.
Cause: Unknown.
Difficulty: Medium.
Recommendation: No.
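As background for items 1 and 2, the reason low registers matter for size is that the 16-bit Thumb PUSH encoding can only name r0-r7 plus lr, so a single high register in the list forces the 32-bit encoding. The sketch below models just that rule; the function name is made up for illustration, and it says nothing about how GCC actually chooses registers.

```python
LOW_REGS = {"r%d" % i for i in range(8)}  # r0-r7


def push_encoding_bytes(regs) -> int:
    """Size in bytes of a Thumb-2 PUSH of the given register list.

    The 16-bit encoding can name only r0-r7 and lr; any other register
    in the list requires the 32-bit encoding.
    """
    allowed = LOW_REGS | {"lr"}
    return 2 if set(regs) <= allowed else 4


print(push_encoding_bytes(["r4", "r5", "lr"]))  # 2
# Padding the frame for stack alignment with a high register costs 2 bytes...
print(push_encoding_bytes(["r4", "r8", "lr"]))  # 4
# ...while padding with a free low register keeps the short encoding.
print(push_encoding_bytes(["r3", "r4", "lr"]))  # 2
```

The same two bytes are paid again on the matching POP, which is why the choice of the alignment register shows up in size benchmarks.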
--
Yao Qi
CodeSourcery
yao(a)codesourcery.com
(650) 331-3385 x739
Hi there. I've always wanted to mix this:
http://www.futurlec.com/ET-STM32_Stamp.shtml
with some of this:
http://bit.ly/cD0JPS
to control my one of these:
http://www.traxxas.com/products/electric/rustler2006/gallery/3705-3qrtr-Bla…
and it sounds like a good opportunity to dogfood the Linaro toolchain
at the same time. What's the best way to set up a Cortex-M3 toolchain
with an appropriate newlib and libgcc?
A wrapper script works fine but I need a way of recompiling libgcc for
the Cortex-M series. I'd love to get a arm-none-eabi toolchain
package out of this that others could use. Could I re-work the cross
packaging to use newlib and change the configure flags instead? Are
there existing Debianised cross packages that I could reuse?
Ta,
-- Michael
Hi Andrew. Well, the builds are done and they're OK. I've added the
ability to compare against an explicit release to make checking
regressions easier.
4.4 results are here:
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.4-2010.09-1/logs…
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.4-2010.09-1/logs…
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.4-2010.09-1/logs…
i686 and x86_64 have not regressed since 2010.08.
On arm, and ignoring the limits test, 2010.09 adds a failure on
gcc.c-torture/compile/991026-2.c. According to the log the run timed
out but I can't reproduce it.
4.5 results are here:
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.5-2010.09-0/logs…
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.5-2010.09-0/logs…
http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.5-2010.09-0/logs…
i686 has not regressed since 2010.08. x86_64 fails on
gcc.target/i386/wmul-1.c, but this is a new test for a new feature and
is not a regression against 4.5.1.
arm is messier. The following new failures exist:
Vectoriser related:
* g++.dg/vect/pr36648.cc scan-tree-dump-times vect "vectorized 1 loops" 1
* g++.dg/vect/pr36648.cc scan-tree-dump-times vect "vectorizing
stmts using SLP" 1
* gcc.dg/vect/vect-multitypes-11.c scan-tree-dump-times vect
"vectorized 1 loops" 1
* gcc.dg/vect/vect-multitypes-12.c scan-tree-dump-times vect
"vectorized 1 loops" 1
* gcc.dg/vect/vect-reduc-dot-s16b.c scan-tree-dump-times vect
"vectorized 1 loops" 0
* gcc.dg/vect/vect-reduc-pattern-1a.c scan-tree-dump-times vect
"vectorized 1 loops" 0
* gcc.dg/vect/vect-reduc-pattern-1b.c scan-tree-dump-times vect
"vectorized 1 loops" 0
* gcc.dg/vect/vect-reduc-pattern-1c.c scan-tree-dump-times vect
"vectorized 1 loops" 0
* gcc.dg/vect/vect-reduc-pattern-2a.c scan-tree-dump-times vect
"vectorized 1 loops" 0
* gcc.dg/vect/vect-reduc-pattern-2b.c scan-tree-dump-times vect
"vectorized 1 loops" 0
* gcc.dg/vect/wrapv-vect-reduc-pattern-2c.c scan-tree-dump-times
vect "vectorized 1 loops" 0
Others:
* gcc.target/arm/neon-load-df0.c scan-assembler vmov.i32[
\t]+[dD][0-9]+, #0\n
* gcc.target/arm/synchronize.c scan-assembler __sync_synchronize
neon-load-df0 is a new test. synchronize.c is an incorrect test as
the compiler now correctly uses the dmb instruction.
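For anyone wanting to repeat this kind of comparison locally, the idea is roughly the following. This is a minimal sketch, not the code behind the testcompare helper: it assumes DejaGnu-style summary lines of the form "RESULT: test name", and the sample baseline is made up apart from the two test names taken from the list above (one of them shortened).

```python
STATUSES = {"PASS", "FAIL", "XPASS", "XFAIL", "UNSUPPORTED", "UNRESOLVED"}


def parse_sum(text: str) -> dict:
    """Map test name -> result for DejaGnu-style 'RESULT: name' lines."""
    results = {}
    for line in text.splitlines():
        status, sep, name = line.partition(": ")
        if sep and status in STATUSES:
            results[name.strip()] = status
    return results


def new_failures(baseline: dict, current: dict) -> list:
    """Tests that FAIL now but did not FAIL in the baseline.

    Tests missing from the baseline (new tests) are included, so they
    still need to be triaged by hand.
    """
    return sorted(name for name, status in current.items()
                  if status == "FAIL" and baseline.get(name) != "FAIL")


# Hypothetical excerpts from two .sum files.
baseline_sum = """\
PASS: gcc.target/arm/synchronize.c scan-assembler __sync_synchronize
PASS: example/unchanged.c (test for excess errors)
"""
current_sum = """\
FAIL: gcc.target/arm/synchronize.c scan-assembler __sync_synchronize
FAIL: gcc.target/arm/neon-load-df0.c scan-assembler
PASS: example/unchanged.c (test for excess errors)
"""

for name in new_failures(parse_sum(baseline_sum), parse_sum(current_sum)):
    print(name)
```

Note that a diff like this cannot tell a genuine regression from a new test or an outdated test expectation, which is exactly the triage done in the notes above.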
Your thoughts?
-- Michael
I would like to announce that my work on the armel cross toolchain has reached
a very nice point: all packages are available from a PPA.
What does it mean to you?
1. no "are you sure to install those unverified packages" messages from APT
2. ability to easily rebuild the toolchain on your own machines
So if you used my repository from people.canonical.com then please switch to
the PPA one:
add-apt-repository ppa:hrw/armel-cross-compilers
The old repository will be available for some time but will not get any updates.
Next step: merging those packages into the Maverick release.
Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
I've been checking over the benchmarks as a lead up to the 2010.09
release. We're in a good way compared to both 4.4.4 and 4.5.1 for
most non-trivial tests.
* pybench is 10.9 % faster than 4.4.4 and 7.7 % faster than 4.5.1.
* linpack is 46.4 % faster than 4.4.4 and the same as 4.5.1.
* ffmpeg h.264 video decode (with hand written assembler versions
turned off) is 15.4 % faster than 4.4.4 and 1.2 % faster than 4.5.1.
All results are statistically invalid and against poor workloads, but
I'll work on that.
See http://ex.seabright.co.nz/helpers/benchcompare for more.
-- Michael
Loïc Minier wrote:
> I see you moved the wiki page to the public space, thanks
>
> Couple of notes:
> * make sure you use the rename action on the page, I think this will
> preserve history (I didn't check whether you did or not, but I think
> not)
No, I didn't. I used copy and paste. I'll use the rename action.
> * add a page at the old location with "#redirect NewPage" or
> "#refresh 0 http://newurl/"
OK, got it. Thanks for your help on wiki.
...are available here:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-09-06
A copy and activity reports are included below.
-- Michael
Attendees
Name                Email                            IRC Nick
Andrew Stubbs       ams(a)codesourcery.com           ams_cs
Chung-Lin Tang      cltang(a)codesourcery.com        cltang
Julian Brown        julian(a)codesourcery.com        jbrown
Loïc Minier         lool(a)linaro.org                lool
Marcin Juszkiewicz  marcin.juszkiewicz(a)linaro.org  hrw
Matthias Klose      doko(a)canonical.com             doko
Michael Hope        michael.hope(a)linaro.org        michaelh
Peter Maydell       peter.maydell(a)linaro.org       pm215
Richard Earnshaw    richard.earnshaw(a)arm.com       rearnshaw
Ulrich Weigand      ulrich.weigand(a)linaro.org      uweigand
Yao Qi              yao(a)codesourcery.com           yao
Agenda
• Licensing of string routines
• State of valgrind
• State of GDB
• Open tickets
□ 600298, 616141, 604753: SMP/sync related
□ 605059 4.4.5
□ 629671 ICE in reload_cse_simplify_operands in thumb-1 mode
□ 590696 Wrong use of objdump during cross build
• Upcoming release
• Creating blueprints
Blueprint Assignee
Initial delivery of Linaro GCC 4.4 ams
Cross Compiler Packages hrw
Action Items from this Meeting
• ACTION: Richard to check with the legal department on string licensing
issues
• ACTION: Peter to talk with valgrind upstream re: Linaro releasing an
ARM-focused version
• ACTION: Michael to organise an 'experimental' PPA that toolchain output can
go into
• ACTION: Michael to talk with Cody Somerville re: building on ARM
• ACTION: Michael to set up a GDB 7.2 based off the release tarball
• ACTION: Andrew to pull sync changes back into 4.4 for this release
• ACTION: Michael to assign appropriate sync ticket to Andrew to track the
backport
• ACTION: Andrew to merge the current post 4.4.4 release branch into our 4.4
for this release
• ACTION: Julian to do a basic investigation into 629671
• ACTION: Andrew to merge the cross-compile objdump ticket into this release
and re-kick upstream process
Action Items from Previous Meeting
• ACTION: Michael to re-check with TSC that we can assign copyright but keep
ability to relicense
• DONE: Yao to continue on GDB for a week then switch to investigation
• ACTION: Peter to check into the state and progress of valgrind for the
meeting on the 30th.
• ACTION: Chung-Lin to shift the CSL backport list out onto the Linaro wiki
• ACTION: Michael to see about doing an archive rebuild with 4.5
• DONE: Michael to send IBM's list to Yao
Minutes
String routines:
• Michael asked Richard about getting the current str* routines by ARM
transferred to Linaro
• Linaro will then get these into other C libraries
• FSF prefers LGPL and copyright for glibc
• Linaro prefers MIT/X11 everywhere so that fixes and improvements can be
shared
• Richard is concerned about the copyright assignment and any patent grant
• ACTION: Richard to check with the legal department on string licensing
issues
• Extreme fallback is to re-write the routines to all be under Linaro
copyright. memcpy() and similar may need this
Valgrind:
• Peter has been looking at how it works on the ARM platform
• Upstream is very responsive to issues
• Now works on Firefox and OO.org
• Upstream doesn't have any particular release cycle
• ARM changes are pretty extensive and can't be extracted
• Peter suggested making valgrind available in a PPA to start with
• NEON detection at startup is remaining issue
• What next?
□ Packaging is straightforward
□ Don't want to steal upstream's thunder or release something
inappropriate
□ ACTION: Peter to talk with valgrind upstream re: Linaro releasing an
ARM-focused version
• Could bring into the Linaro overlay PPA
• ACTION: Michael to organise an 'experimental' PPA that toolchain output can
go into
• ACTION: Michael to talk with Cody Somerville re: building on ARM
GDB:
• 7.2 is now available
• Time to start up a gdb-linaro based on that
• Matthias mentioned that we will have GDB 7.2 on Maverick
• How should we manage the source
□ QEMU is over git
□ Could use bzr or git
□ bzr with Launchpad can't handle multiple branches when pulling from git
□ GDB is unique in how it's mixed in with the rest of the projects hosted
on sourceware
□ Branches as such are tricky
□ Could just base off tarballs
□ ACTION: Michael to set up a GDB 7.2 based off the release tarball
Tickets:
• ACTION: Andrew to pull sync changes back into 4.4 for this release
• ACTION: Michael to assign appropriate sync ticket to Andrew to track the
backport
• ACTION: Andrew to merge the current post 4.4.4 release branch into our 4.4
for this release
• ACTION: Julian to do a basic investigation into 629671
• ACTION: Andrew to merge the cross-compile objdump ticket into this release
and re-kick upstream process
Patch tracker:
• Andrew noted that it is now fully populated with the GCC data
• Has assigned various patches that still need to go upstream to Yao and
Julian
Next meeting is on 2010-09-08 on the public code.
--- Chung-Lin Tang
== Linaro Toolchain ==
* Google ARM patch sets: committed a second set to SG++ 4.5 trunk on
Tues. AndrewS pushed both sets to Linaro. Worked on a third set, those
related to PR42235, but this time regression test results were not so
clean. Will look into, but considering whether to stop the backports
here.
* LP:628526, submitted a patch to gcc-patches for explicitly turning
off stack protection in libgcc build flags, awaiting response.
* LP:601030, eglibc 2.11/12 problem with ___longjmp_chk on x86-64.
Problem seems to be clear, fix quite simple, but so far cannot seem to
reproduce and verify. Also unclear if I should send the fix to eglibc
or glibc, the idea of the latter making me a bit nervous... :P
== libffi ==
* Got an acknowledgement from the libffi maintainer that he'll review
the VFP hard-float support patch soon.
== This week ==
* Look into remaining Google approved patches, mainly those related to
PR42235 and PR42575.
* Try to reproduce LP:601030 and send patch soon.
* Linaro GCC investigations.
--- Andrew Stubbs
== Linaro GCC ==
* Michael has got the new patch tracker into a usable state. I've
transferred all the data from the old wiki tracker, and looked up the
remaining data as far as I can. The new tracker should now be fully
populated with data. It's here, for the moment:
http://ex.seabright.co.nz/helpers/patchtrack
* Start Yao and Julian on the optimization investigation tasks.
* Continue trawl through the CS bugs looking for candidates to push
to the Linaro tracker.
== Other ==
* Public holiday on Monday.
* Attended the monthly CS/Linaro sync meeting.
--- Yao Qi
== Linaro GDB wrap up ==
* LP:615993 gdb.base/sigstep.exp failures
Patch was committed to gdb mainline and 7.2 branch.
* LP:615995 gdb.base/watch-vfork.exp failures
Discussed with Pedro and created a patch, which fixed the failures on ARM
but can't fix the failures on x86 (they are caused by different problems).
Left the x86 failures as they are; the patch is being reviewed in
gdb-patches.
== Linaro GCC ==
* CS306: Investigate Thumb-2 improvement
Read/understand previous effort related to code size
improvement from CSL wiki pages.
Experiment with CSL scripts for size benchmarking. With Dan's
help, run benchmarking in a correct/reasonable way.
'Reproduce' some inefficient code mentioned by Julian. Some of them
are still there.
== Misc ==
* LP:605042
Revert one patch, and rebuild it. No seg fault is found.
== This Week ==
* Continue my work on CS306.
--- Peter Maydell
RAG:
Red:
Amber: virtio-system writeup not going as fast as expected
Green: ARM legal OK now received
Milestones:
| Planned | Estimate | Actual |
finish virtio-system | 2010-08-27 | ? | |
I need to replan this (no forward progress this week
because more important stuff intervened)
Progress:
virtio-system:
- actually trying a SATA disk revealed that the PB926 PCI
interrupt mapping was wrong; now fixed after consulting
the schematics and a round or two of patch testing with Arnd
- I have a PB1176 board but it doesn't seem to talk to
the serial port on poweron. Will try a firmware reflash
but it might just be broken...
- no progress on writeup because other things intervened.
valgrind:
- went through the motions of getting a valgrind svn snapshot
into the ubuntu packaging
- tested on pegatron (A8, maverick, thumb2), found four bugs:
+ BX PC not implemented (fixed upstream)
https://bugs.kde.org/show_bug.cgi?id=249775
+ RBIT not implemented (fixed upstream)
https://bugs.kde.org/show_bug.cgi?id=249924
+ pwrite64 syscall not implemented (fixed upstream)
https://bugs.kde.org/show_bug.cgi?id=249996
+ test for presence of neon wrong
https://bugs.kde.org/show_bug.cgi?id=249775
With a bodge for the last and the fixes for the first 3,
valgrind now successfully runs openoffice and firefox.
other:
- Investigated https://bugs.launchpad.net/bugs/628471 : qemu-maemo
doesn't work with new linaro beagleboard kernels. It looks
like we now try to probe for NAND (which failed earlier
for other reasons which I suspect are a now-fixed bug),
and qemu-maemo's NAND implementation doesn't map anything at
the address the nand code is trying to poll for a status bit.
- first post to qemu-devel :-) (review of somebody's
patch to not confuse SMC with BKPT in the arm decoder)
Plans:
virtio-system:
- hoping to get the qemu patches into the ubuntu qemu-maemo
package, which will avoid the need to talk about patching qemu
- finish the writeup and put it on the wiki
- test PCI patches on PB1176
valgrind:
- respin a valgrind with proper fixes for everything and
put it in a PPA somewhere
other:
- come up with some fix or workaround for #628471
- put the rebased ubuntu qemu-maemo work up onto gitorious
so other people can see it
Absences:
Friday 5 November and 20 other days in this calendar year
Looks good. I've created a real project, added a README/LICENSE, and
merged your changes. See:
https://launchpad.net/tcwg-web
There was a funny render difference between Firefox and Chromium -
revisions with no bugs lead to a rowspan of zero which Firefox doesn't
like. I also pulled some common code out into a function and used the
built-in variable 'loop'. 'loop' is quite nice as it provides values
like .index, .first, .odd, and so on based on your position in the
loop.
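The rowspan problem is easy to reproduce outside the template. The sketch
below is plain Python rather than the tracker's actual template code, and
`render_revision` and its arguments are hypothetical names. It clamps the span
to at least one so that a revision with no bugs still gets a valid cell.

```python
from html import escape


def render_revision(revno, bugs):
    """Render one revision as table rows, one row per linked bug."""
    # A revision with no bugs still occupies one row, so never emit a
    # rowspan of zero.
    span = max(1, len(bugs))
    cells = bugs or [""]
    rows = []
    for index, bug in enumerate(cells):
        first = index == 0  # plays the role of loop.first in a template
        lead = f'<td rowspan="{span}">{escape(revno)}</td>' if first else ""
        rows.append(f"<tr>{lead}<td>{escape(bug)}</td></tr>")
    return "\n".join(rows)
```

Calling `render_revision("4.4:93544", [])` yields a single row whose first
cell has a rowspan of one.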
-- Michael
On Fri, Sep 3, 2010 at 11:02 PM, Andrew Stubbs <ams(a)codesourcery.com> wrote:
> Hi Michael,
>
> I've been playing with your patch tracker, and come up with this:
>
> https://code.launchpad.net/~ams-codesourcery/+junk/tcwg-web
>
> I don't seem to be able to propose an official merge request to your branch,
> but it's just a quick implementation anyway, and could probably be cleaned
> up.
>
> The patch renders each ticket in its own row (without changing the way the
> first two columns are rendered). This means they can have their own colour
> and we can maybe see better what status goes with what bug.
>
> To see an example of what it does, see revision 4.4:93544
>
> Andrew
>
I want to share status of my cross compiler packages work with all of you.
Some time ago I did a split of them into two:
- armel-cross-toolchain-base (1.36 now)
- armel-cross-toolchain (1.29 now)
Where first one provides binutils, linux headers, libc6 and libgcc
packages. Second provides final gcc.
Today I got a-c-t-base to the point where it builds fine on PPA [1]. 1.36 was
sent for a rebuild to fix the missing gcc-4.5-arm-linux-gnueabi-base package.
Once it builds, the a-c-toolchain package will be uploaded for build.
The result will deprecate my current repository at people.canonical.com [2]
because the PPA provides a signed repository.
On Monday I will probably have to update both components because there was
gcc-4.4 upload so probably gcc-4.5 will follow (so I will be able to drop one
patch).
Additionally I made a 'gcc-defaults-armel-cross' package (available in [1])
which makes installing cross compilers a bit easier (no need to worry which
version to install - just "apt-get install gcc-arm-linux-gnueabi" is enough).
The cross gcc version is selected in a different way than the native one. The
native compiler uses the "gcc" package, which contains /usr/bin/gcc as a
symlink to /usr/bin/gcc-4.4. The cross gcc uses "update-alternatives" to set
up the /usr/bin/arm-linux-gnueabi-gcc file. I want to fix this in 11.04 so
that the cross gcc uses the same method as the native one.
1. https://edge.launchpad.net/~hrw/+archive/arm-cross-compiler
2. http://people.canonical.com/~hrw/ubuntu-maverick-armel-cross-compilers/
Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
Michael,
a quick update to our discussion today: actually, GDB 7.2 has already been
released earlier today:
http://sourceware.org/ml/gdb-announce/2010/msg00003.html
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi Alexander. I've looked into the problem and the linker error is
caused by a mix of stack protector options between libgcc and the C
library.
GCC includes a feature called the stack smashing protector which
detects writing past the end of a stack based object. It's quite nice
as it gives decent protection against buffer overruns which are the
most common type of security vulnerability.
The implementation is straightforward: when compiled with
-fstack-protector, any function with a stack-based character array
will have extra stack checking code inserted into the prologue and
epilogue. The prologue allocates a canary value at the top of stack
and fills it in with the value of '__stack_chk_guard' provided by
libssp. The epilogue checks this value and calls `__stack_chk_fail`
if it has been changed. The stack protector can interfere with some
code and isn't applicable in others.
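As a rough illustration of the canary check, here is a toy Python model. It is
not how GCC implements the stack protector (the real check is emitted by the
compiler in the prologue and epilogue); the frame layout, the `GUARD` value,
and the function names are all assumptions made for the sketch.

```python
import struct

# Stand-in for __stack_chk_guard; an arbitrary value chosen for this model.
GUARD = 0x5AFEC0DE
BUF_SIZE = 8  # size of the character array in the modelled frame


class StackSmashed(Exception):
    """Raised where real code would call __stack_chk_fail."""


def call_with_canary(data: bytes) -> bytes:
    # "Prologue": lay out the buffer followed by the canary word.
    frame = bytearray(BUF_SIZE) + bytearray(struct.pack("<I", GUARD))

    # Function body: an unchecked copy, like strcpy() into a char array.
    # Writing more than BUF_SIZE bytes spills into the canary.
    n = min(len(data), len(frame))
    frame[:n] = data[:n]

    # "Epilogue": compare the canary against the guard before returning.
    (canary,) = struct.unpack_from("<I", frame, BUF_SIZE)
    if canary != GUARD:
        raise StackSmashed("canary changed: buffer overrun detected")
    return bytes(frame[:BUF_SIZE])
```

A short input returns normally, while an input longer than `BUF_SIZE`
overwrites the canary and raises `StackSmashed`.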
The problem here is caused by a stack up of things:
* glibc knows about -fstack-protector and turns it on and off for
different functions and libraries
* gcc knows about -fstack-protector and includes libssp if required
* glibc knows about libgcc and statically links against it to ensure
availability
* Meego seems to turn on -fstack-protector by default (as does Ubuntu)
This results in the libgcc function '_gcc_Unwind_Backtrace' being
built with the stack protector and the glibc library 'libanl' without.
At static link time GCC sees that the stack protector is off and
skips linking against libssp, causing the missing symbol error.
The solution is to add -fno-stack-protector to the libgcc build
options and rebuild the compiler. I've heard (but can't track down
the link) that the ARM libgcc unwind functions must be built this way
in any case.
See
http://svn.debian.org/wsvn/gcccvs/branches/sid/gcc-4.5/debian/patches/gcc-d…
for how Debian does this.
Hope that helps,
-- Michael
On Fri, Aug 27, 2010 at 9:06 PM, <Alexander.Kanevskiy(a)nokia.com> wrote:
> Hi Michael.
>
> I've created for you account in MeeGo OBS (build system that we use in MeeGo
> is OpenSuSE build system)
>
> login: michaelh
> password: wog-feg-da
> Web client url: https://build.meego.com
> API url: https://api.meego.com
>
> The build log that had problem with glibc 2.12 + gcc 4.5 you can find here:
>
> https://build.meego.com/package/live_build_log?arch=armv7el&package=glibc&project=home%3Akad%3Abranches%3ATrunk%3ATesting&repository=standard
>
> Might be you have some idea what went wrong, as our toolchain people were
> not able to find why combination of latest gcc plus glibc 2.11.x works, but
> not gcc 4.5 + glibc 2.12.0 :(
>
> This log is from my home project inside OBS, where stuff is already a bit
> outdated. I'll ask Jan-Simon from Linux Foundation to point to right place
> where latest builds are present, so you can experiment with them.
>
> --
> Best regards, Alexander Kanevskiy.
>
>
>
Hi all,
I've just discovered that Ubuntu is not using the Linaro release
information in the --version string. This is not ideal when we get bug
reports as it makes it hard to understand what Linaro release to use to
reproduce the issue.
Therefore, I've created a new wiki page to track the mappings:
https://wiki.linaro.org/WorkingGroups/ToolChain/VersionMappings
For now this only applies to GCC, but no doubt other tools will follow.
Please help keep it up to date if you find a version is missing. I've
added it to the GCC release process wiki page, so hopefully it should
get looked at at least once a month.
Andrew
Hi there. The Toolchain WG has a Versatile Express board coming
our way. It's a quad-core Cortex-A9 with 1 GB of RAM, so quite decent
really.
Does anyone have a pressing need for it? If not then I'll take it and
make it available over SSH.
-- Michael
Please find the activity reports and minutes for Monday's meeting
below. The minutes are also available at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-08-23
Minutes from the Wednesday and Friday standup calls are at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-08-18
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-08-20
-- Michael
Attendees
• Name Email IRC Nick
Andrew Stubbs andrew.stubbs(a)linaro.org ams
Chung-Lin Tang cltang(a)codesourcery.com cltang
Matthias Klose doko(a)canonical.com doko
Michael Hope michael.hope(a)linaro.org michaelh
Peter Maydell peter.maydell(a)linaro.org pm215
Richard Earnshaw richard.earnshaw(a)arm.com rearnshaw
Yao Qi yao.qi(a)linaro.org yao
Agenda
• Open tickets
□ 616141 Backport the sync_* primitive fixes
□ 590696 fix wrong use of objdump during cross build
□ 600277 Backport ARM Cortex A9 scheduling changes
□ 605059 Merge 4.4.5
• Upcoming release
□ GCC 4.4
□ GCC 4.5
□ GDB
□ Strings
• 4.6 backport approach
• Creating blueprints
• Connecting with other groups
Blueprint Assignee
Initial delivery of Linaro GCC 4.4 ams
Cross Compiler Packages hrw
Action Items from this Meeting
• ACTION: Chung-Lin to move the list of other backports out of the CSL wiki
and into Linaro
• ACTION: Michael to re-check with TSC that we can assign copyright but keep
ability to relicense
• ACTION: Yao to continue on GDB for a week then switch to investigation
• ACTION: Peter to check into the state and progress of valgrind for the
meeting on the 30th.
Action Items from Previous Meeting
Minutes
Tickets:
• Went through the open tickets in the agenda
• Andrew will backport the SMP changes, including the sync primitives
• Andrew will backport the A9 changes
□ Most of the changes should come through easily
□ There is a write after write hazard
□ Currently uses the new cost infrastructure
□ Backport the cost infrastructure if it will be used further in the
future
4.6 branch:
• Andrew suggested starting a 4.6 branch after the start of stage 3
□ Start landing patches early
□ When FSF 4.6.0 comes out, we will have a corresponding Linaro 4.6.0
• ACTION: Chung-Lin to move the list of other backports out of the CSL wiki
and into Linaro
String routines:
• Richard asked about the response
• Michael had replies from Roland McGrath (http://sourceware.org/ml/
libc-alpha/2010-08/msg00029.html) but not the wider gcc-sc
• All other architectures are LGPL and FSF assigned
• The current approach is to assign a particular version to glibc
• Could cause a small maintenance problem in the future
• Richard isn't sure that we can assign copyright of a particular version
• ACTION: Michael to re-check with TSC that we can assign copyright but keep
ability to relicense
4.6 backports:
• Talked about the approach for backporting 4.6 features
• Won't backport every single change as then Linaro 4.5 becomes FSF 4.6
• Backport correctness fixes as the problem is found
• Backport performance changes as they occur
• Discussed how upstream could be tracked
□ Notification of any CSL or ARM authored changes will come from them
□ All changes are supposed to go through gcc-patches
□ Andrew notes that gcc-cvs provides a filtered view of what actually
landed
□ At least monitor these lists and search for ARM|Thumb|NEON|XSCALE|
Cortex|Coretx|VFP|Snapdragon|OMAP
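A minimal sketch of that keyword filter, assuming the subject lines have
already been fetched from the list archive. The function name is hypothetical,
case-insensitive matching is a choice made here rather than something the
minutes specify, and "Coretx" is presumably in the list to catch the typo.

```python
import re

# Keyword list as given in the minutes.
ARM_KEYWORDS = re.compile(
    r"ARM|Thumb|NEON|XSCALE|Cortex|Coretx|VFP|Snapdragon|OMAP",
    re.IGNORECASE,
)


def interesting_subjects(subjects):
    """Return the subject lines that mention any ARM-related keyword."""
    return [s for s in subjects if ARM_KEYWORDS.search(s)]
```

This is plain substring matching, so a word such as "alarm" would also match;
adding word boundaries would cut false positives at the cost of missing
compounds.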
Michael noted that IBM are interested in the ARM compiler and plan to get
involved soon.
Michael has asked again for A9 hardware. No news yet.
Future:
• Would like to spend some time soon running investigations to spit out some
blueprints
• ACTION: Yao to continue on GDB for a week then switch to investigation
• Andrew noted that there is one more person to come from CSL
• Will ask that person to do investigation
• Richard is keen to see the blueprints to check against what ARM is doing
□ Michael asked for information about their planning process so that we
can line things up
Valgrind:
• Peter noted that the valgrind changes have been committed upstream
• ACTION: Peter to check into the state and progress of valgrind for the
meeting on the 30th.
Next meeting is a stand-up meeting on 2010-08-25 on the public code.
--- Andrew Stubbs
== GCC 4.5 ==
* Continued pushing 4.5 patches to Linaro. I have now caught up with
current development I think.
* Lots of discussion on the patch tracker. You'd think it was more
important than the compiler .... :(
== Upstream ==
* Did before and after tests of the Cortex-A5 scheduler against
upstream HEAD. All seemed well (or at least, no worse) so I've posted
the patch upstream. No word back yet ....
--- Chung-Lin Tang
== Hard-float ==
* Testing EEMBC softfp vs. hard-float calling convention performance numbers.
* The only conclusive result was that OAmark is 2%-3% faster,
presumably due to vector graphics-like code in that suite. May look
into other code (Cairo was suggested) to see whether there is any gain
from changing to hard-float.
* Withdraw earlier comment on small improvements on Automark (was not
apparent after more experiment runs).
* Currently working to produce report files.
== Linaro GCC ==
* Looking at getting into GCC backport work this week.
--- Yao Qi
== Linaro GDB ==
* LP:615997 gdb.dwarf2/dw2-ref-missing-frame.exp failure
Patch is committed to gdb mainline.
* LP:615999 gdb.gdb/selftest.exp failure
Patch is committed to gdb mainline.
* LP:615995 gdb.base/watch-vfork.exp : Watchpoint triggers after
vfork (sw) (timeout)
With Pedro's help, learned that the failures of this case on ARM and
x86 are different. Created a patch as Ulrich suggested; it works
on 2.6.32 but fails in a different way on 2.6.35. The failure is
caused by the debuggee process being killed by a SIGTRAP. Still no
clue why that can happen.
== Linaro GCC ==
* My patch to PR45094 is approved, and checked in to mainline.
== This Week ==
* Fix LP:615995 and other linaro gdb bugs.
--- Ulrich Weigand
== GCC ==
* Collected and wrote up suggestions for future GCC work
== GDB ==
* Opened Launchpad bugs for known GDB problems and testsuite failures
* Investigated bug #620595 (gdb.threads/threxit-hop-specific.exp failure)
* Fixed bug #615998 (gdb.gdb/observer.exp failures) in mainline and 7.2
* Worked on upstream fix for #620595 (gdb.threads/threxit-hop-specific.exp
failure)
* Analyzed bug #620611 (Unable to backtrace out of vector page 0xffff0000)
== Infrastructure ==
* Continued working with our order&control team to acquire IGEPv2 boards
--- Peter Maydell
RAG:
Red: None
Amber: ARM legal OK for qemu contributions still pending
Green: we have approval for laptops for linaro secondees
Milestones:
| Planned | Estimate | Actual |
finish virtio-system | 2010-08-27 | 2010-08-27 | |
Progress:
virtio-system:
- got my versatile kernel/qemu running with virtio disk and network
versus non-virtio
- ran some basic benchmarking (bonnie++ for disk, tbench for net).
Disk is faster with virtio, but strangely networking is not!
- tried an upstream qemu too -- net virtio still slower
- built a realview kernel in preparation for testing Arnd's
PCI patches on hardware
qemu-focused-kernel:
- some research into which ARM dev boards support PCI in
hardware, kernel and qemu, to try to find a good choice for
basing a qemu-focused kernel on
merge-other-branches:
- started compiling list of qemu branches for possible consolidation
Issues: the intersection of (recent ARM hardware) (PCI support)
and (supported in qemu) looks suspiciously like the empty set.
Plans:
virtio-system:
- borrow some versatile or realview hardware and test Arnd
Bergmann's PCI patches
- make a start on writing up the config/benchmark results
qemu-focused-kernel:
- flesh out this blueprint
valgrind:
- try to build an ARM valgrind from upstream's thumb branch
Absences:
Friday 5 November and 20 other days in this calendar year
...are available here:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-08-18
I'm going to stop sending out emails about the stand up minutes and
include links in the weekly minutes instead.
Trick of the day:
w3m -dump https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-08-18
| xclip -selection clipboard
...dumps a web page straight into your clipboard for pasting into an
email client.
-- Michael
Attendees
• Name Email IRC Nick
Andrew Stubbs andrew.stubbs(a)linaro.org ams
Yao Qi yao.qi(a)linaro.org yao
Ulrich Weigand ulrich.weigand(a)linaro.org uweigand
Peter Maydell peter.maydell(a)linaro.org pm215
Julian Brown julian(a)codesourcery.com jbrown
Loïc Minier loic.minier(a)linaro.org lool
Michael Hope michael.hope(a)linaro.org michaelh
Chung-Lin Tang cltang(a)codesourcery.com cltang
Agenda
• Standup meeting
Blueprint Assignee
Initial delivery of Linaro GCC 4.4 ams
Cross Compiler Packages hrw
Action Items from this Meeting
Action Items from Previous Meeting
• DONE: Michael to think about synchronising Linaro releases with upstream
• DONE: Michael to organise a call with Matthias, Loic to continue the topic
• DONE: Michael to write up and email patch tracker mechanics for review
• DONE: Ulrich to add his time away to the Linaro calendar
• ACTION: Michael and Ulrich to add GDB new features as blueprints to
Launchpad
• ACTION: Andrew to look into frequent runs of CSL benchmarks
• ACTION: Michael to make sure Linaro has a FSF copyright assignment
agreement
Minutes
• Michael
□ Continuing on patch tracking
□ Continuing investigating string routines
□ Julian noted that using NEON adds power usage and adds a context
switch cost
• Andrew
□ Has a few patches left to go
□ The ones left are a bit curlier
□ Reviewing the upstream state of the current patches
□ Will be sending the Cortex-A5 work upstream
• Yao
□ is continuing with the GDB bug fixes
□ Most are caused by the testsuite
□ Michael noted that we want to make any work we do available early. If
landing on trunk, either backport to 7.2 or note for later pulling into
our branch
• Ulrich
□ Working on bugs such as:
☆ Tracking thumb bit on a long jump
☆ Tracing in the kernel stubs
□ Is currently working upstream
□ Mentioned the ICACHE flush problem seen on Michael's board
☆ ACTION: Michael will try to upgrade the kernel from Angstrom 2.6.32
to a Linaro kernel
• Peter
□ Continues on virtio and QEMU
□ Network benchmark currently shows virtio performing worse than emulated
□ Looking upstream to see if this problem exists
□ Still waiting on approval to release work. Richard will take care of
next week
• Chung-Lin
□ Starting on hard vs soft FP performance tests
□ Testing on a i.MX51 board
□ Michael wants Chung-Lin to finish up on libffi soon
• Julian
□ Working on a vector conditions patch
□ Currently seeing a segfault in compiled applications
□ ACTION: Michael will re-try the build failures that Julian saw by the
end of this week
Next meeting is a stand-up meeting on 2010-08-20 on the public code.
The patch tracking conversation has got a little out-of-hand, and I know
I've misunderstood some of the features Michael has been proposing, and
I suspect vice versa.
So, here's my attempt to compare and contrast the various advantages,
disadvantages, and differences of the ideas so far, by means of use cases.
Looking at this, I think we can probably come up with a solution that
uses the good bits from each (maybe method 1 with the milestone/status
policy from method 2, for example).
Please read the below, and let me know if I left anything out, or if I
misunderstood something.
Andrew
====
For the purposes of this document:
* Method 0 is my original patch tracker, here:
https://wiki.linaro.org/WorkingGroups/ToolChain/GCC4.5UpstreamPatches
* Method 1 is Michael's proposed patch tracker, here:
https://wiki.linaro.org/WorkingGroups/ToolChain/PatchTracking#Method%201
and here: http://ex.seabright.co.nz/helpers/patchtrack
* Method 2 is my proposed system, here:
https://wiki.linaro.org/WorkingGroups/ToolChain/PatchTracking#Method%202
---------------------------------------------------------
1. What does a user have to do to get a patch tracked?
Method 0:
Nothing. New rows are added to the wiki page regularly by a script and
cron-job.
Method 1:
Nothing. The tracking report is updated regularly.
Method 2:
Nothing. New tickets are created automatically, regularly.
------------------------------------------------------------
2. How to find tracking information for a revision?
Method 0:
Search the wiki page for the revision number.
Method 1:
Go to the report page, click through to all the various associated
tickets, if any.
Method 2:
Go to launchpad, and search for "r123456", or select it from the list in
the relevant gcc-linaro-tracking milestone.
------------------------------------------------------------
3. How to find tracking information for a bug fix?
Method 0:
Search the wiki page for the bug number - hopefully somebody has posted
a link. Alternatively, the first line of the commit message will be
present. If that doesn't work, then find the revision number by other
means (bzr).
Method 1:
Go to the bug ticket - it should be there, or a link to another bug that
has it. Alternatively, go to the tracker report page, and search for the
commit message. If that fails try to identify the revision number by
other means (bzr)
Method 2:
Go to the bug ticket - if the bug was committed with --fixes, there will
be a link to the tracking ticket. Alternatively, search
gcc-linaro-tracking to find the commit message. If that fails try to
identify the revision number by other means (bzr)
----------------------------------------------------------
4. How to add new tracking information?
Method 0:
Edit the wiki page.
Method 1:
Add the new information to one or all of the associated bugs, if any. If
there are no existing tickets, create a new ticket (using the link on
the tracker report) and put the information there.
Method 2:
Add the information to the ticket.
-----------------------------------------------------------
5. How to indicate that the bug is upstream?
Method 0:
Edit the wiki page, set the bgcolor to green.
Method 1:
Assign all the bug tickets to a gcc-linaro-tracking milestone.
Method 2:
Mark the bug "Fix committed". Ensure that the ticket has the correct
milestone.
------------------------------------------------------------
6. How to list all patches that need to go upstream?
Method 0:
View the wiki page - the patches are highlighted in red and yellow.
Method 1:
View the tracker report - the patches are highlighted in red, yellow,
and orange.
(Note that launchpad will only list the patches that already had a
ticket attached, or else somebody has created one. This will usually only
include patches where somebody had something to say about it.)
Method 2:
All open bugs in gcc-linaro-tracking.
-------------------------------------------------------------
7. How to list all patches that need forwarding on rebase from 4.5 to 4.6?
Method 0:
Any patches marked in red or yellow on the wiki page need forwarding.
Any patches marked in green with an upstream landing number of 4.7 or
higher also need forwarding. (This information is not yet encoded in the
page, but it's a wiki, so flexibility is not a problem.)
Any patches in grey also need considering. Some are uninteresting
version bumps and such. Some are patches we plan to carry forever.
Probably a new colour could be used to make this clearer - it's a wiki.
Method 1:
Any patches in the report not yet upstream need forwarding. Any patches
in launchpad against the 4.7 milestone (or higher) also need forwarding.
Any patches in the "never" milestone also need considering. Some might
be ancient patches we used to carry in 4.4, but have since been dropped.
Some will be patches we intend to carry.
Method 2:
All patches against the 4.7 milestone, both open and closed (modify the
launchpad search criteria) need forwarding. All patches in the
"series:never,milestone:4.5" milestone in the "won't fix" state need
forwarding.
(Patches we don't intend to carry forward will be "closed", and patches
from 4.4 won't be in "series:never,milestone:4.5", so we never have to
worry about those.)
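The Method 0 rules above amount to a small decision table. A sketch, with the
colour names taken from the description and everything else (function names,
return labels, version parsing, the handling of an unknown landing version)
assumed for illustration:

```python
from typing import Optional


def parse_version(text):
    """Turn "4.7" into (4, 7) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def forwarding_action(colour: str, landed_in: Optional[str], rebase_to: str) -> str:
    """Classify one wiki-table row when rebasing onto `rebase_to`."""
    if colour in ("red", "yellow"):
        return "forward"  # not upstream yet
    if colour == "green":
        if landed_in is None:
            return "consider"  # landing version not recorded on the page
        # Upstream, but only already present if it landed in the release
        # being rebased onto, or an earlier one.
        if parse_version(landed_in) > parse_version(rebase_to):
            return "forward"
        return "drop"
    if colour == "grey":
        return "consider"  # version bumps, or patches carried forever
    raise ValueError(f"unknown colour: {colour!r}")
```

For a rebase from 4.5 to 4.6, a green row that landed upstream in 4.7 still
comes out as "forward", matching the rule described for Method 0.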
------------------------------------------------------------------
8. How do we track what patches have been forwarded on rebase already?
Method 0:
It's a wiki, add a column.
Method 1:
Committing the patch on a new branch (with --fixes) will cause
launchpad to list the commit on the bug page. There's no way to query
this though.
Method 2:
Committing the patch on a new branch will give a "new" patch to track.
The trackerbot will create a new ticket for this revision. The old
ticket will be marked as a "duplicate" of the new one (manually, or
automatically). The new bugs will have "4.6/r123456" in the subject
line, so can easily be differentiated.
-----------------------------------------------------------------
9. What else needs doing on a rebase?
Method 0:
Create a new page with a new table. Forward the information from the old
table manually, by editing the wiki.
Method 1:
Create a new tracker report.
Method 2:
Set up the trackerbot on the new branch.
------------------------------------------------------------------
10. What prompts users to use the system?
Method 0:
Nothing. (Management nagging.)
Method 1:
Nothing mentioned so far.
Method 2:
The bug is always assigned to somebody. They'll be notified by email,
and it will show up on their launchpad pages.
------------------------------------------------------------------
11. What happens when a bug produces multiple patches?
Method 0:
Multiple lines in the table, initially. But, it's a wiki, so they can be
edited, moved around, and coalesced as required.
Method 1:
The same bug has to track multiple patches.
????? How does that work with the 'affects gcc-linaro-tracking' lines?
Method 2:
One ticket per commit. Each is tracked separately, but the user is free
to mark each ticket as a duplicate of the other, and/or move the data
from one ticket to another.
------------------------------------------------------------------
12. What happens when one commit fixes multiple bugs?
Method 0:
Nothing special.
Method 1:
Multiple bugs will track the same submission process. Either the user
must post all the data to all the bugs, or one bug must get (manually)
appointed the master bug, and the others have links posted.
Method 2:
One ticket will be created to track the patch. The ticket will contain
links to all the bugs, and each bug will contain a back-link. This is
very little different to the normal case.
I've fleshed out a potential way of tracking patches at:
https://wiki.linaro.org/WorkingGroups/ToolChain/PatchTracking#Method%201
It's not too bad if you're a developer. The extra steps are:
* Create a ticket
* Mark that ticket as affecting upstream
* Change the status as the patch evolves
* Mark where the patch lands when finished
This is all done through Launchpad's existing interface.
Thoughts?
-- Michael
Sorry about these. Hopefully I'm done.
On Thu, Aug 19, 2010 at 2:28 PM, Michael Hope <620229(a)bugs.launchpad.net> wrote:
> Public bug reported:
>
> Related: lp:gcc-linaro/4.5,revno=99360
>
> Code hoisting improvements
>
> Merged from SourceryG++
>
> (Backport from FSF)
>
> ** Affects: gcc-linaro
> Importance: Undecided
> Status: New
>
> ** Affects: gcc-linaro/4.5
> Importance: Undecided
> Status: New
>
>
> ** Tags: revision
>
> --
> [4.5:r99360] Code hoisting improvements
> https://bugs.launchpad.net/bugs/620229
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in Linaro GCC: New
> Status in Linaro GCC 4.5 series: New
>
> Bug description:
> Related: lp:gcc-linaro/4.5,revno=99360
>
> Code hoisting improvements
>
> Merged from SourceryG++
>
> (Backport from FSF)
>
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/gcc-linaro/+bug/620229/+subscribe
>
I know that Dhrystone isn't a very good benchmark, but it's still interesting:
https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/Dhrystone
If we can get twice the performance out of strcpy() and memcpy() then
the Dhrystone result should go up by almost 30 %. It would make a
nice headline at least :)
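The estimate can be sanity-checked with Amdahl's law. The sketch below assumes
the benchmark's run time splits cleanly into a string-routine part and
everything else; the 46 % share is back-calculated from the "almost 30 %"
figure above, not measured.

```python
def overall_speedup(fraction: float, factor: float) -> float:
    """Amdahl's law: speed up `fraction` of the run time by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def fraction_needed(target: float, factor: float) -> float:
    """Share of run time that must be sped up by `factor` to hit `target`."""
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / factor)


# Doubling strcpy()/memcpy() speed gives about 1.3x overall only if those
# routines account for roughly 46% of the original run time.
share = fraction_needed(1.30, 2.0)
print(f"required share: {share:.1%}")
print(f"speedup at 46%: {overall_speedup(0.46, 2.0):.2f}x")
```

If the string routines take a smaller share of the run time than that, the
overall gain from doubling their speed is correspondingly lower.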
-- Michael
...are available here:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-08-16
Activity reports are included below.
-- Michael
Attendees
Name Email IRC Nick
Andrew Stubbs andrew.stubbs(a)linaro.org ams
Julian Brown julian(a)codesourcery.com jbrown
Loïc Minier loic.minier(a)linaro.org lool
Matt Gretton-Dann ARM
Matthias Klose doko(a)canonical.com doko
Michael Hope michael.hope(a)linaro.org michaelh
Peter Maydell peter.maydell(a)linaro.org pm215
Ulrich Weigand ulrich.weigand(a)linaro.org uweigand
Yao Qi yao.qi(a)linaro.org yao
Agenda
• Benchmarks
□ ffmpeg (h.264), libmad (mp3), LAPACK, CoreMark, Dhrystone, and memcpy()
□ EEMBC
□ Methods of benchmarking
• Patch tracking
□ Minimum features
□ Discovering upstream patches to backport
□ Example version at http://ex.seabright.co.nz/helpers/patchtrack
□ Ticket view at http://ex.seabright.co.nz/helpers/tickets
• Upstream tracking
□ Ubuntu tracks release+SVN. What should we do?
• memcpy() and friends
□ https://wiki.linaro.org/WorkingGroups/ToolChain/StringRoutines
□ Copyright issues with EGLIBC
• Hardware status
• Creating blueprints
• Connecting with other groups
• Open tickets
□ 616141 Backport the sync_* primitive fixes
□ 590696 fix wrong use of objdump during cross build
□ 600277 Backport ARM Cortex A9 scheduling changes
□ 605059 Merge 4.4.5
Action Items from this Meeting
• ACTION: Michael to think about synchronising Linaro releases with upstream
• ACTION: Michael to organise a call with Matthias, Loic to continue the
topic
• ACTION: Michael to write up and email patch tracker mechanics for review
• ACTION: Ulrich to add his time away to the Linaro calendar
• ACTION: Michael and Ulrich to add GDB new features as blueprints to
Launchpad
• ACTION: Andrew to look into frequent runs of CSL benchmarks
• ACTION: Michael to make sure Linaro has a FSF copyright assignment
agreement
Action Items from Previous Meeting
• DONE: Richard to ask the GCC developers on IRC what the status of 4.4.5 is
□ Matthias talked with Jacob and they expect a release at the end of
August
• DONE: Andrew to merge 4.5.1 and the Firefox fix by 2010-08-17
• DONE: Ulrich to ticket GDB items
• DONE: Michael to understand whiteboards as a way of organising features
Minutes
Upstream patches:
• Loic, Matthias, and Michael discussed tracking the upstream release branch
• Upstream has a roughly three week cycle on patch releases
• Matthias tracks this release branch and pulls in SVN as a patch into Ubuntu
• Michael is concerned about the extra work: pulling in a partial feature,
catching a regression, and fallout due to releasing something that upstream
hasn't released
• Release branches are very stable. Chance of a regression is low
• Better chance of getting a problem fixed upstream if caught early before
the release
• Loic: harder to track upstream
• Loic: also harder to identify a Linaro release -
FSF-4.4.4+svr12345+bzr6789...
• Matthias spoke with Jacob in GCC. They expect a release at the end of
August
• Michael prefers basing on released versions only
• Michael also wants to be able to reproduce any issues found in Ubuntu. Hard
at the moment as Ubuntu runs Linaro plus a reasonably large set of patches
• One approach: merge the day after the Linaro release. Gives the maximum
time for testing. Still a month behind, still have Ubuntu GCC != Linaro GCC
• Matthias
□ Would continue to keep diff between Linaro and upstream
□ Is doing changes upstream
□ Would have to maintain those separately
□ If there are release critical changes, would have to do them himself
• Discussed different phase of Linaro GCC vs Ubuntu - the releases won't be
in sync so there will always be a need for release critical fixes in Ubuntu
• ACTION: Michael to think about synchronising Linaro releases with upstream
□ Drift the Linaro release to be a week later than the upstream release
• Andrew wasn't concerned. Noted that the release branches are really good
• ACTION: Michael to organise a call with Matthias, Loic to continue the
topic
Patch tracking:
• Cortex-A5 changes are coming into Linaro
• Michael did a quick show and tell on the patch tracker hack
• ACTION: Michael to write up and email patch tracker mechanics for review
Other topics:
• Ulrich is out on vacation next week
• ACTION: Ulrich to add his time away to the Linaro calendar
• ACTION: Michael and Ulrich to add GDB new features as blueprints to
Launchpad
• Michael noted that we use bzr for version control wherever possible,
including for QEMU
• Michael asked Andrew if we could start regular runs of the commonly used
benchmark
• ACTION: Andrew to look into frequent runs of CSL benchmarks
• String routines
□ Michael talked with upstream and they greatly prefer FSF+LGPL
□ ACTION: Michael to make sure Linaro has a FSF copyright assignment
agreement
• Loic mentioned Valgrind patches
□ Peter is interested in helping
Next meeting is a stand-up meeting on 2010-18-11 on the public code.
-- Julian Brown
== GCC 4.5 bugs in Launchpad ==
* Attempted to reproduce bugs against Linaro GCC 4.5 in
Launchpad: issues #614184, #614185 and #614186. The last of these has
been closed as invalid by the submitter already, and the first two don't
seem to reproduce using a cross-compiler (i.e. using CS's build
machinery), nor with a natively-bootstrapped bzr head build (as of some
day last week). So, no real progress there.
== GCC 4.5 vectorization improvements ==
* Started tracking down the causes of some failures in vect.exp:
pr43430-1.c failed because of missing vector comparison support. This
seemed relatively straightforward to implement using NEON instructions,
so I had a go at doing that. Similarly another test fails (vect-35.c)
because of missing support for widening/narrowing patterns, though I'm
still investigating a fix for that.
-- Andrew Stubbs
== Linaro GCC releases ==
* Merged GCC 4.5.1 into the Linaro sources.
* Respun the 4.5-2010.08 release.
== Linaro GCC 4.5 ==
* Continued pushing SG++ patches into Linaro. I'm now most of the way
through these, but I had hoped to have had it done by now.
== Other ==
* Lost a few hours to internet outages. An engineer came out and
changed all the connections, so hopefully that's solved the problem.
It seems the recent bad weather had got into the underground cables
between here and the street cabinet. It's fibre from there, so in
theory it must be local.
* Took Wednesday as vacation.
-- Peter Maydell
Subject: [weekly][linaro] report week 32
RAG:
Red: None
Amber: ARM legal OK for qemu contributions
Green: booted versatile+virtio kernel on qemu
Milestones: none as yet
Progress:
virtio-system:
- identified which qemu patches give a qemu which can boot linaro
alpha-3 on beagle emulation
- built a kernel for 'versatile' flavour with virtio support
- found some patches from qemu upstream which are needed for
newer versatile kernels to boot
- got my versatile kernel running on qemu with the linaro
rootfs mounted via virtio
- put various qemu changes/patches into a git tree, so (a) I
know what I changed and (b) as an exercise in learning git
Issues: none
Plans:
virtio-system:
- get versatile to boot *without* virtio for comparison
- find an io benchmark and do some benchmarking
qemu-focused-kernel:
- flesh out this blueprint
Absences:
None planned.
-- Chung-Lin Tang
== libffi VFP hard-float support ==
Testsuite fixes and final tuning before submitting. Had to port some
stuff from the GCC testsuite to let libffi's testsuite support a
dg-skip-if option, to skip some variadic function tests based on
compiler options (skip when -mfloat-abi=hard), and I am not a
DejaGNU/expect/Tcl expert...:P Whole patch was submitted to main
libffi mailing list on Sunday, waiting for feedback.
(see http://sourceware.org/ml/libffi-discuss/2010/msg00153.html)
== This week ==
* See if any feedback on libffi patch.
* Start Linaro 4.5 EEMBC comparisons.
* Catch up on GCC work.
-- Yao Qi
== Linaro GCC ==
* Pinged ARM backend maintainer for my patch to GCC PR45094. Still
no response.
* Submitted my patch for the ARM target triplet to gcc-patches. With Dan's
help, got GCC write access. Checked in my patch.
* Updated the status of LP bugs. Marked them as Fix Released since the
failures go away in gcc-linaro-4.4-2010.08-0.
== Linaro GDB ==
* Refreshed my memory of GDB topics such as the GDB testsuite,
Tcl/expect, ptrace, and breakpoints. Fortunately, most of
it is still there. :)
* LP:615997 gdb.dwarf2/dw2-ref-missing-frame.exp failure
The failure is caused by the alignment of the test case. Sent a patch to
gdb-patches, and revised it with Dan's help.
* LP:615989 gdb.base/pending.exp
The failure is caused by wrong code in .debug_line. It goes away
when the test is compiled with gcc-linaro-4.5-2010.08-1.
Opened a GCC bug, LP:617384.
* LP:615995 gdb.base/watch-vfork.exp : Watchpoint triggers after
vfork (sw) (timeout)
Reproduced it on x86. GDB waits endlessly between
TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORK_DONE. It looks like
the debuggee's child hangs on 'signal'.
== This Week ==
* Look into LP:615995 in more depth, and at other Linaro GDB bugs.
Hi
I'm concerned that the workaround for apr was just uploaded in the form
of disabling process shared mutexes (see LP #599874), but we didn't
address or investigate the root cause in eglibc.
Would someone be able to look at LP #604753 where the issue is tracked?
Thanks!
--
Loïc Minier
...is available at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-08-16
-- Michael
Agenda
• Benchmarks
□ ffmpeg (h.264), libmad (mp3), LAPACK, CoreMark, Dhrystone, and memcpy()
□ EEMBC
□ Methods of benchmarking
• Patch tracking
□ Minimum features
□ Discovering upstream patches to backport
□ Example version at http://ex.seabright.co.nz/helpers/patchtrack
□ Ticket view at http://ex.seabright.co.nz/helpers/tickets
• Upstream tracking
□ Ubuntu tracks release+SVN. What should we do?
• memcpy() and friends
□ https://wiki.linaro.org/WorkingGroups/ToolChain/StringRoutines
□ Copyright issues with EGLIBC
• Hardware status
• Creating blueprints
• Connecting with other groups
• Open tickets
□ 616141 Backport the sync_* primitive fixes
□ 590696 fix wrong use of objdump during cross build
□ 600277 Backport ARM Cortex A9 scheduling changes
□ 605059 Merge 4.4.5
Blueprint                              Assignee
Initial delivery of Linaro GCC 4.4     ams
Cross Compiler Packages                hrw
Action Items from this Meeting
• TBD
Action Items from Previous Meeting
• ACTION: Richard to ask the GCC developers on IRC what the status of 4.4.5
is
• ACTION: Andrew to merge 4.5.1 and the Firefox fix by 2010-08-17
• ACTION: Ulrich to ticket GDB items
• ACTION: Michael to understand whiteboards as a way of organising features
...are here:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-08-11
-- Michael
Attendees
Name               Email                          IRC Nick
Yao Qi             yao.qi(a)linaro.org            yao
Ulrich Weigand     ulrich.weigand(a)linaro.org    uweigand
Richard Earnshaw   richard.earnshaw(a)arm.com     rearnshaw
Peter Maydell      peter.maydell(a)linaro.org     pm215
Michael Hope       michael.hope(a)linaro.org      michaelh
Julian Brown       julian(a)codesourcery.com      jbrown
Chung-Lin Tang     cltang(a)codesourcery.com      cltang
Agenda
• Stand up call
Blueprint                              Assignee
Initial delivery of Linaro GCC 4.4     ams
Cross Compiler Packages                hrw
Action Items from this Meeting
• ACTION: Michael to email his configure options to Julian
• ACTION: Richard, if he's going, will set up a BoF session on the ARM
toolchain at the GCC Summit
Action Items from Previous Meeting
Minutes
• Julian
□ Continues to merge 4.4 into the CSL 4.5 branch
□ Andrew is pushing changes out to the Linaro branch
□ Can't reproduce some failures
□ ACTION: Michael to email his configure options to Julian
• Ulrich
□ Has been ticketing the GDB features and faults for him and Yao to work
on
• Yao
□ GCC patch has been approved upstream
□ Going through the GDB tickets
• Chung-Lin
□ Doing some last optimisations
□ Preparing to send the patch upstream
• Peter
□ Looking into virtio
□ Amit has been working on something similar
□ Michael suggests getting on IRC and asking
□ Almost has approval to work publicly
• Richard
□ Issues were found with the sync fixes. These are being reworked
• Michael
□ Looked at benchmarks and string routines
□ Want the group to start pulling the A9 changes down
□ And other A9 pipeline changes
□ Richard suggests pulling Marcus's 4.4 sync primitives fix in too
• GCC Summit
□ Overlaps with UDS
□ Ulrich suggests a BoF session on the ARM toolchain
□ ACTION: Richard, if he's going, will set up
□ Not enough material for a paper at the moment
The next meeting is the stand up call on Friday.
I wanted to point somebody at this mailing list (linaro-toolchain),
and I noticed that it wasn't listed here:
https://wiki.linaro.org/GettingInvolved
which is the first place I tried. Is that deliberate, or just an oversight?
thanks
-- Peter Maydell
Hi Michael,
here's a list of features and bugfixes that can serve as a basis for
Monday's discussion on what we should be working on in GDB in the future.
The goal of the list is to fix currently known problems with GDB, including
the testsuite, as well as bringing GDB on ARM in line with other platforms
by adding required back-end support to enable common GDB features that are
already supported elsewhere. It does not yet include anything completely
new that we'd develop specifically for ARM.
If anybody knows of a feature or bugfix I've missed, please let me know!
Features/fixes involving kernel support:
- hardware watchpoint support
- Neon registers in core files
- Interrupted syscall handling
- PTRACE_ATTACH disabled ?
Features/fixes involving GCC support:
- backtrace from abort (missing LR save)
- debug info for args in varargs routine
GDB features/fixes:
- prologue parsing on Thumb-2
- displaced stepping on Thumb
- syscall tracing support
- improved epilogue detection (fix software watchpoints)
- multi-threaded debugging inferior crashes
- multi-threaded Thumb/ARM state tracking
- signal handler stepping
- inferior call fixes
- misc. other testsuite regressions
gdbserver features/fixes
- Neon register support
- fast tracepoints
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
= Friday 7th August 2010 =
== This month's meetings ==
<<MonthCalendar(WorkingGroups/ToolChain/Meetings,2010,08,,,,WorkingGroups/ToolChain/MeetingTemplate)>>
== Attendees ==
||<rowbgcolor="#333333" rowstyle="color: white; font-weight: bold;"style="text-align: center;">Name ||<style="text-align: center;">Email ||<style="text-align: center;">IRC Nick ||
|| Andrew Stubbs || andrew.stubbs(a)linaro.org || ams ||
|| Chung-Lin Tang || cltang(a)codesourcery.com || cltang ||
|| Julian Brown || julian(a)codesourcery.com || jbrown ||
|| Marcin Juszkiewicz || marcin.juszkiewicz(a)linaro.org || hrw ||
|| Michael Hope || michael.hope(a)linaro.org || michaelh ||
|| Peter Maydell || peter.maydell(a)linaro.org || pm215 ||
|| Richard Earnshaw || richard.earnshaw(a)arm.com || rearnshaw ||
|| Ulrich Weigand || ulrich.weigand(a)linaro.org || uweigand ||
|| Yao Qi || yao.qi(a)linaro.org || yao ||
== Agenda ==
* Stand up meeting
== Action Items from this Meeting ==
== Action Items from Previous Meeting ==
== Minutes ==
* Andrew:
* Continues to push the 4.5 patches
* Seen one regression so far which he is investigating
* Continues to approve 4.4 merge requests
* Spinning 2010.08 release today
* Will give tarball to michaelh to also build
* Yao:
* Continuing on bug fixes and merges
* [[LP:602174]]: Problem has gone away, to confirm on release
* [[LP:602288]]: Leave test in-place. Change was backed out
* [[LP:602190]]: will set options in test case
* Ulrich:
* Investigating test failures
* getfem++ failure is triggered in wrapped library
* May be due to a different environment
* Will investigate further
* Richard:
* Cortex-A9 patches sent upstream
(http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00481.html)
* `__sync` primitive patches sent upstream
(http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00492.html)
* Peter
* Introduced himself
* Julian:
* Porting patches from 4.4 to the 4.5 CSL branch is ongoing
* Continuing with the misalignment patch issues
* Looking into failures on the 4.5 CSL branch
* Michael asked Julian to look at the 4.5 tests on Linaro as well
* Marcin:
* All stages of the cross compiler are done
* Sent a link to the PPA over IRC
* Mentioned the configure objcopy issue. Michael says that the
TCWG will take it over
* Vacation is coming up in about a week
* Chung-Lin:
* Now running the libffi test suite
* Four regressions so far
* Andrew is organising access to the benchmark suite
* Michael: we do want to be able to reproduce these results in the
future. Please record everything needed to reproduce (compiler, host,
environment, scripts, etc.)
* Michael:
* Extending the builds further. Added eglibc.
* Thinking about what's next
* Discuss on Monday
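
The request above, under Chung-Lin's item, to record everything needed to
reproduce benchmark results could be met with a small helper that writes
the details next to each result. The following is only a sketch under
stated assumptions: the function names, the choice of fields, and the
environment variables captured are hypothetical and were not agreed in the
meeting. It uses only the Python standard library.

```python
import json
import os
import platform
import shutil
import subprocess


def tool_version(cmd):
    """Return the first line of `cmd --version`, or None if unavailable."""
    path = shutil.which(cmd)
    if path is None:
        return None
    try:
        out = subprocess.run([path, "--version"], capture_output=True,
                             text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return None
    lines = out.stdout.splitlines()
    return lines[0] if lines else None


def capture_run_record(compiler="gcc", flags=(), scripts=(),
                       env_keys=("CC", "CFLAGS", "LDFLAGS", "PATH")):
    """Collect compiler, host, environment and script details for one run.

    All field names are illustrative assumptions, not an agreed format.
    """
    host = platform.uname()
    return {
        "compiler": compiler,
        "compiler_version": tool_version(compiler),
        "flags": list(flags),
        "host": {"system": host.system, "release": host.release,
                 "machine": host.machine},
        "environment": {key: os.environ.get(key) for key in env_keys},
        "scripts": list(scripts),
    }


# Example: describe an -O2 build; run-bench.sh is a hypothetical script name.
record = capture_run_record(flags=["-O2"], scripts=["run-bench.sh"])
print(json.dumps(record, indent=2))
```

Storing the JSON output alongside each set of results keeps the compiler
version and flags tied to the numbers they produced.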