== Last Week ==
* Reached the point with understanding libunwind where I can begin
writing patches for parsing unwind information out of .ARM.exidx and
.ARM.extab ELF sections.
== This Week ==
* Begin writing support for ARM-specific unwind information to libunwind.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
== Linaro GCC ==
* Continued looking at big-endian/quad-vector patch: attempted to
figure out the proper semantics for vec_extract in big endian mode
(about 1 day). Put on hold temporarily to work on lp675347, QT failing
to build due to constraint failure in inline asm statements used for
atomic operations: found the patch which introduced the failure, and
suggested a workaround to the OP. Came up with a plausible-looking
patch, and started testing it, after spending some time trying to
figure out why ARM Linux mainline doesn't build at present. Patch sent
upstream.
Hi Richard,
As per the discussion at this mornings call; I've reread the TRM and I
agree with you about the LSLS being the same speed as the TST. (1 cycle)
However as we agreed, the uxtb does look like 2 cycles v the AND 1 cycle.
On the space v perf theme, one thing that would be interesting to know is
whether there are any icache/issue stage limitations;
i.e. if I have a stream of 32-bit Thumb-2 instructions that are all listed
as 1 cycle and are all in i-cache, can they be fetched
and issued fast enough, or is there a performance advantage to short
instructions?
Dave
LP:663939 - Thumb2 constants
* Continued testing, found a few bugs. Tidied a few bits up.
* Wrote some new testcases to go with the patch.
LP:618684 - ICE
* Begun looking at this one. So far I can't reproduce it. I have a
debuggable native toolchain building, but it'd been delayed by hardware
issues.
In the course of testing I discovered that the ARM FSF config wasn't
testing the right thing, so begun work on a new, more appropriate FSF
build/test config for Linaro work.
Also found the the SD card rootfs in my IGEPv2 board was corrupted. I've
restored it from backup, and now it's working once more.
== Linaro and upstream GCC ==
* Linaro launchpad issues:
- LP #672833, x64-64 varargs regression: after testing pushed bzr branch
for merging.
- LP #634738, inefficient low bit extraction: some discussion with Yao.
- LP #618684, ICE when building ziproxy: looked into and quickly found
not reproducible anymore of Linaro 4.5 trunk.
* Worked on some GCC bugzilla PRs:
- PR44557, ICE in Thumb-1 secondary reload: this should be fixed by a
change of the scratch operand constraint of "reload_inhi" from "r" to
"l". Interesting to note that this was from the
merged-arm-thumb-backend-branch merge, from about 10 years ago.
- PR46508: libffi fails to build on VFP asm instructions, seems to need
a '.fpu vfp' directive. Probably missed earlier because my toolchain was
configured with --with-fpu=vfp.
- PR45416: 4.6 code generation regression on ARM, after expand from SSA
changes. Looking at this currently.
== This week ==
* Look at Linaro issues with higher priority.
* Continue working on GCC PRs.
== Linaro GCC ==
* Merge ldm/stm patch to Linaro 4.5 tree.
Found two regressions on the last minute of proposing merge request in
pass ce3. Revert one of ldm/stm patches about ifcvt. Complete testcase
in branch.
* Try Richard E.'s "TST to LSLS transformation" patch on cortex-a9 with
FFMPEG. No speed improvements.
* Various Linaro GCC Bug fixing.
** LP:634738
Follow the fix to GCC PR40697, and create a new patch, which emits
extzv or shift rather than loading constants in some cases. Tested on
FSF GCC trunk, and no regression. However, found a regression by eyes
in pr44999.c, in which, ubfx (4byte) is generated, rather than uxth
(2byte). uxth is produced by combiner from ashift and lshiftrt. During
reading arm.c, find that constant handling in thumb2 should be improved
to some extent.
** LP:633243
Re-implement regrename improvement, as Eric B. suggested in
gcc-patches. Spend some time on understanding API in GCC related to
hard-reg. Tested on x86_64-linux. No regression.
** LP:638935
Update my tree to FSF trunk, and find RTL seq for fldm/fstm peephole
disappears due to fix to PR45722. Extend arm-ldmstm.ml to support vfp.
Peephole and RTL patterns for vfp are done. Will revise
arm.c:{load,store}_multiple_sequence to accept vfp data.
Fix a bug in ldm/stm peephole when starting offset is negative.
== This week ==
* LP:634738: Figure out how uxth is produced by combiner.
* LP:633243: Test it on ARM.
* LP:638935: Revise {load,store}_multiple_sequence to accept vfp data.
--
Yao (齐尧)
Re my recent email "Upstream GCC feature freeze", I think we're agreed
that we need to create a branch that tracks GCC 4.6 development, but has
our own performance improvements included. The question is where to host it?
Option 1: Launchpad/bzr
Pros:
* We need no permission to do it
* The branch will naturally evolve into our 4.6 release series in time.
* The 3-way merge works well (if slowly)
* We can include patches that we have no intention of posting upstream
ever
* Our patch tracker will Just Work.
* Merge requests will be available.
Cons:
* Bzr ;)
* It's hidden away from the view of most GCC developers
Option 2: GCC SVN branch
Pros:
* We can work in the open, submitting patches via gcc-patches, as usual
* The final merge to GCC trunk (come stage 1) will be eased, a little
Cons:
* We can't really apply anything we want just for ourselves
* we may end up maintaining an LP branch shadowing the svn branch
* When we do want to do 4.6 in LP, we'll have to backport all our
patches from 4.7, and this may no longer be straightforward.
* Write permissions not clear.
* Although I think you can just go ahead and do it?
OK, so I'm sure I've missed some big ones. Please discuss! ;)
I think the big question here is, when will we start wanting to make
(unstable/experimental) Linaro GCC 4.6 releases? If we want to do it
early, then we'll have no choice but to have an LP branch to release from.
Andrew
Like everyone from Toolchain WG I will share my activites in last week:
1. cross compilers for archive
- discussed with doko about dropping update-alternatives use
- wrote gcc-defaults-armel-cross 1.4 which does proper symlinks for cross
compilers
- wrote gcc-4.5-armel-cross 1.41 which removes update-alternatives support
- wrote gcc-4.4-armel-cross 1.37 which removes update-alternatives support
- wrote armel-cross-toolchain-base 1.53 which has all updates which I had
- sent all of them to Steve for review
Status of changes:
- default version of armel cross compiler will be 4.5 like it is in Natty
- both 4.4 and 4.5 will be provided as it is for native
- any traces of update-alternatives use should be removed
Needs to be done:
- adding conflicts on older cross compilers to gcc-defaults-armel-cross
Order of upload to archive:
- armel-cross-toolchain-base
- gcc-4.5-armel-cross
- gcc-4.4-armel-cross
- gcc-defaults-armel-cross
2. Checked few old bugs do they still apply:
- Bugs #646729, #637454, #671455 are done with armel-cross-toolchain-base 1.52
(landed in maverick-proposed)
Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
Short week.
Finally got external hard drive for my beagle - makes it sanely possible to
natively build things.
Got eglibc cross built (Thanks to Wookey for pointing me in the right
direction with the magic incantation of dpkg-buildpackage -aarmel
--target=binary) and
easily rebuilding . I have a version with the neon version of my memset
built into it - it doesn't seem to make a noticeable difference to my
ghostscript benchmark
though.
Panda's aren't likely to turn up until mid December; arranging borrowing
an A9 is turning out to be difficult, but it looks like we should be able to
get access to
the one in the London datacentre - although it has a disc problem at the
moment.
I did manage to get a colleague to try my tests on his own Toshiba AC-10
(Tegra-2 - no Neon); the
graphs had approximately the same shape as my previous Panda tests. Memchr
looked pretty
good on there.
Also trying to look at the sign off I need for various libc access.
Dave
I mainly worked on the atomic memory operations blueprint/item:
* posted an updated patch for #643171 on the libc-ports ml after running the
glibc testsuite natively on the vexpress
* continued to learn about the ARM instructions involved :)
* started to write some gcc testcases that scan the asm output of the __sync
builtins (mainly to detect differences between the gcc versions - not sure how
useful those tests would be for upstream as the sequences may easily change)
Ken
RAG:
Red:
Amber:
Green:
Milestones:
| Planned | Estimate | Actual |
finish virtio-system | 2010-08-27 | postponed | |
get valgrind into linaro PPA | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
Progress:
* Most of this week spent at the Meego conference in Dublin.
This seemed to be a rather apps-developer centric conf,
with not much of interest on the low-level side. There were
a few useful talks/conversations, though.
* Intel were giving away Atom-based netbooks to all attendees;
that's a lot of developers who are going to be testing and
optimising their apps for Atom devices rather than ARM...
* qemu: looked at https://bugs.launchpad.net/bugs/668799 ;
we don't seem to be taking the right lock before we manipulate
the graph of translation blocks. I have a fix which stops the
reported segfault, but the code has a number of "XXX not thread
safe" and "FIXME: not SMP safe" comments and generally doesn't
seem to have a coherent locking design :-(
* qemu: sent some minor patches upstream:
+ enable iwmmxt coprocessors in user mode
+ remove some unused functions from target-arm and target-sparc
+ fix a failure to build bug in a makefile
* qemu: some review of a patch to fix semihosting SYS_GET_CMDLINE
Plans
- qemu consolidation
- post-toolchain-review, sort out some milestones for
this report
Absences: (complete to end of 2010)
Thu/Fri 25-26 Nov; Fri 17 Dec - Tue 4 Jan inclusive.
(Dallas Linaro sprint 9-15 Jan.)
== This week ==
Started looking at STT_GNU_IFUNC support in BFD. There were a couple
of janitorial changes I needed to make in order to prepare elf32-arm.c
for the main patch. I tested those separately and submitted them upstream:
http://sourceware.org/ml/binutils/2010-11/msg00330.htmlhttp://sourceware.org/ml/binutils/2010-11/msg00331.html
I've now finished a prototype implementation of the STT_GNU_IFUNC
support itself. It wasn't as mechanical as I'd originally assumed,
which was nice.
Tests that I've run by hand seem to be doing the right thing.
I've now started writing tests for the testsuite (meaning:
I've completed 1 test so far).
== Next week ==
* Add more tests, including Thumb coverage.
* Start on the libc changes.
Richard
Doing an allmodconfig build on the kernel, I get the following:
CC arch/arm/kernel/asm-offsets.s
In file included from
/home/rob/proj/git/linux-2.6-dt/include/linux/kernel.h:12,
from
/home/rob/proj/git/linux-2.6-dt/include/linux/sched.h:54,
from
/home/rob/proj/git/linux-2.6-dt/arch/arm/kernel/asm-offsets.c:13:
/usr/lib/gcc/arm-linux-gnueabi/4.4.5/include/stdarg.h:40: internal
compiler error: Segmentation fault
It occurs on Maverick 4.4, 4.5 and CodeSourcery 2009Q1 cross toolchains.
It's confirmed by Codesourcery here:
http://www.codesourcery.com/archives/arm-gnu/msg03719.html
What's the status on this issue? I didn't see anything in Linaro gcc
bugs that looks related.
Rob
The STT_GNU_IFUNC blueprint:
https://wiki.linaro.org/WorkingGroups/ToolChain/Specs/Binutils-STT_GNU_IFUNC
says "the ARM EABI will be updated to support STT_GNU_IFUNC's requirements".
I suppose the most obvious thing that needs to be defined is the relocation
number for R_ARM_IRELATIVE. What's the best way of handling that?
The main options seem to be:
1. Reserve a relocation number with ARM first (129?).
2. Go ahead and implement it without having the EABI updated.
See whether the results are good before deciding whether
to bless it in the EABI.
3. Since STT_GNU_IFUNC is a GNU-specific, treat R_ARM_IRELATIVE
as GNU-specific too, and pinch one of the R_ARM_PRIVATE relocs.
I'm pretty sure (3)'s not the way to go, but I was aiming for
completeness. :-)
Richard
Hi,
On 17 November 2010 05:35, Michael Hope <michael.hope(a)linaro.org> wrote:
> 1. How easy is it to frequently merge in SVN? It used to be terrible
> as you had to manually track the merges. These days can you do a 'svn
> merge trunk' and have it just work?
I asked Mike Meissner to answer this question. Mike is very experienced in
GCC and GCC SVN branch management. I am attaching his reply.
Ira
I sent this recently to ppc64-toolchain(a)linux.ibm.com on how to use
svnmerge to manage branches:
This script (also ~meissner/meissner/bin.sh/svnmerge) is what I use to
update svn directories, such as ibm-gcc-4_5-branch. I think I originally
got it
from Ben E. and it may be in the contrib directory.
Typically the way I start a branch, such as my normal power7-meissner
branch, I do the following:
$ export TRUNK="svn+ssh://@gcc.gnu.org/svn/gcc/trunk"
$ export BNAME="power7-meissner"
$ export BRANCH="svn+ssh://@gcc.gnu.org/svn/gcc/branches/ibm/$BNAME"
$ export SRC="$HOME/fsf-src"
$ svn delete -m"delete old branch" $BRANCH
$ svn copy -m"Clone new branch" $TRUNK $BRANCH
$ cd $SRC
$ svn co $BRANCH
$ cd $BNAME
$ svnmerge init
$ svn update # this is sometimes needed
$ svn commit -m'Create svnmerge init info'
$ export REV="xxxx" # substitute subversion id for xxxxx
$ echo "power7-meissner branch, based on $REV." > gcc/REVISION
$ touch gcc/ChangeLog.power7
$ <edit gcc/ChangeLog.power to create initial contents>
$ svn add gcc/ChangeLog.power gcc/REVISION
$ svn commit -m'Add REVISION to branch'
In particular, creating GCC/REVISION allows you to tell what subversion
revision the source is based against. You can find the information via:
$ svn propget svnmerge-integrated
but it is a lot easier if you have a compiler tree to do gcc -v. After you
do a propget, you will need to do a svn update.
In this case, I use gcc/ChangeLog.power7 to hold the ChangeLog entries
local to the branch. That way I can see a summary of the changes, but not
pollute
the normal ChangeLog files.
To do merges, you need to make sure that all local changes are checked into
the branch. Then do:
$ cd $SRC/$BNAME
$ svnmerge merge
$ <edit gcc/REVISION and ChangeLog.power7 to indicate merge>
$ <test merged files, if satisified, check them in>
$ export REV="xxxx" # substitute subversion id for xxxxx
$ svn update # just in case
$ svn commit -m"Update to subversion id $REV"
Now, to create a patch file do, make sure the files are checked in:
$ cd $SRC/$BNAME
$ export PATCHFILE="$HOME/patches/mypatch.patch01"
$ <make ChangeLog entries in $PATCHFILE>
$ svn diff --old $TRUNK --new . -r $REV >> $PATCHFILE
$ <delete ChangeLog.power7, REVISION, property changes from $PATCHFILE>
$ submit patch
To see if there are changes to be merge in:
$ svnmerge avail
For example on the ibm-gcc-4_5-branch, the following changes are available
to be merged in: 164657-166510 when I originally wrote this message on the
9th
of November, and Peter has subsequently updated the merge.
I put the folliwng in ~/.subversion/config to provide my own diff command:
### Set diff-cmd to the absolute path of your 'diff' program.
### This will override the compile-time default, which is to use
### Subversion's internal diff implementation.
diff-cmd = /home/meissner/bin.sh/svndiff
Every so often, I find svnmerge misses, for example in deleting
directories.
It is helpful to do a diff from the mainline every so often to make sure
you are not missing newly created files or still are keeping older files or
just missed a change.
I'll include svndiff for the smarter svndiff command and mrm-changelog.el
that looks for the ChangeLog.<name> files I use in different branches. Feel
free to contact me to clarify some stuff.
--
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner(a)linux.vnet.ibm.com fax +1 (978) 399-6899
(See attached file: svnmerge)(See attached file: svndiff)(See attached
file: mrm-changelog.el)
Hi there,
There's a recording of this mornings public plan review available on
the wiki at:
https://wiki.linaro.org/Releases/1105/PublicPlanReview
Also included is a copy of the slides and supporting documents. Might
be interesting for those who missed it.
-- Michael
A heads up. I'd like to have a brainstorming session on potential
Thumb-2 performance improvements in GCC. Think about what you'd like
in such a session, and what preperation should be done, and we can
discuss the discussion (heh) on Monday.
-- Michael
Hi there,
I noticed that there's a QEMU users forum at:
http://adt.cs.upb.de/quf/
and that the abstract submission phase is still open, and closes
November 28th. It would be great to see some participation there and
help identify other key people interested in using and improving QEMU.
--
Christian Robottom Reis | [+55] 16 9112 6430 | http://launchpad.net/~kiko
Linaro Engineering VP | [ +1] 612 216 4935 | http://async.com.br/~kiko
Zach Welch --
== Last Week ==
* Continued working on libunwind support. Trying to figure out why my
signal frame detection doesn't work as expected.
* Kept pace with the ltrace tree, testing recent patches on ARM.
== This Week ==
* Continue to work on libunwind signal frame detection.
Julian Brown --
== Linaro GCC ==
* Looked at issues #663198 (double-precision register expected) --
which was already fixed on the linaro branch, but the bug was reported
against a version just prior to that, and #667490 -- which involved a
possible problem with the NEON "load 0.0" patch. Experimented for a
while with the latter, but could not find anything wrong.
Followed up upstream, and requested a stand-alone test case.
* Worked on a proper solution to the VMOVN-in-big-endian-mode problem,
discovering that several other quadword-register operations were
similarly broken in the process. WIP patch sent to linaro-toolchain for
discussion, but it needs a little more work before it can be applied.
Peter Maydell --
Progress:
* qemu: more cleanup of signal handler VFP patchset;
I think I just need to add iwmmx support and it's good
* qemu: VCVT: found yet another bug, did final patchset
cleanup: submitted to upstream list [8 patch series]
* qemu: submitted a trivial patch to fix a problem where
__get_user/_put_user macros had an unnecessary local var
which could clash with a var being used by the macro user
* set up a tree on git.linaro.org which we can use for
a branch to make pull requests for ARM qemu fixes
* did a rough estimate of time to do an Eagle qemu model
(6 months + testing/bug fixing time)
Issues:
* lost some time to a problem where Linux VMs stopped being
able to talk to the LDAP server; however I have a workaround
and IT are investigating
Meetings:
* toolchain, toolchain standup, pdsw-tools, PD doughnuts
Plans
- attend Meego conference in Dublin (Nov 15-18 inc travel)
http://conference2010.meego.com/
- start on qemu consolidation by upstreaming various ARMv7
correctness fixes
Andrew Stubbs --
== GCC 4.5 ==
* Continued working on LP:663939.
* I still have not worked out how best to fix the constant
propagation problem that has been thwarting my optimization patch,
however I think I understand it better now.
* I have started on adding replicated pattern support to the
constant splitting. Initial results were good, but I discovered that I
had to rearrange the code somewhat to get the cost estimation and
negative/inverted constant support working correctly. So far, I have
it successfully using 16-bit replication pattern constants for
set/add/subtract operations. Other operations appear broken at the
moment, but it's almost certainly just a few tweaks required.
* TODO: Add support for 32-bit replicated pattern constants. Adjust
some of the other two-instruction constant generation techniques to
let them fall through to this new code, where it would be beneficial.
* Pushed the latest set of GCC patches into Linaro GCC 4.5.
Chung-Lin Tang --
== Linaro GCC ==
* Linaro #672833, one batch of my backports of Bernd's postreload
patches exposed some varargs regressions for x86-64, was reverted by
Michael. Tested the compiler and found it was fixed on mainline
rev.162384. Backporting this revision plus the postreload patches
fixed the regressions; x86-64 bootstrap also verified okay. There is
however another PR45027 fix that was needed on trunk, but needs a bit
more clarification if needed on a 4.5 compiler.
* Linaro #641397, CS issue #6753: bitfield optimization. Patch tested
without regressions, posted for CS internal review, should soon push
for Linaro merge.
* Started looking further at some GCC DF, IRA internals.
== This week ==
* Look at more Linaro issues.
* Maybe start looking at some GCC bugzilla PRs.
* There is a local ARM technical event in Hsinchu on Thursday, might
go and look around.
Yao Qi --
== Linaro GCC ==
* Mainline patch backport to Linaro 4.5.
** Patch "Fix an if statement in arm_rtx_costs_1". Verified on Linaro
4.5. 0.1% smaller on size, and 0.2% faster on speed. Merged to Linaro
4.5 by Andrew S.
** Try Nathan F's ifcvt-cond-move patch on cortex-a8 with -O2/O3. No
improvements on speed/size for EEMBC.
** Bernd's ldm/stm patch. Analyze the reason of regression on Linaro
4.5. Found something wrong in IRA rtl dump, and spend sometime on
understanding IRA rtl dump log. Thanks to Chung-Lin, I realize that IRA
dump is correct, and look back to ARM RTL patterns on ldm/stm. Compared
RTL patterns in 4.5 and 4.6, found some difference. Regenerate
ldmstm.md for Linaro 4.5 after update arm-ldmstm.ml a little bit.
Regressions goes away!
No speed improvement, but code is smaller by 0.2% in EEMBC. Still
prefer to merge to Linaro 4.5.
ocaml is an interesting language, but not easy to learn and read in vim.
* Some discussion on Linaro development process.
* My regrename improvement patch (re. LP:633243). Communicate with
Eric Botcazou back and forth, but current patch is still too
target-dependent to him, as a Middle-End maintainer. Still need some
improvements.
* Build FSF GCC trunk. CLoog requirement in configure is wrong, revert
configure to previous version, and then pass the version checking during
gcc configure.
Hi there. Could everyone in the toolchain working group start sending
their activity reports to this list please? Put [ACTIVITY] at the
start of the subject line so that they can be filtered.
Ta,
-- Michael
Hi there. Attached are the status reports from the Toolchain WG
members for last week.
-- Michael
Ken Werner --
Hi Michael,
* got access to the internal wiki/calendar/email :)
* continued to setup the borrowed vexpress board
* upgraded to the Linaro 10.11 release
* encountered various issues until I found that the /etc/hosts is empty
(#674090)
* learned that the SD card issue is a known problem (#632798)
* the network interface sometimes dies if stressed (Matt was able to
reproduce this)
* the disabled CONFIG_SWAP is being tracked as #672656
* sometimes the entire system hangs (when under heavy load?)
* David noticed that /proc/cpuinfo lacks neon support (but his string
benchmark/testcase ran fine)
* wondering why the kernel reports only about 800 BogoMIPS while it's around
2k on the panda board
* started to work on the atomic memory operations item
* identified the relevant GCC patches
* still looking for a good way to verify the GCC support
* posted a patch on the glibc-ports ml with regard to #643171
David A Gilbert --
I managed to get to try Ken Werner's Versatile Express board with an A9MP
tile; the shape
of the graphs matches that from the Panda, but the raw performance is down
by a factor of about
3 - I'm guessing it's clocked lower for some reason.
It confirms however that the Neon behaviour I was seeing with memset is
not Panda/OMAP4 specific;
no one has replied to my post to linaro-toolchain. It's a difficult
situation in that my fastest memset on
Beagle is with Neon, and my fastest on v9 is without Neon - what would you
select on?
I've just finished writing memchr tests and my first crack at a faster
version; I realised I could use the same
trick that I had used for strlen and it works nicely - it seems to be about
50% faster than the libc version;
I've not tested against any other versions yet.
Paul Mckenney hasn't replied yet about the OSSC stuff, but apparently
he's out travelling and back next
week; so I'll catch him then.
I tried preloading my faster memset into ghostscript, but found it was
blatantly ignoring it - I think the memset
is being called from somewhere inside libc; I managed to get xdeb to cross
build me a libc but haven't yet got my
changes into it.
My order for a USB hard drive for my beagle seems to have been delayed by
the supplier; I'm pushing this but
it's starting to be a bit of a pain.
Richard Sandiford --
== Last Week ==
* Pinged my GAS fix for Thumb PLT branches to locally-defined symbols.
Committed it to binutils trunk and 2.21 branch after approval. This
fixes the libgcc.so build failure that I was seeing with GOLD.
* Worked on a patch to fix GOLD's handling of non-function references
to weak undefined symbols. This ended up touching every backend
(i386, x86_64, ARM, Power and SPARC) and was quite invasive, so it
took a while in the end. Committed to binutils trunk after approval.
* Ran more tests, both with -marm and -mthumb. I'm getting identical
GCC test results (including gfortran and objc) for GOLD and BFD ld, so
I think we're at the stage where GOLD is a viable replacement for the
BFD linker.
== Next Week ==
* I'll start looking at the IFUNC support.
* I'll take another look at launchpad bug 665598.
Peter Maydell --
Progress:
* qemu: more cleanup of signal handler VFP patchset;
I think I just need to add iwmmx support and it's good
* qemu: VCVT: found yet another bug, did final patchset
cleanup: submitted to upstream list [8 patch series]
* qemu: submitted a trivial patch to fix a problem where
__get_user/_put_user macros had an unnecessary local var
which could clash with a var being used by the macro user
* set up a tree on git.linaro.org which we can use for
a branch to make pull requests for ARM qemu fixes
* did a rough estimate of time to do an Eagle qemu model
(6 months + testing/bug fixing time)
Issues:
* lost some time to a problem where Linux VMs stopped being
able to talk to the LDAP server; however I have a workaround
and IT are investigating
Meetings:
* toolchain, toolchain standup, pdsw-tools, PD doughnuts
Plans
- attend Meego conference in Dublin (Nov 15-18 inc travel)
http://conference2010.meego.com/
- start on qemu consolidation by upstreaming various ARMv7
correctness fixes
Ira Rosen --
Here is this week report:
1. BeagleBoard installed, now "playing" with it
2. Continued to work on auto-detection of vector size
3. Looked into mixed vector sizes
4. Learning about vld and vst instructions
It looks like I won't be able to participate in Wed calls, since I am alone
with the kids on Wednesday evenings.
Hi all,
I've hit a probable assembler bug trying to build a Thumb-2 kernel:
Trying to assemble the attached file, I get:
arch/arm/kernel/relocate_kernel.S: Assembler messages:
arch/arm/kernel/relocate_kernel.S:10: Error: invalid offset, value too
big (0xFFFFFFFFFFFFFFFC)
arch/arm/kernel/relocate_kernel.S:11: Error: invalid offset, value too
big (0xFFFFFFFFFFFFFFFC)
arch/arm/kernel/relocate_kernel.S:58: Error: invalid offset, value too
big (0xFFFFFFFFFFFFFFFC)
arch/arm/kernel/relocate_kernel.S:59: Error: invalid offset, value too
big (0xFFFFFFFFFFFFFFFC)
The code appears correct and resonable, except that there should be a
.align directive before the data words at the end of the file (but
adding this doesn't fix the error)
Assembling in ARM (i.e., without -mthumb), or deleting the .globl
lines associated with the affected target symbols, the problem goes
away.
I believe this may be already by tracked by CodeSourcery as is issue #8775 (?)
Has anyone hit this issue before? Is it fixed upstream?
Any help much appreciated.
Cheers
---Dave
Hi,
I've been looking at some basic libc routine optimisation and have a
curious problem with memset and wondered if
anyone can offer some insights.
Some graphs and links to code are on
https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialMemset
I've written a simple memset in both a with and without Neon variety and
tested them on a Beagle(C4) and a Panda
board and I'm finding that the Neon version is faster than the non-neon
version (a bit) on the Beagle but a LOT slower on the
Panda - and I'd like to understand why it's slower than the non-neon version
- I'm guessing it's some form of cache interaction.
The graphs on that page are all generated by timing a loop that repeatedly
memsets the same area of memory; the X axis
is the size of the memset. Prior to the test loop the area is read into
cache (I came to the conclusion the A8 didn't write
allocate?). There are two variants of the graphs - absolute in MB/s on Y,
and a relative set (below the absolute) that
are relative to the performance of the libc routines. (The ones below those
pairs are just older versions).
if you look at the top left graph on that page you can see that on the
Beagle (left) my Neon routine beats my Thumb routine
a bit (both beating libc). If you look on the top right you see the Panda
performance with my Thumb code being the fastest and generally
following libc, but the Neon code (red line) topping out at about 2.5GB/s
which is substantially below the peak of the libc and ARM code.
The core loop of the Neon code (see the bzr link for the full thing) is:
4:
subs r4,r4,#32
vst2.8 {d0,d1,d2,d3}, [ r3:256 ]!
bne 4b
while the core of the non-Neon version is:
4:
subs r4,r4,#16
stmia r3!,{r1,r5,r6,r7}
bne 4b
I've also tried vst1 and vstm in the neon loop and it still won't match the
non-Neon version.
All suggestions welcome, plus I'd appreciate if anyone can suggest which
particular limit it's hitting - does
anyone have figures for the theoretical bus and L1 and L2 write bandwidths
for a Panda (and Beagle) ?
Thanks in advance,
Dave
Hi there. I've uploaded a draft of the slides and notes for next
weeks public review at:
http://bazaar.launchpad.net/~linaro-toolchain-wg/+junk/publicreview1105/fil…
'Toolchain Public Review 11.05.odp' is a set of slides I'll talk to.
The first 15-20 minutes will go through these to describe our focus
and goals and how they tie together the blueprints and priorities.
The rest of the session will go through the current blueprints and
priorities. See:
Toolchain Blueprints (short).pdf
for the summary version and:
Toolchain Blueprints (long).pdf
for the long version. The long version is interesting if you can't
find a particular tool or technology. It may be small enough to be
called out as a single work item.
These are only a draft, but I realised I haven't shared the plans with
the rest of the group very well and Monday's meeting won't be the
best.
I'm on holiday tomorrow but feel free to send me any comments,
-- Michael