Hi Richard,
As per the discussion on this morning's call: I've reread the TRM and I
agree with you that the LSLS is the same speed as the TST (1 cycle).
However, as we agreed, the uxtb does look like 2 cycles vs. 1 cycle for the AND.
On the space vs. perf theme, one thing that would be interesting to know is
whether there are any icache/issue stage limitations;
i.e. if I have a stream of 32-bit Thumb-2 instructions that are all listed
as 1 cycle and are all in i-cache, can they be fetched
and issued fast enough, or is there a performance advantage to short
instructions?
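
To make the two comparisons above concrete, here is a minimal C sketch of the
source-level forms involved. It is an illustration only, not taken from the
TRM or from any patch: the function names are hypothetical, and which
instruction a compiler actually selects (TST vs. LSLS, AND vs. UXTB) depends
on the GCC version and options. The sketch only shows that each pair computes
the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Low byte via a mask: the form that would typically map to AND with #255. */
uint32_t low_byte_mask(uint32_t x) { return x & 0xffu; }

/* Low byte via a narrowing cast: the form that would typically map to UXTB. */
uint32_t low_byte_cast(uint32_t x) { return (uint8_t)x; }

/* Bit test via a mask (TST-like). Requires bit <= 31. */
int bit_set_mask(uint32_t x, unsigned bit)
{
    return (x & (UINT32_C(1) << bit)) != 0;
}

/* Bit test via a left shift that moves the bit into the top position
   (LSLS-like: the tested bit ends up where the N flag is taken from). */
int bit_set_shift(uint32_t x, unsigned bit)
{
    return ((x << (31u - bit)) >> 31) != 0;
}

/* Hypothetical self-check: both forms of each pair agree on a few samples. */
void check_equivalent_forms(void)
{
    const uint32_t samples[] = {0u, 1u, 0x80u, 0x12345678u, 0xffffffffu};
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(low_byte_mask(samples[i]) == low_byte_cast(samples[i]));
        for (unsigned bit = 0; bit < 32; bit++)
            assert(bit_set_mask(samples[i], bit) ==
                   bit_set_shift(samples[i], bit));
    }
}
```

Compiling these with -S for the target of interest is one way to see which
encodings are actually chosen.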
Dave
LP:663939 - Thumb2 constants
* Continued testing, found a few bugs. Tidied a few bits up.
* Wrote some new testcases to go with the patch.
LP:618684 - ICE
* Began looking at this one. So far I can't reproduce it. I have a
debuggable native toolchain building, but it's been delayed by hardware
issues.
In the course of testing I discovered that the ARM FSF config wasn't
testing the right thing, so I began work on a new, more appropriate FSF
build/test config for Linaro work.
I also found that the SD card rootfs in my IGEPv2 board was corrupted. I've
restored it from backup, and it's now working again.
== Linaro and upstream GCC ==
* Linaro launchpad issues:
- LP #672833, x86-64 varargs regression: after testing, pushed a bzr branch
for merging.
- LP #634738, inefficient low bit extraction: some discussion with Yao.
- LP #618684, ICE when building ziproxy: looked into it and quickly found
that it is no longer reproducible on Linaro 4.5 trunk.
* Worked on some GCC bugzilla PRs:
- PR44557, ICE in Thumb-1 secondary reload: this should be fixed by a
change of the scratch operand constraint of "reload_inhi" from "r" to
"l". Interesting to note that this was from the
merged-arm-thumb-backend-branch merge, from about 10 years ago.
- PR46508: libffi fails to build on VFP asm instructions, seems to need
a '.fpu vfp' directive. Probably missed earlier because my toolchain was
configured with --with-fpu=vfp.
- PR45416: 4.6 code generation regression on ARM, after expand from SSA
changes. Looking at this currently.
== This week ==
* Look at Linaro issues with higher priority.
* Continue working on GCC PRs.
== Linaro GCC ==
* Merge ldm/stm patch to Linaro 4.5 tree.
Found two regressions in pass ce3 at the last minute, while proposing the
merge request. Revert one of the ldm/stm patches, the one about ifcvt.
Complete testcase in branch.
* Try Richard E.'s "TST to LSLS transformation" patch on cortex-a9 with
FFMPEG. No speed improvements.
* Various Linaro GCC Bug fixing.
** LP:634738
Follow the fix to GCC PR40697, and create a new patch, which emits
extzv or shift rather than loading constants in some cases. Tested on
FSF GCC trunk with no regressions. However, found a regression by inspection
in pr44999.c, in which ubfx (4 bytes) is generated rather than uxth
(2 bytes). uxth is produced by the combiner from ashift and lshiftrt. While
reading arm.c, found that constant handling in Thumb-2 could be improved
to some extent.
** LP:633243
Re-implement regrename improvement, as Eric B. suggested on
gcc-patches. Spent some time understanding the hard-reg related API in
GCC. Tested on x86_64-linux with no regressions.
** LP:638935
Update my tree to FSF trunk, and find that the RTL sequence for the fldm/fstm
peephole disappears due to the fix for PR45722. Extend arm-ldmstm.ml to
support vfp.
Peephole and RTL patterns for vfp are done. Will revise
arm.c:{load,store}_multiple_sequence to accept vfp data.
Fix a bug in ldm/stm peephole when starting offset is negative.
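
For LP:634738 above, the identity behind "extzv or shift rather than loading
constants" can be sketched in C as follows. This is only an illustration
under assumptions: the function names are hypothetical, and the code shows
the arithmetic equivalence (mask with a constant vs. a pair of shifts), not
the actual RTL patterns or the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned field extract using a constant mask.
   Requires width >= 1 and lsb + width <= 32. */
uint32_t extract_mask(uint32_t x, unsigned lsb, unsigned width)
{
    uint32_t mask = (width == 32) ? UINT32_MAX
                                  : ((UINT32_C(1) << width) - 1u);
    return (x >> lsb) & mask;
}

/* Same field using a shift pair: shift left to discard the bits above the
   field, then shift right logically to discard the bits below it. */
uint32_t extract_shifts(uint32_t x, unsigned lsb, unsigned width)
{
    return (x << (32u - lsb - width)) >> (32u - width);
}

/* Low halfword as a left shift followed by a logical right shift, the
   ashift/lshiftrt shape mentioned above. */
uint32_t low_half_shifts(uint32_t x) { return (x << 16) >> 16; }

/* Low halfword as a zero-extending narrowing cast (uxth-like). */
uint32_t low_half_cast(uint32_t x) { return (uint16_t)x; }

/* Hypothetical self-check that the forms agree on one sample value. */
void check_extract_forms(void)
{
    const uint32_t x = 0xABCD1234u;
    for (unsigned width = 1; width <= 32; width++)
        for (unsigned lsb = 0; lsb + width <= 32; lsb++)
            assert(extract_mask(x, lsb, width) ==
                   extract_shifts(x, lsb, width));
    assert(low_half_shifts(x) == low_half_cast(x));
}
```

The (lsb = 0, width = 16) case is the one where a 2-byte uxth and a 4-byte
ubfx would both be valid encodings of the same operation.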
== This week ==
* LP:634738: Figure out how uxth is produced by combiner.
* LP:633243: Test it on ARM.
* LP:638935: Revise {load,store}_multiple_sequence to accept vfp data.
--
Yao (齐尧)
Re my recent email "Upstream GCC feature freeze", I think we're agreed
that we need to create a branch that tracks GCC 4.6 development, but has
our own performance improvements included. The question is where to host it?
Option 1: Launchpad/bzr
Pros:
* We need no permission to do it
* The branch will naturally evolve into our 4.6 release series in time.
* The 3-way merge works well (if slowly)
* We can include patches that we have no intention of posting upstream
ever
* Our patch tracker will Just Work.
* Merge requests will be available.
Cons:
* Bzr ;)
* It's hidden away from the view of most GCC developers
Option 2: GCC SVN branch
Pros:
* We can work in the open, submitting patches via gcc-patches, as usual
* The final merge to GCC trunk (come stage 1) will be eased, a little
Cons:
* We can't really apply anything we want just for ourselves
* we may end up maintaining an LP branch shadowing the svn branch
* When we do want to do 4.6 in LP, we'll have to backport all our
patches from 4.7, and this may no longer be straightforward.
* Write permissions not clear.
* Although I think you can just go ahead and do it?
OK, so I'm sure I've missed some big ones. Please discuss! ;)
I think the big question here is, when will we start wanting to make
(unstable/experimental) Linaro GCC 4.6 releases? If we want to do it
early, then we'll have no choice but to have an LP branch to release from.
Andrew
Like everyone from the Toolchain WG, I will share my activities from last week:
1. cross compilers for archive
- discussed with doko about dropping update-alternatives use
- wrote gcc-defaults-armel-cross 1.4 which does proper symlinks for cross
compilers
- wrote gcc-4.5-armel-cross 1.41 which removes update-alternatives support
- wrote gcc-4.4-armel-cross 1.37 which removes update-alternatives support
- wrote armel-cross-toolchain-base 1.53 which includes all the updates I had
- sent all of them to Steve for review
Status of changes:
- default version of armel cross compiler will be 4.5 like it is in Natty
- both 4.4 and 4.5 will be provided as it is for native
- any traces of update-alternatives use should be removed
Needs to be done:
- adding conflicts on older cross compilers to gcc-defaults-armel-cross
Order of upload to archive:
- armel-cross-toolchain-base
- gcc-4.5-armel-cross
- gcc-4.4-armel-cross
- gcc-defaults-armel-cross
2. Checked whether a few old bugs still apply:
- Bugs #646729, #637454, #671455 are done with armel-cross-toolchain-base 1.52
(landed in maverick-proposed)
Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz
Short week.
Finally got an external hard drive for my beagle - makes it sanely possible to
natively build things.
Got eglibc cross built (thanks to Wookey for pointing me in the right
direction with the magic incantation of dpkg-buildpackage -aarmel
--target=binary) and easily rebuilding. I have a version with the neon version
of my memset built into it - it doesn't seem to make a noticeable difference
to my ghostscript benchmark though.
Pandas aren't likely to turn up until mid December; arranging to borrow
an A9 is turning out to be difficult, but it looks like we should be able to
get access to the one in the London datacentre - although it has a disc
problem at the moment.
I did manage to get a colleague to try my tests on his own Toshiba AC-10
(Tegra-2 - no Neon); the graphs had approximately the same shape as my
previous Panda tests. Memchr looked pretty good on there.
Also trying to look at the sign off I need for various libc access.
Dave
I mainly worked on the atomic memory operations blueprint/item:
* posted an updated patch for #643171 on the libc-ports ml after running the
glibc testsuite natively on the vexpress
* continued to learn about the ARM instructions involved :)
* started to write some gcc testcases that scan the asm output of the __sync
builtins (mainly to detect differences between the gcc versions - not sure how
useful those tests would be for upstream as the sequences may easily change)
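
As a rough illustration of the kind of input such tests would compile, here
is a minimal C sketch using three of the GCC __sync builtins. The wrapper
names are hypothetical, and the instruction sequences emitted for them are
deliberately not shown, since they vary between GCC versions and targets.

```c
#include <assert.h>

/* Atomically add v to *p and return the value *p held before the add. */
int fetch_add(int *p, int v) { return __sync_fetch_and_add(p, v); }

/* Atomically add v to *p and return the value *p holds after the add. */
int add_fetch(int *p, int v) { return __sync_add_and_fetch(p, v); }

/* Atomically replace *p with desired if it currently equals expected;
   returns nonzero when the swap happened. */
int cas(int *p, int expected, int desired)
{
    return __sync_bool_compare_and_swap(p, expected, desired);
}

/* Hypothetical single-threaded self-check of the return-value conventions. */
void check_sync_wrappers(void)
{
    int v = 5;
    assert(fetch_add(&v, 3) == 5 && v == 8);
    assert(add_fetch(&v, 2) == 10 && v == 10);
    assert(cas(&v, 10, 1) && v == 1);
    assert(!cas(&v, 10, 2) && v == 1);
}
```

Compiling a file like this with -S under each GCC version gives the asm
output that a scanning test would then compare.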
Ken
RAG:
Red:
Amber:
Green:
Milestones:
Milestone                    | Planned    | Estimate   | Actual     |
finish virtio-system         | 2010-08-27 | postponed  |            |
get valgrind into linaro PPA | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches   | 2010-10-01 | 2010-10-22 | 2010-10-18 |
Progress:
* Most of this week was spent at the MeeGo conference in Dublin.
This seemed to be a rather apps-developer centric conf,
with not much of interest on the low-level side. There were
a few useful talks/conversations, though.
* Intel were giving away Atom-based netbooks to all attendees;
that's a lot of developers who are going to be testing and
optimising their apps for Atom devices rather than ARM...
* qemu: looked at https://bugs.launchpad.net/bugs/668799 ;
we don't seem to be taking the right lock before we manipulate
the graph of translation blocks. I have a fix which stops the
reported segfault, but the code has a number of "XXX not thread
safe" and "FIXME: not SMP safe" comments and generally doesn't
seem to have a coherent locking design :-(
* qemu: sent some minor patches upstream:
+ enable iwmmxt coprocessors in user mode
+ remove some unused functions from target-arm and target-sparc
+ fix a build failure in a makefile
* qemu: some review of a patch to fix semihosting SYS_GET_CMDLINE
Plans
- qemu consolidation
- post-toolchain-review, sort out some milestones for
this report
Absences: (complete to end of 2010)
Thu/Fri 25-26 Nov; Fri 17 Dec - Tue 4 Jan inclusive.
(Dallas Linaro sprint 9-15 Jan.)
== This week ==
Started looking at STT_GNU_IFUNC support in BFD. There were a couple
of janitorial changes I needed to make in order to prepare elf32-arm.c
for the main patch. I tested those separately and submitted them upstream:
http://sourceware.org/ml/binutils/2010-11/msg00330.html
http://sourceware.org/ml/binutils/2010-11/msg00331.html
I've now finished a prototype implementation of the STT_GNU_IFUNC
support itself. It wasn't as mechanical as I'd originally assumed,
which was nice.
Tests that I've run by hand seem to be doing the right thing.
I've now started writing tests for the testsuite (meaning:
I've completed 1 test so far).
== Next week ==
* Add more tests, including Thumb coverage.
* Start on the libc changes.
Richard