Hi there. When you do a commit that fixes a bug, try adding a
'--fixes lp:123456' to the bzr commit. Launchpad recognises these and
automatically link the branch and merge request to the bug report.
This makes it easier for the reporter to see progress. If you forget
to update the bug status (such as bumping it from 'Triaged' to 'In
progress') then it's obvious that it's being worked on.
See https://bugs.launchpad.net/gcc-linaro/+bug/942307 for an example.
The status was 'Triaged' but there's an approved merge request which
shows work is being done.
-- Michael
Merged FSF GCC 4.7 to the Linaro GCC 4.7 branch.
Merged from GCC 4.6.3 release to Linaro GCC 4.6 branch.
Wrote and posted a patch to load DImode immediate constant into NEON
registers properly. Unfortunately, testing showed a bootstrap stage 2
vs. 3 miscompare, so there's something not quite right. However,
disassembly of the binaries hasn't revealed any problems, so this
failure is still a mystery. More investigation required.
Wrote and posted a patch to do DImode negation in NEON. Realised that I
had forgotten to do the core-register fall-back case; posted a new
version. Again, there's something annoyingly subtle that prevents
bootstrap. This time it looks like some sort of wrong code bug.
Investigating.
Wrote and posted a patch to do DImode one's complement in NEON. Richard
E questioned how it was written though. The tests passed successfully,
so that's a novelty this week!
Looked for other NEON instructions missing support. Didn't find any ...
but the machine description isn't exactly straight forward.
Considered the problems with choosing whether or not to do an operation
in NEON, or not. Discussed the existing state and possible solutions
with Ramana and Benrd (thanks Guys). Thought about it some more. Posted
a vague description of what might fix it to the linaro-toolchain list.
Awaiting replies.
Hi!
* Development benchmarks:
Finished the first implementation, sent to Michael for review.
* This became a very short week because of sickness.
Plans for week 10 is to triage existing bugs, and to get going again with
the SunSpider benchmark.
Regards
Åsa
Summary:
* Multilib on linaro toolchain.
Details:
1. Multilib test for linaro toolchain.
* Fix the multilib build issue when multiarch patch is applied.
* Fix a bug in crosstool-ng upstream to build multilib for glibc/eglibc.
* Successfully build multilib toolchain for armv6t2, cortex-a8 and
cortex-a9. And tests show it can link the correct libraries.
2. Still issues:
* Multilib and multiarch uses different directory structures.
Multilib build can not find the correct directories in the prebuilt
oneiric-sysroot.
* Build armv5t arm mode lib when default mode is thumb for armv7-a.
Annual leaves:
* Mar. 1-2.
Plans:
* Finalize the multilib solution for linaro binary toolchain.
* Work on code size optimization for the embedded toolchain.
Best regards!
-Zhenqiang
> The basic idea is that we add a new RTL optimization pass (or two) that
> assesses the usage of pseudo registers, and makes recommendations about
> what register class each should end up in, if there's a choice. These
> recommendations would then be used by later passes to get a better use
> of NEON. I might call this the "prealloc" pass, or something.
That sounds very much like the pre-reload that "new-ra" had at one
point (http://gcc.gnu.org/viewcvs/branches/new-regalloc-branch/gcc/pre-reload.c).
The problem with pre-reload for new-ra was that it was basically
reload instead of something nicer and cleaner. It also only ran just
before the register allocator, which is too late for the problem you
are trying to solve.
> Firstly, for each pseudo-register in a function, the pass would look at
> the insn constraints for each "def" and "use", and see how the registers
> relate to one another. This might determine things like "if rN is in
> class A, then rM must be also in class A".
At SUSE I tried to do this with the webizer pass (web.c). I wrote down
the ideas we implemented at the time (see
http://gcc.gnu.org/ml/gcc/2005-01/msg00179.html):
- web class, to replace regclass and choose register classes webs
instead of pseudos. This also includes splitting webs if a register
in a web really wants to be in two different classes to satisfy
constraints in two different insns. Right now, as far as I
understand, regclass just picks one and lets reload figure out how to
fix up that mistake.
- A semi-strict RTL mode. Right now there is just strict and
non-strict. On the branch there is a semi-strict mode which is the
same as strict RTL except that pseudo-registers are still allowed.
- pre-reload (which is related to web class) to make sure as many insn
constraints as possible are satisfied before the register allocator
goes to work. Basically, after pre-reload the insns stream should be
in semi-strict RTL form.
I used the webizer to unify defs and uses. I would split a web if it
needed multiple register classes (I inserted a mov, without checking
that a move existed from the source to the target register class), and
I put pseudos r1 and r2 in the same register class if there was an
insn (set (r1) (r2)) somewhere. The selection of the register classes
had a cost function, but I used rtx_cost, which is not very effective,
really. But I never took this experiment very far because for x86-64
the plan didn't work as well as I had hoped. I don't remember the
details, but the biggest problem I had with the experimental
implementation of these ideas (apart from lots of trouble with recog
for semi-strict RTL) was that there is a bit of an ordering problem
between combine on the one hand, and web-based register classes. If
you assign classes too early and don't allow things to change, then
combine fails too often. If you assign register classes after combine,
you may not get the instructions selected the way you want them to be.
This was when GCC still had the old local-alloc.c and global.c
allocators. Things may be different (better) with IRA and the upcoming
LRA stuff.
If you plan to work on this, I would suggest you discuss the plan on
the GCC mailing list also, with Jeff Law and Vladimir Makarov in CC
because they are working on a reload rewrite (LRA).
Ciao!
Steven
Hi,
OpenEmbedded:
* Worked on the meta-linaro layer and added libgcc and crosssdk
recipes to satisfy some bitbake dependencies
* I had to apply a few patches to build the linaro toolchain the OE
way (mostly gcc configury)
* successfully built the sato and Qt images
* Moved on to test the February release of the linaro binary toolchain
and (probably) and hit an issue with unaligned SD card images to used
with QEMU
* the guest kernel fails with: attempt to access beyond end of device
* /proc/partitions shows different block sizes (host vs. guest)
* the image size gets calculated on the fly by OE
* patch posted that introduces allows to specify a rootfs size alignment
* not seen on trunk as they use IDE
* Started to rebase the linaro-meta layer against current OE-core
* created https://wiki.linaro.org/KenWerner/Sandbox/OEMetaLinaroCard
based on the existent card of David R.
Regards,
Ken
== GCC ==
* Fixed mainline regression causing ICE in certain outer-loop
vectorization cases.
* Merged fwprop-subreg patch into Linaro GCC 4.7.
* Completed patch to generate usat/ssat instructions
where appropriate; checked into GCC mainline.
Merge requests to Linaro GCC 4.6 and 4.7 pending.
* Ongoing work on improving end-of-loop value computation.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
==Progress===
* Finished off PGO patch - sent upstream.
* Finished off the ABI tests - sent upstream.
* Investigated fixes for LP 942307 - a problem with kernel builds for
android. Backported a fix from Uli last year.
* Upstream patch review.
* Small configury done for SPEC2k as far as HC partitioning goes.
* Some Android benchmark investigations.
* Recovered from a broken upgrade on my laptop from natty to oneiric
on my laptop and then went all the way to Precise. It works
reasonably !
=== Plans ===
* Commit all approved and tested patches.
* Check on hc partitioning results from SPEC2k and make sure there is
an improvement and the feature works !
* Investigate https://bugs.launchpad.net/gcc-linaro/+bug/924726 in a
little more detail.
* Get back to partial-partial PRE.
Absences.
* 1 week holiday sometime before that - to be booked.
* Linaro Connect Q2.12 - May 28 - June 1 - travel booked - hotel to be booked.
Current Milestones:
|| || Planned || Estimate || Actual ||
||cp15-rework || 2012-01-06 || 2012-??-?? || ||
(new blueprints & reestimate for this one pending)
Historical Milestones:
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-12-15 || 2011-12-12 ||
||initial-a15-system-model || 2012-01-27 || 2012-01-27 || 2012-01-17 ||
||qemu-kvm-getting-started || 2012-03-04?|| 2012-03-04?|| 2012-02-01 ||
== cp15-rework ==
* ploughing through conversion of cp15 registers to new design:
patchset now 20 patches long, still TODO crn={0,1,6,7,9}
== other ==
* reviewed more Xilinx Zynq model patches
* looking at BE8 support: Paul Brook has posted some patches
to support this in user mode
* LP:944645: fixed bug where we weren't clearing the IT bits when
entering an M profile exception handler
* sent out an arm-devs.next pullreq
* trying to track down why linux-user is failing brk() and thus
causing bash segfaults
Hi All,
As you know, the compiler currently has difficulties choosing between
whether to do an operation in NEON or not.
As I see it there are three problems:
1. Simply, is it profitable?
NEON can do many DImode operations in one or two instructions
where 2 to 10 normal ARM/Thumb instructions would be required
(not to mention the added register pressure), but there is a
cost associated with moving the inputs to NEON, and the results
back.
If the data can stay in NEON for more than one operation,
then that's even better.
If the data must be loaded from memory, and the result stored back
to memory, then it's only a question of whether the register space
is available, or not.
Currently these decisions are made in the IRA/reload passes.
2. Values that originate in hard-registers stay there.
This applies to function parameters, mostly, but also in general
where the result of an operation is allocated first.
If there is no instruction that can use the value there then the
value is 'reloaded' to a more suitable register. If there is any
alternative that avoids the move then the register allocator will
use it, regardless of the relatives costs of the other
alternatives.
This problem is reduced where an operation and move can happen in
one instruction, but NEON instructions do not do this much. We can
write insns that appear to do it, but these output multiple
instructions (see my recent core-SI=>NEON-DI extend patch).
3. It all happens too late.
The decision whether to use NEON or not is not made until register
allocation time. Naturally this means that most of the optimization
passes are already completed.
Part of the problem is that the operation almost certainly needs
splitting (into whatever form was chosen) and this might not be
straight forward, post-reload. (However, the split1 pass is
already quite late, so perhaps this isn't such a big deal.)
Another part of the problem is that passes such as the two
lower-subreg passes make assumptions about the register width which
are not accurate if the operation is to end up in NEON.
There are other, lesser problems, such as it being hard to adjust the
costs for different cores (A8 in particular) and the cost of generating
an immediate constant can't be known until it's known what instructions
will be used to generate it.
These problems are not specific to NEON, of course. I believe IWMMXT
suffers from the same issues. Likewise the C6X port, and also the i386
MMX to some degree. Anything that has instructions that only operate on
a subset of registers, basically.
So, Bernd has suggested an outline of a solution. I've quizzed him on
this, added a few of my own ideas, and probably a good selection of
misunderstandings, bad assumptions, and general cock ups, and come up
with something I can write here for comment. I can post something to
upstream later if it doesn't get totally shot down now.
The basic idea is that we add a new RTL optimization pass (or two) that
assesses the usage of pseudo registers, and makes recommendations about
what register class each should end up in, if there's a choice. These
recommendations would then be used by later passes to get a better use
of NEON. I might call this the "prealloc" pass, or something.
Firstly, for each pseudo-register in a function, the pass would look at
the insn constraints for each "def" and "use", and see how the registers
relate to one another. This might determine things like "if rN is in
class A, then rM must be also in class A".
E.g. if you have two registers with constraints like this:
"r,w"
"r,w"
.. (and 'r' and 'w' do not overlap) then you know that there is a choice
between one mode or another, whereas this:
"r,w,r,w"
"r,w,w,r"
.. would impose no restrictions and we can carry on as normal.
Having done that we'd end up with sets of pseudo-registers that must
make a decision one way or the other, and we'd know where the operations
are that would force a move from one class to the other.
There's a fair amount of handwavium in there at present, because I've
not worked out what to do with overlapping register classes (think
VFP_LO_REGS) and all the other complications.
Secondly, the pass would consider the costs of each alternative, and
store a recommended register class for each pseudo-register in a table
somewhere. It would also create new pseudos and insert extra move
instructions at the register file boundaries where an existing register
would have had split recommendations (this would solve problem 2 above).
Again, there's handwavium in "consider the costs". This isn't too hard
for size-optimization (assuming the "length" attributes on the insn is
correct), but more difficult for speed optimization. Factors to include
would be the move costs (here the A8 issues would be addresses) and the
relative speeds of the operations in both alternatives. Also, the
various possible transition points between the two modes might need some
comparisons.
Thirdly, the subsequent passes would need to be modified, as would some
of the back-end bits and bobs.
1. Lower-subreg would need to detect 'word_mode' based on the
recommended register class, not the global value.
2. The many split patterns in the machine description could be adjusted
so that, instead of simply conditionalizing on "reload_completed", they
split at split1 if that's the best option. (Maybe it would be profitable
to insert a new, earlier split pass specifically for this case to take
advantage of the likes of combine? I mean, ideally this decision would
have been made at expand time, if it could have been?) It might be
useful to *not* split too soon, in some cases, so that the register
allocator can still make the final decision based on register pressure,
and whatever other factors it uses. Of course, the existing late-split
option would need to be retained in case the prealloc pass is disabled,
in any case.
3. Various passes would have to be taught not to remove seemingly
superfluous register moves where they actually move between register
classes.
4. Pretty much nothing would need doing to register allocation! The
extra moves should make allocation a register pressure management issue,
rather than a question of making it work. DImode operations preallocated
to core-registers may already have been lowered, one way or the other
(by split1) so there's no decision left there, and if no lowering was
necessary then that option ought to be obviously cheaper. If it insists
on making contrary decisions then it can be taught to use the
recommendation as a hint, perhaps? In specific problem cases it would
also be possible to use instruction attributes to disable (or strongly
discourage) certain alternatives based on the recommended class.
5. The existing 'onlya8'/'nota8' nonsense can be removed.
6. The register move cost can be set correctly for each core.
7. If a constant is destined for a NEON register, most likely,
arm_gen_constant can use the NEON immediate rules to determine the cost.
There's clearly a lot of thought that needs to go into the
pseudo-register scan and decision making logic, but the whole thing
doesn't look like it'll boil down to very much code in the end.
There's also the question of where to put the pass? Too early and you'd
need to put a second one in to reassess the much changed RTL later, and
too late and lower-subreg won't be able to use it.
It's possible that it might be better to treat it more like the
data-flow analysis where it's not actually a stand-alone pass, but
rather a tool other passes can use? That might depend how
computationally expensive it is.
Any thoughts anyone? Might something like this actually work? Would it
be worth spending the time on this?
Andrew
Hi,
184603 fixes an ICE we're running into with Android test builds.
Please pull it in ASAP so I don't have to mess with the CFLAGS as a workaround.
ttyl
bero
Hi Peter, Rusty. I've fleshed out the KVM blueprints a bit more at:
https://wiki.linaro.org/MichaelHope/Sandbox/Q1.12ToDo
Notable is adding validation and de-PENDING a few things.
I need to know the following things before talking to Validation and
Platforms about things that are in their queue:
* What's the host kernel? Christoffers, or a linux-linaro-kvm based
off his plus work in progress?
* What's the guest kernel?
* Anything special that needs to be in the host rootfs outside qemu
and libvirt?
Do you have a host and guest kernel .config that I can pass on to Platforms?
-- Michael
Hi,
GDB on Android:
* Set up cross compiling environment using both the Android toolchain
and
the Linaro toolchain.
* Tested Android and Mozilla GDB running as a native binary on the
Linaro Android image on qemu. None of them were able to see all
threads in a multi-threaded process. The same happens with a native
binary on Android talking to a gdbserver on localhost.
* I was able to produce an Android GDB 7.1.x binary which when running
on an i386 host talking to the Android SDK emulator sees all threads.
It is not able to generate a backtrace though.
* Carnival holidays.
--
[]'s
Thiago Jung Bauermann
Linaro Toolchain Working Group
Investigated and produced and patch for bug lp936863.
Continued work on 64-bit shifts. I updated my neon shifts patch twice:
once to take -Os optimization into account, and once because I noticed
that the CC clobbers were being retained even after they were known to
be not required, and presumably this could be a bad thing.
Posted a patch to improve SI->DImode sign- and zero-extends that also
move values from core registers to neon registers.
Wrote a patch to implement negation in neon register, albeit at the cost
of a scratch register to hold the constant zero. This worked, but
insisted on loading the zero from memory, despite the fact that NEON has
a suitable instruction for loading zero.
Attempted to get "vmov dN, #0" to work. The problem appears to be that
DImode loads are handled by the VFP load pattern in vfp.md and that that
doesn't seem to DTRT for integer constants. I've got as far as trying to
figure out what it does do, but there's more to be done to get to the
bottom of the problem.
Tried to get the benchmarking going for 64-bit shifts. Unfortunately
ursa1 has been experiencing outages. Michael has been trying to fix it,
but so far I've not been able to run anything there. Instead, I've
launched benchmark runs via Michael's normal test infrastructure.
Hopefully this can replace the manual ones anyway. That would be more
convenient.
Hi,
I'm trying to use pre-built version of linaro toolchain for cross-compiler in Ubuntu 11.04 on our 64bit server.
I got it from http://people.linaro.org/~michaelh/incoming/binaries/.
When I run arm-linux-gnueabi-gcc to compile a c source, it says "No such file or directory".
The steps are as below:
1. Unpack gcc-linaro-arm-linux-gnueabi-2011.12-20111219+bzr2309~linux.tar.bz2.
2. Rename gcc-linaro-arm-linux-gnueabi-2011.12-20111219+bzr2309~linux to arm-fsl-linux-gnueabi.
3. Copy to ~/toolchain_ltib/gcc-linaro-4.6.3-glibc-2.13-singlelib-2011.12/
2. Use arm-linux-gnueabi-gcc to compile my code.
The output is:
r65388@shlinux3:~/MEMCPYBM$ ~/toolchain_ltib/gcc-linaro-4.6.3-glibc-2.13-singlelib-2011.12/arm-fsl-linux-gnueabi/bin/arm-linux-gnueabi-gcc
-bash: /home/r65388/toolchain_ltib/gcc-linaro-4.6.3-glibc-2.13-singlelib-2011.12/arm-fsl-linux-gnueabi/bin/arm-linux-gnueabi-gcc: No such file or directory
I have installed lsb 4.0 and libc6 2.13.
r65388@shlinux3:~/MEMCPYBM$ dpkg -s lsb
Package: lsb
Status: install ok installed
Priority: extra
Section: misc
Installed-Size: 48
Maintainer: Ubuntu Developers <ubuntu-devel-discuss(a)lists.ubuntu.com>
Architecture: all
Version: 4.0-0ubuntu16
...
r65388@shlinux3:~/MEMCPYBM$ dpkg -s libc6-dev
Package: libc6-dev
Status: install ok installed
Multi-Arch: same
Priority: optional
Section: libdevel
Installed-Size: 11888
Maintainer: Ubuntu Developers <ubuntu-devel-discuss(a)lists.ubuntu.com>
Architecture: amd64
Source: eglibc
Version: 2.13-20ubuntu5
Provides: libc-dev
Depends: libc6 (= 2.13-20ubuntu5), libc-dev-bin (= 2.13-20ubuntu5), linux-libc-dev
Recommends: gcc | c-compiler
Suggests: glibc-doc, manpages-dev
Breaks: binutils (<< 2.20.1-1), binutils-gold (<< 2.20.1-11), cmake (<< 2.8.4+dfsg.1-5), gcc-4.4 (<< 4.4.6-3ubuntu1), gcc-4.4-base (<< 4.4.6-3ubuntu1), gcc-4.5 (<< 4.5.3-1ubuntu2), gcc-4.5-base (<< 4.5.3-1ubuntu2), gcc-4.6 (<< 4.6.0-12), gcj-4.4-base (<< 4.4.6-2ubuntu2), gcj-4.5-base (<< 4.5.3-1ubuntu2), gnat-4.4-base (<< 4.4.6-1ubuntu3), libhwloc-dev (<< 1.2-3), libjna-java (<< 3.2.7-4), liblouis-dev (<< 2.3.0-2), liblouisxml-dev (<< 2.4.0-2), make (<< 3.81-8.1), pkg-config (<< 0.26-1)
...
Do you know why?
How can I fix this issue?
Thanks~~
Yours
Terry
Summary:
* Linaro binary toolchain 2012.02 release.
* Try the Multilib patch on linaro toolchain.
Details:
1. Linaro binary toolchain 2012.02 release.
* Create linaro license file based on embedded toolchain, add
crosstool-ng and sysroot (eglibc) license.
* Tests on win7, oneiric and RHEL5.
2. Try to apply the multilib patch used in embedded toolchain to
linaro toolchain.
* The multilib solution can not work together with the multiarch patch.
(1) With both patches, "gcc -print-multi-lib" can only print one option.
(2) With only multilib patch, "gcc -print-multi-lib" can print the
multilib list.
* Successfully build a multilib toolchain (Based on the prebuilt
oneiric-sysroot with directory structure change for include and lib.)
without the multiarch patch and c++ support.
Plans:
1. Try crosstool-ng 1.14 to build multilib eglibc.
2. Investigate how to make multiarch and multilib work together.
Planed leaves:
* Mar. 1-2.
Best regards!
-Zhenqiang
Hi Peter, Rusty. I've written down my notes from Connect and drafted
the blueprints at:
https://wiki.linaro.org/MichaelHope/Sandbox/Q1.12ToDo
I've put my priories on things and sorted them by relative importance.
I'll check tomorrow if there's a good match between this, the KVMEpic
spec, and the older task spreadsheet.
Most of these are too short. I don't know what's involved with the
CP15 work. Some still have big PENDINGS. Please edit and correct.
There's a lot of work there. Some is on other teams but I'm concerned.
-- Michael
== GCC ==
* Merged Ira's remaining vectorizer patches into Linaro GCC 4.7.
* Merged Richard's sched-pressure patch (enabled by default
on ARM) into Linaro GCC 4.7. Asked IBM colleagues for
benchmark results on s390 and rs6000 as well.
* Committed patch to enable vect_condition tests in FSF mainline,
FSF 4.6, and Linaro GCC 4.7.
* Verified that Linaro GCC 4.7 now contains every feature that
was present in Linaro GCC 4.6.
* Rebased and re-benchmarked fwprop-subreg patch.
* Ongoing work on a patch to generate usat/ssat instructions
where appropriate.
* Worked on improving end-of-loop value computation.
* Fixed mainline .init_array/.fini_array regression on certain
newlib targets.
* Reviewed several merge requests.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294