> The basic idea is that we add a new RTL optimization pass (or two) that
> assesses the usage of pseudo registers, and makes recommendations about
> what register class each should end up in, if there's a choice. These
> recommendations would then be used by later passes to get a better use
> of NEON. I might call this the "prealloc" pass, or something.
That sounds very much like the pre-reload that "new-ra" had at one
point (http://gcc.gnu.org/viewcvs/branches/new-regalloc-branch/gcc/pre-reload.c).
The problem with pre-reload for new-ra was that it was basically
reload instead of something nicer and cleaner. It also only ran just
before the register allocator, which is too late for the problem you
are trying to solve.
> Firstly, for each pseudo-register in a function, the pass would look at
> the insn constraints for each "def" and "use", and see how the registers
> relate to one another. This might determine things like "if rN is in
> class A, then rM must be also in class A".
At SUSE I tried to do this with the webizer pass (web.c). I wrote down
the ideas we implemented at the time (see
http://gcc.gnu.org/ml/gcc/2005-01/msg00179.html):
- web class, to replace regclass and choose register classes for webs
instead of pseudos. This also includes splitting webs if a register
in a web really wants to be in two different classes to satisfy
constraints in two different insns. Right now, as far as I
understand, regclass just picks one and lets reload figure out how to
fix up that mistake.
- A semi-strict RTL mode. Right now there is just strict and
non-strict. On the branch there is a semi-strict mode which is the
same as strict RTL except that pseudo-registers are still allowed.
- pre-reload (which is related to web class) to make sure as many insn
constraints as possible are satisfied before the register allocator
goes to work. Basically, after pre-reload the insn stream should be
in semi-strict RTL form.
I used the webizer to unify defs and uses. I would split a web if it
needed multiple register classes (I inserted a mov, without checking
that a move existed from the source to the target register class), and
I put pseudos r1 and r2 in the same register class if there was an
insn (set (r1) (r2)) somewhere. The selection of the register classes
had a cost function, but I used rtx_cost, which is not very effective,
really. But I never took this experiment very far because for x86-64
the plan didn't work as well as I had hoped. I don't remember the
details, but the biggest problem I had with the experimental
implementation of these ideas (apart from lots of trouble with recog
for semi-strict RTL) was that there is a bit of an ordering problem
between combine on the one hand, and web-based register classes on the
other. If
you assign classes too early and don't allow things to change, then
combine fails too often. If you assign register classes after combine,
you may not get the instructions selected the way you want them to be.
This was when GCC still had the old local-alloc.c and global.c
allocators. Things may be different (better) with IRA and the upcoming
LRA stuff.
If you plan to work on this, I would suggest you discuss the plan on
the GCC mailing list also, with Jeff Law and Vladimir Makarov in CC
because they are working on a reload rewrite (LRA).
Ciao!
Steven
Hi,
OpenEmbedded:
* Worked on the meta-linaro layer and added libgcc and crosssdk
recipes to satisfy some bitbake dependencies
* I had to apply a few patches to build the linaro toolchain the OE
way (mostly gcc configury)
* successfully built the sato and Qt images
* Moved on to test the February release of the linaro binary toolchain
and (probably) hit an issue with unaligned SD card images to be used
with QEMU
* the guest kernel fails with: attempt to access beyond end of device
* /proc/partitions shows different block sizes (host vs. guest)
* the image size gets calculated on the fly by OE
* posted a patch that allows specifying a rootfs size alignment
* not seen on trunk as they use IDE
* Started to rebase the linaro-meta layer against current OE-core
* created https://wiki.linaro.org/KenWerner/Sandbox/OEMetaLinaroCard
based on the existing card of David R.
Regards,
Ken
== GCC ==
* Fixed mainline regression causing ICE in certain outer-loop
vectorization cases.
* Merged fwprop-subreg patch into Linaro GCC 4.7.
* Completed patch to generate usat/ssat instructions
where appropriate; checked into GCC mainline.
Merge requests to Linaro GCC 4.6 and 4.7 pending.
* Ongoing work on improving end-of-loop value computation.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter | Management: Dirk
Wittkopp
Registered office: Böblingen | Court of registration: Amtsgericht
Stuttgart, HRB 243294
=== Progress ===
* Finished off PGO patch - sent upstream.
* Finished off the ABI tests - sent upstream.
* Investigated fixes for LP 942307 - a problem with kernel builds for
Android. Backported a fix from Uli last year.
* Upstream patch review.
* Small configury done for SPEC2k as far as HC partitioning goes.
* Some Android benchmark investigations.
* Recovered from a broken upgrade from natty to oneiric on my laptop
and then went all the way to Precise. It works reasonably!
=== Plans ===
* Commit all approved and tested patches.
* Check on HC partitioning results from SPEC2k and make sure there is
an improvement and the feature works!
* Investigate https://bugs.launchpad.net/gcc-linaro/+bug/924726 in a
little more detail.
* Get back to partial-partial PRE.
Absences.
* 1 week holiday sometime before that - to be booked.
* Linaro Connect Q2.12 - May 28 - June 1 - travel booked - hotel to be booked.
Current Milestones:
|| || Planned || Estimate || Actual ||
||cp15-rework || 2012-01-06 || 2012-??-?? || ||
(new blueprints & reestimate for this one pending)
Historical Milestones:
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-12-15 || 2011-12-12 ||
||initial-a15-system-model || 2012-01-27 || 2012-01-27 || 2012-01-17 ||
||qemu-kvm-getting-started || 2012-03-04?|| 2012-03-04?|| 2012-02-01 ||
== cp15-rework ==
* ploughing through conversion of cp15 registers to new design:
patchset now 20 patches long, still TODO crn={0,1,6,7,9}
== other ==
* reviewed more Xilinx Zynq model patches
* looking at BE8 support: Paul Brook has posted some patches
to support this in user mode
* LP:944645: fixed bug where we weren't clearing the IT bits when
entering an M profile exception handler
* sent out an arm-devs.next pullreq
* trying to track down why linux-user is failing brk() and thus
causing bash segfaults
Hi All,
As you know, the compiler currently has difficulty choosing whether to
do an operation in NEON or not.
As I see it there are three problems:
1. Simply, is it profitable?
NEON can do many DImode operations in one or two instructions
where 2 to 10 normal ARM/Thumb instructions would be required
(not to mention the added register pressure), but there is a
cost associated with moving the inputs to NEON, and the results
back.
If the data can stay in NEON for more than one operation,
then that's even better.
If the data must be loaded from memory, and the result stored back
to memory, then it's only a question of whether the register space
is available, or not.
Currently these decisions are made in the IRA/reload passes.
2. Values that originate in hard-registers stay there.
This applies to function parameters, mostly, but also in general
where the result of an operation is allocated first.
If there is no instruction that can use the value there then the
value is 'reloaded' to a more suitable register. If there is any
alternative that avoids the move then the register allocator will
use it, regardless of the relative costs of the other
alternatives.
This problem is reduced where an operation and move can happen in
one instruction, but NEON instructions do not do this much. We can
write insns that appear to do it, but these output multiple
instructions (see my recent core-SI=>NEON-DI extend patch).
3. It all happens too late.
The decision whether to use NEON or not is not made until register
allocation time. Naturally this means that most of the optimization
passes are already completed.
Part of the problem is that the operation almost certainly needs
splitting (into whatever form was chosen) and this might not be
straightforward, post-reload. (However, the split1 pass is
already quite late, so perhaps this isn't such a big deal.)
Another part of the problem is that passes such as the two
lower-subreg passes make assumptions about the register width which
are not accurate if the operation is to end up in NEON.
There are other, lesser problems, such as it being hard to adjust the
costs for different cores (A8 in particular) and the cost of generating
an immediate constant can't be known until it's known what instructions
will be used to generate it.
These problems are not specific to NEON, of course. I believe IWMMXT
suffers from the same issues. Likewise the C6X port, and also the i386
MMX to some degree. Anything that has instructions that only operate on
a subset of registers, basically.
So, Bernd has suggested an outline of a solution. I've quizzed him on
this, added a few of my own ideas, and probably a good selection of
misunderstandings, bad assumptions, and general cock ups, and come up
with something I can write here for comment. I can post something to
upstream later if it doesn't get totally shot down now.
The basic idea is that we add a new RTL optimization pass (or two) that
assesses the usage of pseudo registers, and makes recommendations about
what register class each should end up in, if there's a choice. These
recommendations would then be used by later passes to get a better use
of NEON. I might call this the "prealloc" pass, or something.
Firstly, for each pseudo-register in a function, the pass would look at
the insn constraints for each "def" and "use", and see how the registers
relate to one another. This might determine things like "if rN is in
class A, then rM must be also in class A".
E.g. if you have two registers with constraints like this:
"r,w"
"r,w"
.. (and 'r' and 'w' do not overlap) then you know that there is a choice
between one mode or another, whereas this:
"r,w,r,w"
"r,w,w,r"
.. would impose no restrictions and we can carry on as normal.
Having done that we'd end up with sets of pseudo-registers that must
make a decision one way or the other, and we'd know where the operations
are that would force a move from one class to the other.
There's a fair amount of handwavium in there at present, because I've
not worked out what to do with overlapping register classes (think
VFP_LO_REGS) and all the other complications.
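To make the constraint comparison above concrete, here is a minimal
sketch in C. It is not GCC code: operands_are_linked and its helpers
are hypothetical names, and it assumes every alternative is a single
class letter with no modifiers and no overlapping classes, which real
constraint strings do not guarantee. Two operands count as "linked"
when some combination of their classes never appears together in any
alternative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ALTS 16  /* arbitrary limit for this sketch */

/* Copy the single-letter alternatives of e.g. "r,w" into out[];
   returns how many were found.  */
static int split_alts(const char *s, char out[MAX_ALTS])
{
    int n = 0;
    for (; *s != '\0' && n < MAX_ALTS; s++)
        if (*s != ',')
            out[n++] = *s;
    return n;
}

/* Does any alternative pair class ca (operand A) with cb (operand B)?  */
static bool has_pair(const char *a, const char *b, int n, char ca, char cb)
{
    for (int i = 0; i < n; i++)
        if (a[i] == ca && b[i] == cb)
            return true;
    return false;
}

/* True if choosing a class for one operand restricts the other,
   i.e. the allowed pairs are not the full cross product.  */
bool operands_are_linked(const char *cons_a, const char *cons_b)
{
    char a[MAX_ALTS], b[MAX_ALTS];
    int na, nb;

    assert(cons_a != NULL && cons_b != NULL);
    na = split_alts(cons_a, a);
    nb = split_alts(cons_b, b);
    if (na != nb)
        return true;  /* mismatched alternative counts: be conservative */

    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            if (!has_pair(a, b, na, a[i], b[j]))
                return true;
    return false;
}
```

With the two examples above, "r,w" against "r,w" comes out as linked,
while "r,w,r,w" against "r,w,w,r" does not.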
Secondly, the pass would consider the costs of each alternative, and
store a recommended register class for each pseudo-register in a table
somewhere. It would also create new pseudos and insert extra move
instructions at the register file boundaries where an existing register
would have had split recommendations (this would solve problem 2 above).
Again, there's handwavium in "consider the costs". This isn't too hard
for size-optimization (assuming the "length" attribute on each insn is
correct), but more difficult for speed optimization. Factors to include
would be the move costs (here the A8 issues would be addressed) and the
relative speeds of the operations in both alternatives. Also, the
various possible transition points between the two modes might need some
comparisons.
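A very rough sketch of the kind of comparison meant by "consider the
costs", in C. Everything here is an assumption made for illustration:
the struct, the field names and the idea of a single integer cost per
operation are not taken from GCC, and a real pass would need per-insn
costs rather than one number per chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-core cost table; fields and units are made up.  */
struct class_costs {
    int core_op;       /* one DImode operation done in core registers */
    int neon_op;       /* the same operation done in NEON */
    int core_to_neon;  /* moving one value into a NEON register */
    int neon_to_core;  /* moving one value back to core registers */
};

/* Compare a chain of n_ops operations done entirely in core registers
   with the same chain in NEON, where values_in inputs start in core
   registers and values_out results must end up there again.  */
bool neon_is_cheaper(const struct class_costs *c, int n_ops,
                     int values_in, int values_out)
{
    int core_total, neon_total;

    assert(c != NULL);
    assert(n_ops >= 0 && values_in >= 0 && values_out >= 0);

    core_total = n_ops * c->core_op;
    neon_total = n_ops * c->neon_op
                 + values_in * c->core_to_neon
                 + values_out * c->neon_to_core;
    return neon_total < core_total;
}
```

With made-up costs of {4, 1, 2, 8}, a single operation with two inputs
and one result moved across the boundary loses to the core version,
but a chain of five operations wins, as does a single operation whose
data goes to and from memory and so needs no cross-file moves.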
Thirdly, the subsequent passes would need to be modified, as would some
of the back-end bits and bobs.
1. Lower-subreg would need to detect 'word_mode' based on the
recommended register class, not the global value.
2. The many split patterns in the machine description could be adjusted
so that, instead of simply conditionalizing on "reload_completed", they
split at split1 if that's the best option. (Maybe it would be profitable
to insert a new, earlier split pass specifically for this case to take
advantage of the likes of combine? I mean, ideally this decision would
have been made at expand time, if it could have been?) It might be
useful to *not* split too soon, in some cases, so that the register
allocator can still make the final decision based on register pressure,
and whatever other factors it uses. Of course, the existing late-split
option would need to be retained in case the prealloc pass is disabled,
in any case.
3. Various passes would have to be taught not to remove seemingly
superfluous register moves where they actually move between register
classes.
4. Pretty much nothing would need doing to register allocation! The
extra moves should make allocation a register pressure management issue,
rather than a question of making it work. DImode operations preallocated
to core-registers may already have been lowered, one way or the other
(by split1) so there's no decision left there, and if no lowering was
necessary then that option ought to be obviously cheaper. If it insists
on making contrary decisions then it can be taught to use the
recommendation as a hint, perhaps? In specific problem cases it would
also be possible to use instruction attributes to disable (or strongly
discourage) certain alternatives based on the recommended class.
5. The existing 'onlya8'/'nota8' nonsense can be removed.
6. The register move cost can be set correctly for each core.
7. If a constant is destined for a NEON register, most likely,
arm_gen_constant can use the NEON immediate rules to determine the cost.
There's clearly a lot of thought that needs to go into the
pseudo-register scan and decision making logic, but the whole thing
doesn't look like it'll boil down to very much code in the end.
There's also the question of where to put the pass. Too early and you'd
need to put a second one in to reassess the much changed RTL later, and
too late and lower-subreg won't be able to use it.
It's possible that it might be better to treat it more like the
data-flow analysis where it's not actually a stand-alone pass, but
rather a tool other passes can use? That might depend on how
computationally expensive it is.
Any thoughts anyone? Might something like this actually work? Would it
be worth spending the time on this?
Andrew
Hi,
184603 fixes an ICE we're running into with Android test builds.
Please pull it in ASAP so I don't have to mess with the CFLAGS as a workaround.
ttyl
bero
Hi Peter, Rusty. I've fleshed out the KVM blueprints a bit more at:
https://wiki.linaro.org/MichaelHope/Sandbox/Q1.12ToDo
Notable is adding validation and de-PENDING a few things.
I need to know the following things before talking to Validation and
Platforms about things that are in their queue:
* What's the host kernel? Christoffer's, or a linux-linaro-kvm based
off his plus work in progress?
* What's the guest kernel?
* Anything special that needs to be in the host rootfs outside qemu
and libvirt?
Do you have a host and guest kernel .config that I can pass on to Platforms?
-- Michael
Hi,
GDB on Android:
* Set up a cross-compiling environment using both the Android
toolchain and the Linaro toolchain.
* Tested Android and Mozilla GDB running as a native binary on the
Linaro Android image on qemu. None of them were able to see all
threads in a multi-threaded process. The same happens with a native
binary on Android talking to a gdbserver on localhost.
* I was able to produce an Android GDB 7.1.x binary which when running
on an i386 host talking to the Android SDK emulator sees all threads.
It is not able to generate a backtrace though.
* Carnival holidays.
--
[]'s
Thiago Jung Bauermann
Linaro Toolchain Working Group
Investigated and produced a patch for bug lp936863.
Continued work on 64-bit shifts. I updated my neon shifts patch twice:
once to take -Os optimization into account, and once because I noticed
that the CC clobbers were being retained even after they were known to
be not required, and presumably this could be a bad thing.
Posted a patch to improve SI->DImode sign- and zero-extends that also
move values from core registers to neon registers.
Wrote a patch to implement negation in NEON registers, albeit at the cost
of a scratch register to hold the constant zero. This worked, but
insisted on loading the zero from memory, despite the fact that NEON has
a suitable instruction for loading zero.
Attempted to get "vmov dN, #0" to work. The problem appears to be that
DImode loads are handled by the VFP load pattern in vfp.md and that that
doesn't seem to DTRT for integer constants. I've got as far as trying to
figure out what it does do, but there's more to be done to get to the
bottom of the problem.
Tried to get the benchmarking going for 64-bit shifts. Unfortunately
ursa1 has been experiencing outages. Michael has been trying to fix it,
but so far I've not been able to run anything there. Instead, I've
launched benchmark runs via Michael's normal test infrastructure.
Hopefully this can replace the manual ones anyway. That would be more
convenient.
Hi,
I'm trying to use a pre-built version of the Linaro toolchain as a cross-compiler on Ubuntu 11.04 on our 64-bit server.
I got it from http://people.linaro.org/~michaelh/incoming/binaries/.
When I run arm-linux-gnueabi-gcc to compile a C source file, it says "No such file or directory".
The steps are as below:
1. Unpack gcc-linaro-arm-linux-gnueabi-2011.12-20111219+bzr2309~linux.tar.bz2.
2. Rename gcc-linaro-arm-linux-gnueabi-2011.12-20111219+bzr2309~linux to arm-fsl-linux-gnueabi.
3. Copy to ~/toolchain_ltib/gcc-linaro-4.6.3-glibc-2.13-singlelib-2011.12/
4. Use arm-linux-gnueabi-gcc to compile my code.
The output is:
r65388@shlinux3:~/MEMCPYBM$ ~/toolchain_ltib/gcc-linaro-4.6.3-glibc-2.13-singlelib-2011.12/arm-fsl-linux-gnueabi/bin/arm-linux-gnueabi-gcc
-bash: /home/r65388/toolchain_ltib/gcc-linaro-4.6.3-glibc-2.13-singlelib-2011.12/arm-fsl-linux-gnueabi/bin/arm-linux-gnueabi-gcc: No such file or directory
I have installed lsb 4.0 and libc6 2.13.
r65388@shlinux3:~/MEMCPYBM$ dpkg -s lsb
Package: lsb
Status: install ok installed
Priority: extra
Section: misc
Installed-Size: 48
Maintainer: Ubuntu Developers <ubuntu-devel-discuss(a)lists.ubuntu.com>
Architecture: all
Version: 4.0-0ubuntu16
...
r65388@shlinux3:~/MEMCPYBM$ dpkg -s libc6-dev
Package: libc6-dev
Status: install ok installed
Multi-Arch: same
Priority: optional
Section: libdevel
Installed-Size: 11888
Maintainer: Ubuntu Developers <ubuntu-devel-discuss(a)lists.ubuntu.com>
Architecture: amd64
Source: eglibc
Version: 2.13-20ubuntu5
Provides: libc-dev
Depends: libc6 (= 2.13-20ubuntu5), libc-dev-bin (= 2.13-20ubuntu5), linux-libc-dev
Recommends: gcc | c-compiler
Suggests: glibc-doc, manpages-dev
Breaks: binutils (<< 2.20.1-1), binutils-gold (<< 2.20.1-11), cmake (<< 2.8.4+dfsg.1-5), gcc-4.4 (<< 4.4.6-3ubuntu1), gcc-4.4-base (<< 4.4.6-3ubuntu1), gcc-4.5 (<< 4.5.3-1ubuntu2), gcc-4.5-base (<< 4.5.3-1ubuntu2), gcc-4.6 (<< 4.6.0-12), gcj-4.4-base (<< 4.4.6-2ubuntu2), gcj-4.5-base (<< 4.5.3-1ubuntu2), gnat-4.4-base (<< 4.4.6-1ubuntu3), libhwloc-dev (<< 1.2-3), libjna-java (<< 3.2.7-4), liblouis-dev (<< 2.3.0-2), liblouisxml-dev (<< 2.4.0-2), make (<< 3.81-8.1), pkg-config (<< 0.26-1)
...
Do you know why?
How can I fix this issue?
Thanks~~
Yours
Terry
Summary:
* Linaro binary toolchain 2012.02 release.
* Try the Multilib patch on linaro toolchain.
Details:
1. Linaro binary toolchain 2012.02 release.
* Create linaro license file based on embedded toolchain, add
crosstool-ng and sysroot (eglibc) license.
* Tests on win7, oneiric and RHEL5.
2. Try to apply the multilib patch used in embedded toolchain to
linaro toolchain.
* The multilib solution cannot work together with the multiarch patch.
(1) With both patches, "gcc -print-multi-lib" can only print one option.
(2) With only multilib patch, "gcc -print-multi-lib" can print the
multilib list.
* Successfully build a multilib toolchain (Based on the prebuilt
oneiric-sysroot with directory structure change for include and lib.)
without the multiarch patch and c++ support.
Plans:
1. Try crosstool-ng 1.14 to build multilib eglibc.
2. Investigate how to make multiarch and multilib work together.
Planned leave:
* Mar. 1-2.
Best regards!
-Zhenqiang
Hi Peter, Rusty. I've written down my notes from Connect and drafted
the blueprints at:
https://wiki.linaro.org/MichaelHope/Sandbox/Q1.12ToDo
I've put my priorities on things and sorted them by relative importance.
I'll check tomorrow if there's a good match between this, the KVMEpic
spec, and the older task spreadsheet.
Most of these are too short. I don't know what's involved with the
CP15 work. Some still have big PENDINGS. Please edit and correct.
There's a lot of work there. Some is on other teams but I'm concerned.
-- Michael
== GCC ==
* Merged Ira's remaining vectorizer patches into Linaro GCC 4.7.
* Merged Richard's sched-pressure patch (enabled by default
on ARM) into Linaro GCC 4.7. Asked IBM colleagues for
benchmark results on s390 and rs6000 as well.
* Committed patch to enable vect_condition tests in FSF mainline,
FSF 4.6, and Linaro GCC 4.7.
* Verified that Linaro GCC 4.7 now contains every feature that
was present in Linaro GCC 4.6.
* Rebased and re-benchmarked fwprop-subreg patch.
* Ongoing work on a patch to generate usat/ssat instructions
where appropriate.
* Worked on improving end-of-loop value computation.
* Fixed mainline .init_array/.fini_array regression on certain
newlib targets.
* Reviewed several merge requests.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
On holiday last week and at Connect the week before that.
=== Progress ===
* Recovered from jet-lag and started to catch up on email.
* Patch review week.
* Cleared out a bit of my patch backlog.
* Helped Asa with some bug triaging.
* Read up a bit about ssat and usat and what we should be doing with
other tuning for them.
* Looked up some performance profiles with v8 benchmarks. Reviewed the
Android benchmark numbers.
* Booked tickets for connect 2012.q2
* Wrote up connect week discussions on libav regressions.
=== Plans ===
* Clear out more pending patches (PGO next on the hit list and the
vmla investigation).
* Investigate partial-partial PRE results.
* Finish off the ABI tests.
Absences.
* 1 week holiday sometime before that - to be booked.
* Linaro Connect Q2.12 - May 28 - June 1
Hi,
OpenEmbedded:
* added initial support for the Linaro GCC 4.6 based toolchain to
the meta-linaro layer.
* allows building the Linaro toolchain the OE way
* successfully built the core-image-sato
but when running inside QEMU the GUI isn't usable
(needs investigation)
glibc:
* built glibc+glibc-ports natively on ARM
* the upstream glibc testsuite isn't aware of the multiarch paths
and therefore it won't find some libs (libgcc_s, libstdc++)
* workaround using symlinks; wiki updated:
https://wiki.linaro.org/KenWerner/Sandbox/HowToBuildToolchainComponents#gli…
Regards,
Ken
+ linaro-toolchain
Hello Ulrich,
I want to revisit this old thread. Sorry for the sloppy follow-up. But,
this time around I have more data.
On Tue, Mar 15, 2011 at 9:00 PM, Ulrich Weigand
<Ulrich.Weigand(a)de.ibm.com> wrote:
> Aneesh V <aneesh(a)ti.com> wrote:
>
>> I was trying to build u-boot in Thumb2 for OMAP4. Everything was fine
>> until I added some patches recently. One of these patches introduced an
>> API (let's say foo()) that has a weakly linked alias(let's say
>> __foo()) and a strongly linked implementation(the real foo()) in an
>> assembly file.
>>
>> Although I give -mthumb and -mthumb-interwork for all the files,
>> apparently GCC generates ARM code for assembly files. In the final
>> image foobar() calls foo() using a BL. Since foobar() is in Thumb and
>> foo() in ARM, it ends up crashing. Looks like foobar() assumed foo()
>> to be Thumb because __foo() is Thumb.
>
> I'm unable to reproduce this. Do you have a complete test case?
>
> I've tried with the following small example:
>
> foo1.c:
>
> extern void foo (void) __attribute__ ((weak, alias ("__foo")));
>
> void __foo (void)
> {
> }
>
> int main (void)
> {
> foo ();
> }
>
> foo2.S:
> .text
> .align 2
> .global foo
> .type foo, %function
> foo:
> push {r7}
> add r7, sp, #0
> mov sp, r7
> pop {r7}
> bx lr
> .size foo, .-foo
>
> When building just "gcc foo1.c", I get:
>
> 0000835c <__foo>:
> 835c: b480 push {r7}
> 835e: af00 add r7, sp, #0
> 8360: 46bd mov sp, r7
> 8362: bc80 pop {r7}
> 8364: 4770 bx lr
> 8366: bf00 nop
>
> 00008368 <main>:
> 8368: b580 push {r7, lr}
> 836a: af00 add r7, sp, #0
> 836c: f7ff fff6 bl 835c <__foo>
> 8370: 4618 mov r0, r3
> 8372: bd80 pop {r7, pc}
>
> When building both files "gcc foo1.c foo2.S", I get instead:
>
> 00008368 <main>:
> 8368: b580 push {r7, lr}
> 836a: af00 add r7, sp, #0
> 836c: f000 e802 blx 8374 <foo>
> 8370: 4618 mov r0, r3
> 8372: bd80 pop {r7, pc}
>
> 00008374 <foo>:
> 8374: e92d0080 push {r7}
> 8378: e28d7000 add r7, sp, #0
> 837c: e1a0d007 mov sp, r7
> 8380: e8bd0080 pop {r7}
> 8384: e12fff1e bx lr
>
>
> So it seems to me the linker is handling this correctly ...
>
> (This is on Ubuntu Natty using system gcc and binutils.)
I could reproduce the problem on an older tool-chain (Sourcery G++ Lite
2010q1-202) [1] with a modified version of your sample code:
a.c:
====
extern void foo (void) __attribute__ ((weak, alias ("__foo")));
void __foo (void)
{
}
extern void call_foo(void);
int main (void)
{
call_foo ();
}
b.S:
====
.text
.align 2
.global foo
foo:
push {r7}
add r7, sp, #0
mov sp, r7
pop {r7}
bx lr
.size foo, .-foo
c.S:
====
.text
.align 2
.global call_foo
call_foo:
bl foo
bx lr
.global __aeabi_unwind_cpp_pr0
__aeabi_unwind_cpp_pr0:
bx lr
Now, I build them using the following commands, which is similar to
what U-Boot does:
arm-none-linux-gnueabi-gcc -mthumb -mthumb-interwork -c a.c
arm-none-linux-gnueabi-gcc -mthumb -mthumb-interwork -c b.S
arm-none-linux-gnueabi-gcc -mthumb -mthumb-interwork -c c.S
arm-none-linux-gnueabi-ld -r a.o -o alib.o
arm-none-linux-gnueabi-ld -r b.o -o blib.o
arm-none-linux-gnueabi-ld -r c.o -o clib.o
arm-none-linux-gnueabi-ld --start-group clib.o alib.o blib.o --end-group -o a.out
armobjdump -S --reloc a.out
You will get something like:
00008094 <call_foo>:
8094: fa000006 blx 80b4 <foo>
8098: e12fff1e bx lr
Please note that the 'blx' is not correct. Now, do the following change:
diff --git a/b.S b/b.S
index e0f2de9..96dba1f 100644
--- a/b.S
+++ b/b.S
@@ -1,5 +1,6 @@
.text
.align 2
+.type foo, %function
.global foo
foo:
push {r7}
And build it again the same way and you will see:
00008094 <call_foo>:
8094: eb000006 bl 80b4 <foo>
8098: e12fff1e bx lr
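For anyone comparing the two dumps by hand: the only difference is the
top byte of the branch word (fa... versus eb...). A small sketch of
telling the two ARM-state encodings apart, in C; the helper names are
hypothetical and only the A32 immediate forms are covered, not the
Thumb encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARM-state BLX <label>: bits 31..25 are 1111 101, then the H bit
   and a 24-bit immediate.  */
bool is_arm_blx_imm(uint32_t insn)
{
    return (insn & UINT32_C(0xfe000000)) == UINT32_C(0xfa000000);
}

/* ARM-state BL <label>: any condition except 1111, bits 27..24
   are 1011, then a 24-bit immediate.  */
bool is_arm_bl(uint32_t insn)
{
    return (insn >> 28) != 0xfu
        && (insn & UINT32_C(0x0f000000)) == UINT32_C(0x0b000000);
}
```

Applied to the words above, 0xfa000006 matches is_arm_blx_imm and
0xeb000006 matches is_arm_bl.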
I can't reproduce this on Linaro GCC 2012.01, so it looks like the
problem is solved in recent tool-chains. However, sadly I could
reproduce a different but similar problem with Linaro GCC 2012.01. This
time the call is from C (Thumb) to assembly (ARM) and no weakly linked
symbols are involved.
a.c:
====
int main (void)
{
foo ();
}
b.S:
====
.text
.align 2
.global foo
foo:
push {r7}
add r7, sp, #0
mov sp, r7
pop {r7}
bx lr
.size foo, .-foo
.global __aeabi_unwind_cpp_pr0
__aeabi_unwind_cpp_pr0:
bx lr
arm-linux-gnueabi-gcc -mthumb -mthumb-interwork -c a.c
arm-linux-gnueabi-gcc -mthumb -mthumb-interwork -c b.S
arm-linux-gnueabi-ld -r a.o -o alib.o
arm-linux-gnueabi-ld -r b.o -o blib.o
arm-linux-gnueabi-ld --start-group alib.o blib.o --end-group -o a.out
arm-linux-gnueabi-objdump -S --reloc a.out
gives:
8076: af00 add r7, sp, #0
8078: f000 f802 bl 8080 <foo>
807c: 4618 mov r0, r3
It should have been "blx 8080 <foo>", shouldn't it? Again, %function
solves it.
I agree that not marking the assembly functions '%function' is a problem
in the code, so it's not a critical bug. But I would've been happier if
the linker refused to link it rather than branching with the wrong
instruction. Isn't that a problem?
Problem No:2
*************
Linaro GCC 2012.01 is giving a new problem with Thumb builds
that does not exist in Sourcery G++ Lite 2010q1-202. However, I
couldn't reproduce this problem with a small program like above. So,
let me give you reference to the original u-boot code that shows the
problem and steps to reproduce it.
tree: git://github.com/aneeshv/u-boot.git
branch: thumb
The above branch has mainline u-boot with 4 additional patches from me
for enabling Thumb build. You can build it like this:
make CROSS_COMPILE=arm-linux-gnueabi- ARCH=arm distclean
make CROSS_COMPILE=arm-linux-gnueabi- ARCH=arm omap4_sdp4430
This builds two images u-boot and u-boot-spl. SPL is a tiny u-boot that
runs from internal RAM and loads the u-boot to SDRAM. Now, please have
a look at the map file of u-boot-spl which is at: spl/u-boot-spl.map
I see the following in my map file:
/spl/u-boot-spl.map:
=================
.rodata.wkup_padconf_array_essential_4460
0x40309583 0x4 board/ti/sdp4430/libsdp4430.o
0x40309583 wkup_padconf_array_essential_4460
.rodata.wkup_padconf_array_essential
0x40309587 0xc board/ti/sdp4430/libsdp4430.o
0x40309587 wkup_padconf_array_essential
.rodata.core_padconf_array_essential
0x40309593 0x60 board/ti/sdp4430/libsdp4430.o
0x40309593 core_padconf_array_essential
Please note that the .rodata symbols have odd addresses. These arrays
actually need to be aligned at least to a half-word boundary. In fact, in
the image I verified that they are put at even addresses. So, the
symbols have been kept as real address +1. I understand that this is
the convention for Thumb functions. I guess the tool-chain did it for
data too?? And I am getting unaligned access aborts on accessing them.
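The arithmetic behind that guess, as a small sketch in C. The helper
names are hypothetical. Setting bit 0 of the symbol value is the ELF
convention for Thumb function symbols; whether that is really what
happened to these data symbols is only my reading of the map file.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the symbol value has bit 0 set, as a Thumb function
   symbol would.  */
bool has_thumb_bit(uint32_t sym_value)
{
    return (sym_value & 1u) != 0;
}

/* The value with bit 0 cleared: where the object would really live
   if the "+1" guess is right.  */
uint32_t strip_thumb_bit(uint32_t sym_value)
{
    return sym_value & ~UINT32_C(1);
}

/* Alignment check; alignment must be a power of two.  */
bool is_aligned(uint32_t addr, uint32_t alignment)
{
    return (addr & (alignment - 1u)) == 0;
}
```

For wkup_padconf_array_essential_4460, the map value 0x40309583 has
bit 0 set and is not half-word aligned, while 0x40309582 is.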
I notice that this doesn't happen with all .rodata. symbols in the
image. I couldn't see any difference between working and non-working
files nor any difference in the command used to build them!
Well, this doesn't happen if I don't use "-fdata-sections" in gcc
options. So, apply the following patch and you will see that those
symbols have even addresses now.
diff --git a/config.mk b/config.mk
index ddaa477..723286a 100644
--- a/config.mk
+++ b/config.mk
@@ -190,7 +190,7 @@ CPPFLAGS := $(DBGFLAGS) $(OPTFLAGS) $(RELFLAGS) \
# Enable garbage collection of un-used sections for SPL
ifeq ($(CONFIG_SPL_BUILD),y)
-CPPFLAGS += -ffunction-sections -fdata-sections
+CPPFLAGS += -ffunction-sections
LDFLAGS_FINAL += --gc-sections
endif
/spl/u-boot-spl.map:
=================
.rodata 0x40309204 0x38c board/ti/sdp4430/libsdp4430.o
0x40309204 core_padconf_array_essential
0x40309264 wkup_padconf_array_essential
0x40309270 wkup_padconf_array_essential_4460
0x40309274 core_padconf_array_non_essential
0x40309540 wkup_padconf_array_non_essential
0x40309588 wkup_padconf_array_non_essential_4430
Will you be able to look into these?
Thanks,
Aneesh
[1] Sourcery G++ Lite 2010q1-202
arm-none-linux-gnueabi-gcc (Sourcery G++ Lite 2010q1-202) 4.4.1
GNU ld (Sourcery G++ Lite 2010q1-202) - binutils 2.19.51.20090709
[2] Linaro 4.6-2012.01
arm-linux-gnueabi-gcc (crosstool-NG linaro-1.13.1-2012.01-20120125 -
Linaro GCC 2012.01) 4.6.3 20120105 (prerelease)
GNU ld (crosstool-NG linaro-1.13.1-2012.01-20120125 - Linaro GCC 2012.01) 2.22
Hi!
* Development benchmarks:
Focused on the implementation of development benchmarks in the cbuild
system. Discussed with Michael how the comments should look.
I am now testing my solution in the launchpad staging area. You can see a
link to an example of a benchmark result comment below. (Should not be a
double comment of course, and the actual result files are only stored on my
machine for now.)
https://code.staging.launchpad.net/~ramana/gcc-linaro/partial_partial_pre_t…
The real benchmark names are not mentioned, since that might be sensitive.
* Bug reports
I got an action at Monday's meeting to investigate if
https://bugs.launchpad.net/ubuntu/+source/gcc-4.6/+bug/926855
is the same as PR52294.
Investigated with help from Ramana and updated the ticket.
Regards
Åsa
Current Milestones:
|| || Planned || Estimate || Actual ||
||cp15-rework || 2012-01-06 || 2012-??-?? || ||
(new blueprints & reestimate for this one pending Michael H's writeup/etc)
Historical Milestones:
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-12-15 || 2011-12-12 ||
||initial-a15-system-model || 2012-01-27 || 2012-01-27 || 2012-01-17 ||
||qemu-kvm-getting-started || 2012-03-04?|| 2012-03-04?|| 2012-02-01 ||
== cp15-rework ==
* figuring out how various suggestions/conversations on this from
Connect ought to fit together
== other ==
* LP:928580 recent u-boot not booting on QEMU beagle: tracked
this down to a u-boot bug and passed to John Rigby
* investigated a linux-user segfault deadlock which turns out to
be because the guest program is using setrlimit to restrict
memory usage and one of qemu's internal memory allocations then
fails. Not clear how best to fix but was able to nack the suggested
patch...
* most of the patches for passing a device tree to the kernel from
qemu (via -dtb option) are now upstream; rebased the final patch
which should now go in soon I think
* LP885239 qemu-linaro broke booting of zaurus models: fixed a bug
where a trustzone support patch had broken some cp15 registers for
xscale cpus
* minor patch: fixed formatting of list of supported machines help text
* fixed an embarrassing segfault bug in a patch of mine committed last week
* it turns out that we can detect at compile time whether glibc's
makecontext() is an always-fails stub : sent configure patch to do this
* travel mostly set up for connect Q2.12
* lots of meetings/training: toolchain call; standup; 1-2-1;
project management training; comms meeting