Hi!
* V8 - SunSpider
The pieces for building in cbuild and parsing with the
linaro-toolchain-benchmark scripts are in place. Ran the SunSpider
benchmark across a few toolchains with the o2-neon and o3-neon variants.
I have documented my work on this page but have not analyzed the results yet:
https://wiki.linaro.org/AsaSandahl/Sandbox/JavaScriptBenchmark
What worries me is that there is too much variation in some of the test
results, probably caused by garbage collection kicking in at
unpredictable times. I will try to monitor the GC and investigate whether
it can be controlled somehow.
I would like to do a gcc-4.4 run as well. Android's original toolchain is
based on gcc-4.4.3(?) and that is what I compared against when doing
measurements internally.
However, I have a problem compiling C++ with gcc-4.4; it is this one again:
/home/asa-san/cbuild/slaves/ursa3/gcc-4.4.5/gcc-binary/bin/../libexec/gcc/armv7l-unknown-linux-gnueabi/4.4.5/cc1plus:
/home/asa-san/cbuild/slaves/ursa3/gcc-4.4.5/gcc-binary/lib/libstdc++.so.6:
version `GLIBCXX_3.4.14' not found (required by /usr/lib/libppl.so.7)
* Handed over daily tasks to Zhenqiang.
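A first step in pinning down the variation could be to measure it per test: compute the coefficient of variation (standard deviation divided by mean) over repeated runs and flag the tests above some threshold. A minimal sketch, assuming the SunSpider timings have already been parsed into a list of milliseconds per test; the 5% threshold and the numbers are made up for illustration and are not results from the runs above.

```python
import statistics


def noisy_tests(runs, threshold=0.05):
    """Return {test: coefficient of variation} for tests above threshold.

    runs maps a test name to a list of run times in milliseconds
    (an assumed input format).
    """
    flagged = {}
    for name, times in runs.items():
        if len(times) < 2:
            continue  # a standard deviation needs at least two samples
        cv = statistics.stdev(times) / statistics.mean(times)
        if cv > threshold:
            flagged[name] = cv
    return flagged


# Illustrative numbers only, not real measurements.
runs = {
    "3d-cube": [40.1, 40.3, 39.9, 40.2],
    "string-base64": [25.0, 31.5, 24.8, 38.2],
}
for name, cv in noisy_tests(runs).items():
    print(f"{name}: cv={cv:.1%}")
```

Tests that stay flagged once the GC is monitored or controlled would point to some other source of noise.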
Regards
Åsa
Hi,
OpenEmbedded:
* verified that the release candidate of our 2012.03 toolchain
(both source and binary) is able to build the sato and qt4e images of
oe-core+meta-linaro - they are booting fine using QEMU
* out sick starting from Tue :/
Regards,
Ken
Current Milestones:
|| || Planned || Estimate || Actual ||
||cp15-rework || 2012-01-06 || 2012-03-30 || ||
Historical Milestones:
||initial-a15-system-model || 2012-01-27 || 2012-01-27 || 2012-01-17 ||
||qemu-kvm-getting-started || 2012-03-04?|| 2012-03-04?|| 2012-02-01 ||
== cp15-rework ==
* converted crn=1; still TODO: crn=0, some loose ends; then reassess
the design in the light of experience doing register conversion
* I've estimated another two weeks here, but this might slip, because
in practice much of my time is sucked up by 'other' issues
== other ==
* tracked down cause of LP:947888 gpg crash bug: newer glibc uses
/proc/self/maps to decide whether a printf format string using '%n'
is in writable memory, and qemu's maps emulation wasn't good enough
* fixed a thumb decoder bug where we were treating 'setend' like 'cps'
* investigated whether we had any conveniently testable cores which take
advantage of the ARM ARM latitude for UNDEFfing even on failed condition
code checks (answer: not really, but the KVM code to handle this case
should be small enough that its not-yet-tested nature is not a worry)
* qemu-linaro 2012.03 release (lots of bug fixes, plus exynos4 and
highbank models thanks to Samsung and Calxeda)
* code review: imx31 board patches
* rebased qemu-linaro on upstream and applied some new patches from
Christoffer for ARM KVM support
* LP:956799: added ppoll to QEMU arm-linux-user (a one-liner...)
* boot-wrapper: moved initrd load address up so we can handle large
kernels (like Android!)
* sent pullreqs for target-arm, arm-devs patchqueues
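The LP:947888 item boils down to one lookup: find the /proc/self/maps entry that contains an address and check its write permission. A minimal sketch of that lookup over maps-formatted text, for illustration only; it is not glibc's code, and the sample mappings are invented.

```python
def is_writable(maps_text, addr):
    """Look up addr in /proc/<pid>/maps-style text.

    Returns True or False for the containing mapping's write bit,
    or None if no mapping contains addr.
    Each line looks like: 'start-end perms offset dev inode [path]'.
    """
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or "-" not in fields[0]:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= addr < end:  # the end address is exclusive
            return fields[1][1] == "w"  # perms are 'rwxp'; index 1 is the write bit
    return None


# Invented mappings: a read-only text segment and a writable data segment.
sample = """\
00008000-00010000 r-xp 00000000 08:01 1234 /usr/bin/example
00018000-00019000 rw-p 00008000 08:01 1234 /usr/bin/example
"""
print(is_writable(sample, 0x8100))
```

If an emulated maps file leaves out or misdescribes the region holding a read-only format string, a lookup like this gives a wrong or missing answer, which is the kind of situation in which the glibc check misfires.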
== GCC ==
* Checked first part of fwprop-subreg patch into mainline.
* Checked Ira's vectorizer patches into mainline.
* Ongoing work on improving end-of-loop value computation.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter | Management: Dirk Wittkopp
Registered office: Böblingen | Registry court: Amtsgericht Stuttgart, HRB 243294
The final version of the 'Building at -O3' writeup is at:
https://wiki.linaro.org/Internal/ToolChain/BuildingAtO3
This updates the SPEC 2000 results to show that there is a net win
which is held back by a bad (but understood) regression in one
benchmark.
Thank you all for your work on this,
-- Michael
The Linaro Toolchain Working Group is pleased to announce the release of
Linaro QEMU 2012.03.
Linaro QEMU 2012.03 is the latest monthly release of qemu-linaro. Based
off upstream (trunk) QEMU, it includes a number of ARM-focused bug fixes
and enhancements.
Highlights in this month's release:
- we now default to enabling 'reserve memory for guest' on 64 bit hosts
in linux-user-mode. This significantly reduces the chances of QEMU
being unable to satisfy a guest process mmap() request.
- Fix for a bug that was causing spurious failures of the glibc check
for "%n in a format string must be in a read-only area of memory"
when running in linux-user-mode.
- QEMU's built-in boot loader now supports passing a device tree blob
to the kernel: if you boot with -kernel mykernel (and optionally
-initrd myinitrd) you can now also use the new command line option
-dtb my.dtb to pass a device tree.
- This version includes an initial implementation of a model of the
Samsung Exynos4210 SoC, used by board models 'nuri' and 'smdkc210'
(thanks to Evgeny Voevodin, Maksim Kozlov, Igor Mitsyanko and
Dmitry Solodkiy from Samsung, who submitted this work to upstream
QEMU).
- This version includes an initial implementation of a model of the
Calxeda Highbank SoC, used by board model 'highbank' (thanks to Rob
Herring and Mark Langsdorf of Calxeda, who submitted this work to
upstream QEMU).
- various other minor bug fixes (detailed in the Changelog.LINARO).
Known issues:
- Graphics do not work for OMAP3 based models (beagle, overo)
with 11.10 Linaro images.
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2012.03
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro
The GCC release tested up just fine. The branch is now open for commits.
The next release is Thursday the 12th of April. Note that this is the
week after Easter.
-- Michael
The Linaro Toolchain Working Group is pleased to announce the 2012.03
release of both Linaro GCC 4.6 and Linaro GCC 4.5.
Linaro GCC 4.6 2012.03 is the thirteenth release in the 4.6 series. Based
off the latest GCC 4.6.3 release, it contains a new scheduler pressure pass,
implements new instructions, and contains a number of bug fixes.
Interesting changes include:
* Updates to 4.6.3.
* Better performance by accounting for register pressure when
scheduling instructions.
* Support for the ARMv6 USAT/SSAT saturation instructions.
* Support for the VFP VCVT fixed-point to floating-point conversion instruction.
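For reference, USAT and SSAT saturate a value to an n-bit unsigned or signed range. A small Python model of that arithmetic, as a sketch of the instruction semantics only (not of how the compiler recognises them; the Q flag the instructions also set on saturation is not modelled):

```python
def ssat(x, n):
    """Clamp x to the n-bit signed range, as ARM SSAT #n does (1 <= n <= 32)."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    return max(lo, min(hi, x))


def usat(x, n):
    """Clamp x to the n-bit unsigned range, as ARM USAT #n does (0 <= n <= 31)."""
    return max(0, min((1 << n) - 1, x))


print(ssat(200, 8), ssat(-200, 8), usat(300, 8), usat(-5, 8))
```

For example, ssat(200, 8) is 127 and usat(-5, 8) is 0; C code that clamps a value into one of these ranges is presumably the kind of code the new support targets.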
Fixes:
* LP: #922474 Bug in __sync_lock_release with 64 bit primitives
* LP: #923397 Alignment attribute has no effect under certain conditions
* LP: #926855 [ARMhf] gcc produces assembler it can't compile
* LP: #936863 ICE in constprop.2 (ARM NEON related?)
* LP: #942307 'asm' operand requires impossible reload
* LP: #952565 Not compliant with the ABI for multi-register NEON intrinsics
Linaro GCC 4.5 2012.03 is the nineteenth release in the 4.5 series. Based
off the latest GCC 4.5.3+svn184976 release, this is a maintenance only
update.
Interesting changes include:
* Updates to 4.5.3+svn184976.
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.6-2012.03
https://launchpad.net/gcc-linaro/+milestone/4.5-2012.03
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
More information on the features and issues is available from the
release page:
https://launchpad.net/gcc-linaro/4.6/4.6-2012.03
Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Bugs: https://bugs.launchpad.net/gcc-linaro/
Questions? https://ask.linaro.org/
Interested in commercial support? Inquire at support(a)linaro.org
-- Michael
EEMBC have announced AndEBench, a mixed native/Java benchmark for
Android. It's available on the Market.
I don't know much more. It seems to be CPU bound and the test names
sound a bit like CoreMark. The source is available for license and
might be worth the Android guys looking into.
The awesome thing is they used the 2012.01 Linaro Binary toolchain
release to build the native parts - see the compiler version string in
the log :)
-- Michael
Hi,
GDB for Android:
* Wrote first draft of the GDB for Android card.
* Found out that there actually is a libthread_db.so in /system/lib
and was able to compile (after a small hack) an FSF gdbserver 7.3
which uses the libthread_db.so from Android and correctly shows
all threads in the process. So the libthread_db.so from Android
actually works; I still have to learn why they don't use it.
* Tried to compile a native GDB which uses libthread_db.so, but no
luck so far. There are many differences in bionic's header files
which upset libiberty and gnulib and are not easy to work around.
--
[]'s
Thiago Jung Bauermann
Linaro Toolchain Working Group
We now do a SPEC ref run when benchmarking. I've updated the
linaro-toolchain-benchmarks scripts to handle the changes and spawned
some historical builds to do comparisons. All earlier builds have
been archived to prevent confusion.
The human readable summary is now also included in the results. See:
http://ex.seabright.co.nz/benchmarks/gcc-linaro-4.6-2012.02/logs/armv7l-nat…
for an example.
Half an hour to build, 26 hours to run. Not too bad.
-- Michael
Here are brief notes on running different benchmark variants across the
auto builders. Asa, could you pull these plus your notes into a wiki
page?
Spawn a job:
http://ex.seabright.co.nz/helpers/scheduler/spawn
Merge requests are automatically built. Otherwise, drop arbitrary
tarballs into cbuild@orion:~/snapshots and spawn <tarball name minus
extension>. For example, scp gcc-4.6.3.tar.gz
cbuild@orion:~/snapshots; spawn gcc-4.6.3 into a9-builder.
Jobs:
* gcc-version - build and test GCC
* benchmarks-gcc-version - run coremark, denbench, eembc against the
already built version
* benchmarks-spec2000-gcc-version - run spec2000
Queues:
* a9-builders: anything that can natively build an A9 compiler
* a9-ref: reference A9 boards
* a8-ref: reference A8 boards
Variables:
* BENCHMARKS = list, such as coremark spec2000 pybench - run these
benchmarks instead of the defaults
Variants:
* By default we build o3-neon
* See http://bazaar.launchpad.net/~linaro-toolchain-dev/cbuild/trunk/view/head:/l…
for all names
* Spawn a job with VARIANT_SRC = all and VARIANT_LIST = glob-pattern
Examples:
* VARIANT_LIST = o3-neon o3-vfpv3 (compare NEON with VFPv3D32)
* VARIANT_LIST = o3-neon-cortexa8 o3-neon-cortexa9 (compare
-mtune=cortex-a8 vs -mtune=cortex-a9)
* VARIANT_LIST = o3-neon-novect o3-neon (compare with/without the vectoriser)
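Since VARIANT_LIST is a glob pattern (or, as in the examples, a space-separated list of names), a quick way to check which variants a setting would pick is to match it against the known names locally. A sketch using Python's fnmatch, assuming space-separated globs; it is not cbuild's own matching code, and KNOWN_VARIANTS holds only the names that appear in these notes.

```python
from fnmatch import fnmatchcase

# Only the variant names mentioned in these notes; cbuild has more.
KNOWN_VARIANTS = [
    "o3-neon",
    "o3-vfpv3",
    "o3-neon-cortexa8",
    "o3-neon-cortexa9",
    "o3-neon-novect",
]


def select_variants(variant_list, known=KNOWN_VARIANTS):
    """Return the known variants matching any space-separated glob in
    variant_list, in the order they appear in known."""
    patterns = variant_list.split()
    return [v for v in known if any(fnmatchcase(v, p) for p in patterns)]


print(select_variants("o3-neon-cortexa*"))
print(select_variants("o3-neon o3-vfpv3"))
```

fnmatchcase is used rather than fnmatch so that matching is case-sensitive on every host.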
-- Michael
I'd like to announce a change in how the Linaro Toolchain group notifies
people about our monthly releases. In the past we've sent one email per
product to the linaro-announce list. From this week forward, a summary
of all products will be included in the main end-of-month
announcement instead.
We'll continue to send per-product emails to the linaro-toolchain
list when the mid-month release is available. If you'd like to get
things two weeks early, please subscribe[1]. You can filter on the
word '[ANNOUNCE]' to skip the development chatter. An RSS feed
is also available[2].
-- Michael
[1] http://lists.linaro.org/mailman/listinfo/linaro-toolchain
[2] http://feeds.launchpad.net/linaro-toolchain/announcements.atom
All,
I need to supply a Linaro toolchain "aligned" (same source code)
bare-metal compiler to a group doing benchmarking on A15.
First off, my assumption is that we will write our own boot and
semi-hosting code (semi-hosting for TI emulators/simulators is different
from ARM RDI semi-hosting).
I was planning on looking at the two toolchains here[1] and here [2]:
[1] https://launchpad.net/linaro-toolchain-binaries
[2] https://launchpad.net/gcc-arm-embedded
I was then going to build a hybrid that was newlib-based but appropriate
for armv7-a (instead of cortex-m3) and maybe even -mtune'ed for A15.
However, looking more closely at the gcc-arm-embedded release [3], I see
that it supports ARMv7-R. It supports both Thumb and non-Thumb modes, and
both the softfp and hardfp ABIs.
What would I really gain by building my own? For app code the user
should be able to add -mtune=cortex-a15 and still be compatible with the
pre-built R4/R5 libraries. The only performance difference should be in
the library code and that should be only pipeline tuning if I understand
the difference between armv7-a and armv7-r correctly.
Am I missing something? Should I build my hybrid anyway?
[3] https://launchpadlibrarian.net/88152755/readme.txt
[BTW: has the below project been obsoleted by the gcc-arm-embedded one?
Perhaps gcc-arm-embedded should be referenced in the description of
the page below.
https://launchpad.net/linaro-toolchain-unsupported ]
Thanks,
Bill
Committed the 4.6.3 merge to Linaro GCC 4.6.
Merged from FSF 4.5 to Linaro GCC 4.5.
Thought about how to do register-class allocation for the NEON vs. core
registers case. Discussed the problem at the GCC performance meeting.
Lots of interesting discussion was had. I have some interesting
experiments to do, but the first step needs to be to get my 64-bit
operator patch to work correctly.
Tried to track down the problems in my NEON-immediates patch. The
problem is that it's putting constants in different pools in stage 3 than
it does in stage 2. Presumably this is something to do with the
pool-offset attributes in the instruction patterns? My patch has caused
it to switch from using 'fldd' to using 'vmov' in some cases (the
instructions are aliases, but it's indicative of a different machine
description pattern in the compiler), but I don't know why the change
would differ between bootstrap stages. I made some
alterations to switch it back to `fldd` when it wasn't supposed to
change, but bootstrap still fails. More investigation required...again.
Rewrote the NEON one's-complement patch and posted it upstream.
Tracked down the cause of the bootstrap failures in my NEON 64-bit
shifts patch: out-of-range shift amounts were not handled, leading to
ICEs. Reworked the constant shift handling cases and resubmitted the
patch for testing. It failed again. A job for next week.
Summary:
* Multilib test for linaro toolchain.
* Code size test for embedded toolchain.
Details:
1. Multilib test for linaro toolchain.
* Work around the marm/march=armv5t build by setting the
MULTILIB_DEFAULT to mthumb.
* Investigate how to make multiarch and multilib work together.
With the current multiarch/multilib patches, it is hard to make them
work together. Trying to set the default multilib dir to the multiarch
dir to work around this. Building libgcc needs more work.
2. Code size investigation for embedded toolchain.
* Try to test benchmarks.
* Run gcc regression test with -mthumb/-mcpu=cortex-m3/-Os and
analyze the new failed cases. After skipping the cases based on
scan-assembler, warning/error message, etc, there are 5 new failed
cases in three categories:
(1) gcc.target/arm/neon/vst1_lanes64.c: known issue
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51631
(2) Wrong optimization for gcc.dg/atomic-lockfree.c,
gcc.dg/atomic-noinline.c and gcc.dg/pr30951.c.
(3) g++.dg/eh/filter1.C abort in libsupc++.
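The "new failures after skipping scan-assembler and diagnostic tests" step can be scripted by diffing the FAIL lines of two DejaGnu .sum files. A sketch, assuming the baseline and new .sum contents are available as strings; the skip filter is illustrative rather than the exact one used above, and the test names in the sample are hypothetical.

```python
import re

# Illustrative skip filter: assembler-scan checks and expected-diagnostic checks.
SKIP = re.compile(r"scan-assembler|\(test for (warnings|errors)")


def fail_set(sum_text):
    """Collect the FAIL lines of a DejaGnu .sum file, whitespace-normalised."""
    return {" ".join(line.split())
            for line in sum_text.splitlines()
            if line.startswith("FAIL: ")}


def new_failures(baseline_sum, current_sum):
    """FAILs present in current_sum but not in baseline_sum, minus skipped kinds."""
    return sorted(f for f in fail_set(current_sum) - fail_set(baseline_sum)
                  if not SKIP.search(f))


# Hypothetical test names, not real results.
baseline = "PASS: gcc.dg/example-1.c execution test\n"
current = """\
FAIL: gcc.dg/example-1.c execution test
FAIL: gcc.target/arm/example-2.c scan-assembler-times ssat 4
FAIL: gcc.dg/example-3.c  (test for warnings, line 3)
"""
print(new_failures(baseline, current))
```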
Plans:
* Root cause the new failed cases.
* Continue to investigate the multilib/multiarch issue.
Best regards!
-Zhenqiang
Hi Michael,
Spec2000 is now running successfully: there are results at
http://ex.seabright.co.nz/benchmarks/gcc-linaro-4.7%2bbzr114965/logs/armv7l…
However, they seem to have been run in 'train' mode, and none of the
runs are long enough for comparison.
I thought the a9-ref queue was supposed to run in full 'ref' mode?
Andrew
Hi,
OpenEmbedded:
* removed the recipe for building the linux-linaro-3.1 kernel
* added support for the default OE-core kernel
* allows building the linux-yocto_3.2 kernel for the
qemuarmv7a MACHINE using a vexpress defconfig
* updated the wiki on how to build OE-core with meta-linaro
https://wiki.linaro.org/KenWerner/Sandbox/OpenEmbedded-Core
* worked on OEMetaLinaroCard
* started to talk to CI folks about automating oe-core+meta-linaro builds
* wrote a script that automates the process
* ran it on tcserver01 to build the qt4e and sato images against the
source gcc-linaro-4.6-2012.02
Regards,
Ken
Hi Ken. In follow up to our 1-on-1 yesterday, here's what I'd like done next.
The goal is to use OE Core as a release test suite. The releases are
tarballs so we can keep the current recipe format and punt bzr support
for later. The first step is to be able to reliably build a release
in the cloud or validation lab.
In all cases keep the other teams in mind. Much of this is related to
Validation. Platform will be involved later. Ping them early.
Kernel:
We're starting with GCC and need a kernel to supply headers and to
boot some type of ARMv7 image. I don't want a linux-linaro recipe as
people will use it and it's too early for that.
Find a kernel, preferably from OE Core, that is recent, ARMv7, >= 512
MB RAM, and works well with qemu-linaro. Prefer vexpress-a9, else
OMAP?
Talking:
Say Hi to Validation re: EC2 and plans
Say Hi to the ARM landing team re: vexpress upstream support
Say Hi to Beth Flanagan re: Yocto's existing auto builders and any hints
Cloud builds:
Find out who is already doing OE builds in the cloud and how
Run a build locally and time
Push ~/downloads into the cloud, build, and time[1]
Figure how much this build will cost in dollars
[1] c1.xlarge might be best. Builds are normally I/O bound and the
cloud is I/O poor. Put /tmp and other chunks in a tmpfs? EC2 rounds
up to the nearest hour as well.
If the cloud is too expensive then we'll get a machine installed.
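Since EC2 bills each started hour in full, a back-of-the-envelope cost is the build time rounded up to whole hours, times the hourly rate, times the number of builds. A sketch only: the rate and build time below are placeholders to be replaced with the timed runs and current c1.xlarge pricing, and storage or transfer charges are not included.

```python
import math


def build_cost(build_minutes, hourly_rate_usd, builds_per_month=1):
    """Dollar cost when every started instance-hour is billed as a full hour."""
    billed_hours = math.ceil(build_minutes / 60)
    return billed_hours * hourly_rate_usd * builds_per_month


# Placeholder figures: substitute the timed build and the real hourly rate.
print(build_cost(build_minutes=61, hourly_rate_usd=0.5))
```

With hourly rounding, a build that runs just over an hour costs as much as a two-hour one, which is one more reason to push /tmp into a tmpfs.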
S3 for storage:
(only proceed if affordable)
Use S3 for storing the input tarballs
Use S3 either as a pre-mirror by serving over HTTP, or use s3cmd to
sync down the tarballs before starting the build
Scripting:
Re-use existing scripts if feasible. Integrate with LAVA provided we
can run exactly the same scripts on a laptop for debugging.
Script the bitbake, OE meta layer, and Linaro meta layer setup.
Script the configuration including setting the release tarball URL and
GCC preferred version.
Script the build and result capture, especially the log, any ICEs, and
the final sizes
Future:
OE can grab a repository seed then update based on that. Check if the
bzr backend supports this. If so, play with seeding to do tip builds.
Let me know what you think then we'll spawn blueprints. Let's keep an
eye on this as it's sounding expensive.
-- Michael
Hi Ken, Thiago. Could you try your hand at writing cards for the
OpenEmbedded Core meta-layer and GDB for Android? Here are some past
cards:
https://linaro-public.papyrs.com/TCWG2011-GCC-O3
https://linaro-public.papyrs.com/TCWG2011-OPENOCD-SUPPORT
https://linaro-public.papyrs.com/TCWG2011-WINDOWS-TOOLCHAIN
They cover:
* An introduction
* The why/advantages
* The what/features
* The how/steps
* Dependencies
* Acceptance criteria (which is a past-tense version of the body)
A card should cover three calendar months. Check the unknowns - we
need to investigate the acceptance criteria and make sure there is no
unexpected side work in there.
We use roadmap cards as the highest level of organising our work.
Cards are the interface between working groups and the TSC and sit at
the project brief / deep but concise level.
Draft them on the wiki (cf https://wiki.linaro.org/KenWerner/Sandbox)
and we'll go from there.
-- Michael
== Issues ==
It would be nice to have perf installed on the porter boxes in the
canonical data center as well if we are allowed to run benchmarks
there. Filed RT request.
== Progress ==
* Understood STB_GNU_UNIQUE_NOTE - Helped fix a problem with compiz
crashing, which turned out to be a very nice testcase for investigating
the toolchain issue. Thanks Alexandros.
* Merged pending patches.
* ABI fix
* Fix for ICE - LP bug 942307
* Off sick on Monday.
* Did some work to improve code generation for addressing modes in VFP
registers.
* Did some work in setting up SPEC2k for running hc partitioning
patches. Still needs to be completed.
=== Plans ===
* Ping configure.ac changes.
* Complete the pending benchmark run with hc partitioning and look at
the results. Was unable to run the VFP benchmarks today as
ex.seabright.co.nz was down.
* Resurrect partial-partial PRE
=== Absences ===
* 1 week holiday sometime before that - to be booked.
* Linaro Connect Q2.12 - May 28 - June 1 - travel booked - hotel to be booked.
Hi Ken. This month's release is next week. Let's be aggressive and
see if we can use your meta-linaro layer to check the source release.
I want to use tcserver01 in the validation lab instead of the cloud
until we know how much it costs. I've added you an account and done a
test build to check that the dependencies are there. There's a shared
directory in /home/shared that includes downloads and a sstate-cache.
The machine seems fast enough for the job.
The tarball should be ready around Tuesday next week. Once ready,
could you mix it in with the meta-linaro layer, do a build, boot the
rootfs in qemu, and document as you go? Anything past that is
welcome.
It's all manual but a nice proof of concept. Can we check the binary
build as well?
-- Michael
Current Milestones:
|| || Planned || Estimate || Actual ||
||cp15-rework || 2012-01-06 || 2012-??-?? || ||
(new blueprints & reestimate for this one pending still; will try to
do this early next week)
Historical Milestones:
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-12-15 || 2011-12-12 ||
||initial-a15-system-model || 2012-01-27 || 2012-01-27 || 2012-01-17 ||
||qemu-kvm-getting-started || 2012-03-04?|| 2012-03-04?|| 2012-02-01 ||
== cp15-rework ==
* converted cp15 crn={6,7,9}; still TODO: just crn={0,1}. These take
longer than I was expecting because, to find some useful compromise
between QEMU's previous (usually broken) behaviour and reality, I
have to cross-reference half a dozen different TRMs and several
revisions of the ARM ARM.
== other ==
* getting qemu-linaro into shape for next week's release:
applying patches, writing changelog, etc
* patch review for more exynos4 devices
* tracked down a regression in the spitz (zaurus) model,
sent patch
* first pass comments on blueprints for this quarter
* investigating LP:947888 (gpg crashes under qemu)
-- PMM
Hi Alexandros,
Could you use the linaro-toolchain list for stuff like this please?
You're more likely to find somebody who knows the answer that way.
I'm pretty sure the problem is not the compiler because, as far as I can
see, both architectures' compilers emit ".weak" directives. If there is
a problem, I'd say it's in the linker.
Your test case gives two different addresses on Lucid x86, and on ARM
(so you say, I've not tested it), but the same address twice on Precise.
This is a surprising result. *I* would have expected that static values
in different dlopen'd libraries would not be unified, but apparently
they are ... sometimes.
I'm afraid I don't really have any insight here. :(
Anyway, regardless of whether one is correct, or not, I'd suggest *not*
relying on this behaviour - it's clearly not portable. I say leave it at
arm's length in production software for a few years.
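Whichever tool is at fault, the difference shows up in the symbol tables: objdump -t prints binding flags per symbol ('u' unique global, 'w' weak, 'g' global, 'l' local). A minimal sketch that filters objdump -t text for a symbol and reports its binding, fed with two of the mIndex lines from the output in the quoted message; the regex only assumes the column order shown there (address, flags, section, size, name).

```python
import re

# address, flag characters, section, size/alignment, demangled name
SYM_RE = re.compile(r"^([0-9a-fA-F]+)\s+(.*?)\s+(\S+)\s+([0-9a-fA-F]+)\s+(.+)$")

# objdump -t binding flags, checked in this order.
BINDINGS = [("u", "unique global"), ("w", "weak"), ("g", "global"), ("l", "local")]


def symbol_bindings(objdump_text, needle):
    """Return (binding, name) for objdump -t lines whose name contains needle."""
    results = []
    for line in objdump_text.splitlines():
        m = SYM_RE.match(line.strip())
        if not m or needle not in m.group(5):
            continue
        flags = m.group(2)
        binding = next((desc for ch, desc in BINDINGS if ch in flags), "unknown")
        results.append((binding, m.group(5)))
    return results


# Lines copied from the x86_64 and armhf libcomposite.so output quoted below.
x86_64 = "0000000000277a70 u O .bss 0000000000000010 PluginClassHandler<CompositeScreen, CompScreen, 4>::mIndex"
armhf = "00065658 w O .bss 00000010 PluginClassHandler<CompositeScreen, CompScreen, 4>::mIndex"
print(symbol_bindings(x86_64, "mIndex"))
print(symbol_bindings(armhf, "mIndex"))
```

Run over a whole library's symbol table, a check like this makes it easy to see which template static members ended up weak rather than unique.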
Andrew
On 06/03/12 14:27, Alexandros Frantzis wrote:
> On Tue, Mar 06, 2012 at 09:51:01AM +0800, Sam Spilsbury wrote:
>> On Mon, Mar 5, 2012 at 11:50 PM, Alexandros Frantzis
>> <alexandros.frantzis(a)linaro.org> wrote:
>>> Hi all,
>>>
>>> this is an update on my progress with the updated compiz branches.
>>>
>>> I have been trying to run our updated compiz branches
>>> (compiz-*/linaro-gles2-update) on ARM (precise armhf), but I have stumbled onto
>>> the same issue Marc reported some days ago. In particular, I get:
>>>
>>> /usr/bin/compiz (core) - Fatal: Private index value "15CompositeWindow_index_4" already stored in screen.
>>> /usr/bin/compiz (core) - Fatal: Private index value "15CompositeScreen_index_4" already stored in screen.
>>>
>>> and then a segfault when I try to run compiz.
>>>
>>> Note that I *don't* have this problem when running on x86_64 precise.
>>>
>>> The issue can be recreated with:
>>>
>>> $ compiz composite opengl
>>>
>>> I added some debugging messages to pluginclasshandler.h to get a better
>>> feeling of what is going on, and ran on both my desktop and on ARM.
>>> This is the output near the point when GLScreen get initialized:
>>>
>>> ...
>>>
>>> compiz (core) - Info: get(): mIndex.initiated for "8GLScreen_index_4" : 0
>>> compiz (core) - Info: initializeIndex(): Initializining index value "8GLScreen_index_4"
>>> compiz (core) - Info: initializeIndex(): Private index value added for "8GLScreen_index_4"
>>> compiz (core) - Info: getInstance(): Get instance for "8GLScreen_index_4"
>>> compiz (core) - Info: getInstance(): Spawning new class for "8GLScreen_index_4"
>>> compiz (core) - Info: ctor(): mIndex.initiated for "8GLScreen_index_4" : 1
>>> compiz (core) - Info: ctor(): Increasing reference count for "8GLScreen_index_4": 1
>>>
>>> --- x86_64 ---
>>> compiz (core) - Info: get(): mIndex.initiated for "15CompositeScreen_index_4" : 1
>>> --- armhf ---
>>> compiz (core) - Info: get(): mIndex.initiated for "15CompositeScreen_index_4" : 0
>>> compiz (core) - Info: initializeIndex(): Initializining index value "15CompositeScreen_index_4"
>>> compiz (core) - Fatal: initializeIndex(): Private index value "15CompositeScreen_index_4" already stored in screen.
>>
>> After the composite plugin loads and mIndex.initiated is set to 1,
>> place a watchpoint on mIndex.initiated (it should be a separate
>> template instantiation for each different class) and check if it
>> changes, or check if we are reading mIndex.initiated from a different
>> address, and if so, check the addresses of this for each constructor
>> and destructor being called. (could be a compiler bug, I've hit these
>> on this part of the code before).
>>
>>> -------------
>>>
>>> In the armhf case, CompositeScreen is erroneously considered not
>>> initialized, and is initialized again, therefore messing up the plugin system.
>>>
>>> I am trying to figure out if this is a manifestation of some kind of memory
>>> corruption that doesn't affect us on x86_64 for whatever reason (alignment,
>>> integer size etc), or something completely different.
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>> Alexandros
>>
>>
>>
>> --
>> Sam Spilsbury
>>
>
> Hi all,
>
> (I have also added Michael, Andrew and Ulrich from the Linaro toolchain group
> to the recipients. Hi!)
>
> Checking the addresses, as Sam suggested, I found that there are two different
> PluginClassHandler<CompositeScreen, CompScreen, 4>::mIndex and
> PluginClassHandler<CompositeWindow, CompWindow, 4>::mIndex objects.
>
> After a bit of investigation, objdump gave an explanation:
>
> objdump -t /usr/lib/compiz/libcomposite.so | c++filt | grep mIndex
>
> -- x86_64 --
> 0000000000277a80 u O .bss 0000000000000010 PluginClassHandler<CompositeWindow, CompWindow, 4>::mIndex
> 0000000000277a70 u O .bss 0000000000000010 PluginClassHandler<CompositeScreen, CompScreen, 4>::mIndex
> -- armhf --
> 00065648 w O .bss 00000010 PluginClassHandler<CompositeWindow, CompWindow, 4>::mIndex
> 00065658 w O .bss 00000010 PluginClassHandler<CompositeScreen, CompScreen, 4>::mIndex
> ------------
>
> And the same kind of output for libopengl.so
>
> On x86_64 the symbols are marked 'u': 'unique global', whereas on armhf
> they are marked 'w': 'weak'. This seems to be causing our troubles.
>
> I have produced a small test case for this:
>
> http://people.linaro.org/~afrantzis/cpp_unique_global.tar.gz
>
> Building and running 'LD_LIBRARY_PATH=. ./main' on x86_64 prints out f1 and f2
> with the same address, whereas on armhf the addresses are different (i.e. two
> different objects). On x86_64 the symbol A<int>::a is 'u', on armhf it is 'w'.
>
> For completeness, when running without templates (edit a.h to change) the two
> printed addresses are different on both x86_64 and armhf. Also A::a is 'g':
> 'normal global' for both.
>
> Michael, Andrew, Ulrich can you please give us some insight into the situation?
> Does this seem like a compiler or linker bug on ARM, or is the code depending
> on undefined behavior, or something different? I have pasted the used g++
> versions at the end of the email.
>
> Thanks,
> Alexandros
>
> --- g++ x86_64 --
> Using built-in specs.
> COLLECT_GCC=g++
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6/lto-wrapper
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
> Thread model: posix
> gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu2)
>
> --- g++ armhf --
> Using built-in specs.
> COLLECT_GCC=g++
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.6/lto-wrapper
> Target: arm-linux-gnueabihf
> Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
> Thread model: posix
> gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu2)
>
Hi!
* Bug reports
Made an effort to clean up the remaining untriaged bugs. Michael will
help out with 941676, where the failure is on PowerPC.
* Wiki
Created a wiki page for running benchmarks in cbuild:
https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/RunningBenchmark…
* V8
Build system has changed in V8 from SCons to GYP. With GYP I can pass the
normal flags like CXXFLAGS to control the build, so this looks like a good
change.
I have created a cbuild make file, patterned after the other benchmark make
files.
Working on x86 so far; will go for ARM next week. Will also add a parser for
the results.
Regards
Åsa
== GCC ==
* Checked patch to generate usat/ssat instructions into
Linaro GCC 4.6 and 4.7.
* Submitted (first part of) fwprop-subreg patch for mainline.
* Submitted Ira's vectorizer patches for mainline.
* Ongoing work on improving end-of-loop value computation.
* Patch review week.
== Misc ==
* On leave 3/9 - 3/14.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Hi
Updated sysroots for the binary toolchain are available at [1]. This time I
split the -dev and -dbg sysroots, so less disk space is used as long as
dbgsym packages are not needed.
Please take a look at them and report any issues.
1. http://people.linaro.org/~hrw/generic-linux/sysroots/20120301/
Hello,
My name is Jiandong Zheng and I am working on a few ARM-specific projects.
I have been using the Linaro toolchain that comes with Ubuntu for Cortex-A9
builds. I also need to maintain code for an old ARM11 SoC, so the Linaro
toolchain is my preference, and if I remember correctly it did the job
well back in the Maverick days.
However, in Precise, I have troubles running this ARM11 u-boot and
kernel built with Linaro toolchain, eventually I found that my old gcc
4.3.2 toolchain can still build workable u-boot and kernel so that
Linaro toolchain (gcc 4.6.2) seems the problem. I understand that
Linaro's focus has been ARMv7 or above but I am wondering if its
toolchain can still be used for older ARM architectures.
Thanks,
JD
Hi Michael,
A new bug triaging question.
https://bugs.launchpad.net/ubuntu/+source/ppl/+bug/941676
This one is special too because the failure is on powerpc.
This means I cannot easily reproduce it and scan through the toolchains
(without building).
Matthias has given some information and a guess of what is failing. My idea
was to just summarize this, and let the person who takes on the bug make
the powerpc investigation.
How do we handle bugs on other architectures in general?
Best regards
Åsa
Hi!
I need a little help with triaging of this bug, it is a little different
from the ones I have triaged so far:
https://bugs.launchpad.net/launchpad/+bug/945503 - gcc-4.7 branch imports
fail (timeouts)
It is already set to Confirmed, my question is what is needed to go to the
triaged stage?
Also, the comments from wgrant and jelmer somewhat indicate that there is
no problem.
Best regards
Åsa
Hello,
since this long-standing problem just hit me again, I had a quick look at
test failures in our farm that appear to occur whenever the directory name
contains a string that is being checked for via a "scan-assembler" test,
e.g.:
+FAIL: gcc.target/arm/sat-1.c scan-assembler-times ssat 4
+FAIL: gcc.target/arm/sat-1.c scan-assembler-times usat 4
I had been assuming this happens because the directory name somehow finds
its way into the assembler file, e.g. via a .file directive or DWARF data,
and therefore an additional instance of the search string is being detected
erroneously.
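To make that hypothesis concrete: a count-based check that simply counts every regex match across the whole .s file will pick up any directive that happens to embed the directory name. The Python sketch below is an illustration only; `scan_assembler_times` and `scan_insn_times` are hypothetical helpers, not DejaGnu code, and the assembler lines and the `/build/usat-test/` directory are made up.

```python
import re

def scan_assembler_times(asm_text: str, pattern: str) -> int:
    """Count every match of `pattern` anywhere in the assembler text,
    whether it sits in an instruction, a directive or a string."""
    return len(re.findall(pattern, asm_text))

def scan_insn_times(asm_text: str, mnemonic: str) -> int:
    """Stricter variant: only count lines whose first token is `mnemonic`."""
    pattern = rf"^\s*{re.escape(mnemonic)}\b"
    return len(re.findall(pattern, asm_text, re.MULTILINE))

# Four genuine usat instructions (made-up output, not real sat-1.c code).
body = "\n".join(["\tusat\tr0, #8, r0"] * 4)
# Same output, once with a bare file name and once with a hypothetical
# build directory whose name happens to contain "usat".
clean = '\t.file\t"sat-1.c"\n' + body
polluted = '\t.file\t"/build/usat-test/sat-1.c"\n' + body

for name, asm in (("clean", clean), ("polluted", polluted)):
    print(name, scan_assembler_times(asm, "usat"), scan_insn_times(asm, "usat"))
```

The loose count goes from 4 to 5 once the path appears, while the line-anchored variant stays at 4; that is one possible way to make such a count immune to path names, not a description of what the testsuite does today.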
However, when I attempted to re-create the scenario by building in a
directory with the same name on my local machine, the tests went through
fine. And indeed, inspecting the assembler source shows that there is no
debug info at all; while there is a .file directive, it contains just the
file base name without any directory name, and in fact the directory name
does not show up anywhere in the assembler file.
Something must still be different in the runs that take place on the build
farm; but I currently do not understand what that might be.
Michael, is there a way to interact with the build process on the build
farm machines themselves to try and find out what's going on here?
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Hi toolchain folks,
I have a problem with U-boot compiled for an ARMv4 system (Integrator)
using Linaro 2012.02-20120222, it just crashes. Compiling the same
U-Boot with CodeSourcery 2010-q1 works fine.
Symptom:
Resetting CPU ...
undefined instruction
pc : [<07fdecd4>] lr : [<07fdeb34>]
sp : 07f91380 ip : 00000000 fp : 00000001
r10: 010258fc r9 : 00000000 r8 : 07f94f64
r7 : 07f94eb0 r6 : 00989680 r5 : 000186a0 r4 : 000186a0
r3 : 000003e8 r2 : 000f423f r1 : 000f4240 r0 : 05f5e100
Flags: nzCv IRQs on FIQs on Mode SVC_32
(repeated ad nauseam)
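The register dump is easier to read in decimal. This small sketch just converts the values copied from the dump above; nothing else is assumed.

```python
# Register values copied from the crash dump above.
regs = {
    "r0": 0x05F5E100, "r1": 0x000F4240, "r2": 0x000F423F,
    "r3": 0x000003E8, "r4": 0x000186A0, "r5": 0x000186A0,
    "r6": 0x00989680,
}

for name, value in regs.items():
    print(f"{name} = {value:#010x} = {value:,}")
```

This gives r0 = 100,000,000, r1 = 1,000,000, r2 = 999,999, r3 = 1,000, r4 = r5 = 100,000 and r6 = 10,000,000. Every value is a power of ten or one less than one, which at least fits code doing division-heavy decimal arithmetic around the faulting pc.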
The only hint I have is the constant 000186a0, which appears in the
put_dec() routine in U-Boot's vsprintf(). That routine is nothing special:
it's Douglas Jones' binary-to-decimal conversion code from the Linux
kernel. However, the compiled object DOES contain calls to
__aeabi_uidivmod, __udivsi3 and __div64_32.
Do we know of any potential trouble in these helpers on ARMv4?
The file containing these functions is compiled like this:
arm-linux-gnueabi-gcc -M -g -Os -fno-common -ffixed-r8 -msoft-float
-D__KERNEL__ -DCONFIG_SYS_TEXT_BASE=0x01000000
-I/var/linus/u-boot/build-integrator/include2
-I/var/linus/u-boot/build-integrator/include
-I/var/linus/u-boot/include -fno-builtin -ffreestanding -nostdinc
-isystem /home/linus/src/gcc-linaro-arm-linux-gnueabi-2012.02-20120222_linux/bin/../lib/gcc/arm-linux-gnueabi/4.6.3/include
-pipe -DCONFIG_ARM -D__ARM__ -marm -mabi=aapcs-linux
-mno-thumb-interwork -march=armv4 -MQ
/var/linus/u-boot/build-integrator/lib/vsprintf.o vsprintf.c
>/var/linus/u-boot/build-integrator/lib/.depend.vsprintf
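For reference, ARMv4 has no hardware divide instruction, so every `/` and `%` on unsigned ints ends up in a helper; per the ARM run-time ABI, __aeabi_uidivmod returns the quotient in r0 and the remainder in r1. The Python sketch below models that computation with a plain restoring shift-and-subtract loop. It is not libgcc's implementation and says nothing about which instructions the real helper uses; `uidivmod` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF

def uidivmod(n: int, d: int) -> tuple[int, int]:
    """Unsigned 32-bit n / d by restoring shift-and-subtract.
    Returns (quotient, remainder), the pair __aeabi_uidivmod returns in r0/r1."""
    if not (0 <= n <= MASK32 and 0 < d <= MASK32):
        raise ValueError("operands must be 32-bit unsigned, divisor non-zero")
    q, r = 0, 0
    for bit in range(31, -1, -1):
        r = (r << 1) | ((n >> bit) & 1)   # bring down the next dividend bit
        if r >= d:                        # divisor fits: subtract, set q bit
            r -= d
            q |= 1 << bit
    return q, r

print(uidivmod(100_000_000, 0x186A0))
```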
Yours,
Linus Walleij
Hi,
GDB on Android:
* Spent some time understanding the intricate build process used
to compile NDK's gdbserver.
* After a lot of tinkering and trial and error I was finally able
to produce a gdbserver binary which, when running on the Android
emulator and talking to a GDB (also compiled by me) on the host,
is able to show all threads and provide a useful backtrace.
--
[]'s
Thiago Jung Bauermann
Linaro Toolchain Working Group
Hi there. When you do a commit that fixes a bug, try adding
'--fixes lp:123456' to the bzr commit. Launchpad recognises these and
automatically links the branch and merge request to the bug report.
This makes it easier for the reporter to see progress. Even if you forget
to update the bug status (such as bumping it from 'Triaged' to 'In
progress'), it's still obvious that it's being worked on.
See https://bugs.launchpad.net/gcc-linaro/+bug/942307 for an example.
The status was 'Triaged' but there's an approved merge request which
shows work is being done.
-- Michael
Merged FSF GCC 4.7 to the Linaro GCC 4.7 branch.
Merged from GCC 4.6.3 release to Linaro GCC 4.6 branch.
Wrote and posted a patch to load DImode immediate constants into NEON
registers properly. Unfortunately, testing showed a bootstrap stage 2
vs. 3 miscompare, so there's something not quite right. However,
disassembly of the binaries hasn't revealed any problems, so this
failure is still a mystery. More investigation required.
Wrote and posted a patch to do DImode negation in NEON. Realised that I
had forgotten to do the core-register fall-back case; posted a new
version. Again, there's something annoyingly subtle that prevents
bootstrap. This time it looks like some sort of wrong code bug.
Investigating.
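For reference, this is the arithmetic the core-register fall-back has to perform. In ARM state a 64-bit negation is typically an RSBS/RSC pair that chains a borrow from the low word into the high word, whereas a 64-bit NEON register holds the whole value at once. The Python model below only illustrates that arithmetic; the helper names are made up and it is not the pattern from the patch.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def neg64_core(lo: int, hi: int) -> tuple[int, int]:
    """Negate a 64-bit value held as two 32-bit halves, the way an
    ARM-state RSBS/RSC pair would: negate the low half first, then
    propagate its borrow into the high half."""
    lo_out = (-lo) & MASK32            # RSBS lo_out, lo, #0
    borrow = 1 if lo != 0 else 0       # 0 - lo borrows unless lo == 0
    hi_out = (-hi - borrow) & MASK32   # RSC hi_out, hi, #0
    return lo_out, hi_out

def neg64_whole(x: int) -> int:
    """What a single 64-bit-wide negation computes."""
    return (-x) & MASK64

x = 0x123456789ABCDEF0
lo, hi = neg64_core(x & MASK32, x >> 32)
print(hex((hi << 32) | lo), hex(neg64_whole(x)))
```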
Wrote and posted a patch to do DImode one's complement in NEON. Richard
E questioned how it was written though. The tests passed successfully,
so that's a novelty this week!
Looked for other NEON instructions lacking support. Didn't find any ...
but the machine description isn't exactly straightforward.
Considered the problems with choosing whether or not to do an operation
in NEON. Discussed the existing state and possible solutions with Ramana
and Bernd (thanks guys). Thought about it some more. Posted a vague
description of what might fix it to the linaro-toolchain list.
Awaiting replies.
Hi!
* Development benchmarks:
Finished the first implementation, sent to Michael for review.
* This became a very short week because of sickness.
Plans for week 10 are to triage existing bugs and to get going again with
the SunSpider benchmark.
Regards
Åsa
Summary:
* Multilib on linaro toolchain.
Details:
1. Multilib test for linaro toolchain.
* Fix the multilib build issue when multiarch patch is applied.
* Fix a bug in crosstool-ng upstream to build multilib for glibc/eglibc.
* Successfully built multilib toolchains for armv6t2, cortex-a8 and
cortex-a9. Tests show they link the correct libraries.
2. Outstanding issues:
* Multilib and multiarch use different directory structures, so the
multilib build cannot find the correct directories in the prebuilt
oneiric-sysroot.
* Building armv5t ARM-mode libraries when the default mode for armv7-a
is Thumb.
Annual leave:
* Mar. 1-2.
Plans:
* Finalize the multilib solution for linaro binary toolchain.
* Work on code size optimization for the embedded toolchain.
Best regards!
-Zhenqiang
> The basic idea is that we add a new RTL optimization pass (or two) that
> assesses the usage of pseudo registers, and makes recommendations about
> what register class each should end up in, if there's a choice. These
> recommendations would then be used by later passes to get a better use
> of NEON. I might call this the "prealloc" pass, or something.
That sounds very much like the pre-reload that "new-ra" had at one
point (http://gcc.gnu.org/viewcvs/branches/new-regalloc-branch/gcc/pre-reload.c).
The problem with pre-reload for new-ra was that it was basically
reload instead of something nicer and cleaner. It also only ran just
before the register allocator, which is too late for the problem you
are trying to solve.
> Firstly, for each pseudo-register in a function, the pass would look at
> the insn constraints for each "def" and "use", and see how the registers
> relate to one another. This might determine things like "if rN is in
> class A, then rM must be also in class A".
At SUSE I tried to do this with the webizer pass (web.c). I wrote down
the ideas we implemented at the time (see
http://gcc.gnu.org/ml/gcc/2005-01/msg00179.html):
- web class, to replace regclass and choose register classes for webs
instead of pseudos. This also includes splitting webs if a register
in a web really wants to be in two different classes to satisfy
constraints in two different insns. Right now, as far as I
understand, regclass just picks one and lets reload figure out how to
fix up that mistake.
- A semi-strict RTL mode. Right now there is just strict and
non-strict. On the branch there is a semi-strict mode which is the
same as strict RTL except that pseudo-registers are still allowed.
- pre-reload (which is related to web class) to make sure as many insn
constraints as possible are satisfied before the register allocator
goes to work. Basically, after pre-reload the insn stream should be
in semi-strict RTL form.
I used the webizer to unify defs and uses. I would split a web if it
needed multiple register classes (I inserted a mov, without checking
that a move existed from the source to the target register class), and
I put pseudos r1 and r2 in the same register class if there was an
insn (set (r1) (r2)) somewhere. The selection of the register classes
had a cost function, but I used rtx_cost, which is not very effective,
really. But I never took this experiment very far because for x86-64
the plan didn't work as well as I had hoped. I don't remember the
details, but the biggest problem I had with the experimental
implementation of these ideas (apart from lots of trouble with recog
for semi-strict RTL) was that there is a bit of an ordering problem
between combine on the one hand and web-based register classes on the
other. If
you assign classes too early and don't allow things to change, then
combine fails too often. If you assign register classes after combine,
you may not get the instructions selected the way you want them to be.
This was when GCC still had the old local-alloc.c and global.c
allocators. Things may be different (better) with IRA and the upcoming
LRA stuff.
If you plan to work on this, I would suggest you discuss the plan on
the GCC mailing list also, with Jeff Law and Vladimir Makarov in CC
because they are working on a reload rewrite (LRA).
Ciao!
Steven
Hi,
OpenEmbedded:
* Worked on the meta-linaro layer and added libgcc and crosssdk
recipes to satisfy some bitbake dependencies
* I had to apply a few patches to build the linaro toolchain the OE
way (mostly gcc configury)
* successfully built the sato and Qt images
* Moved on to test the February release of the Linaro binary toolchain
and (probably) hit an issue with unaligned SD card images used
with QEMU
* the guest kernel fails with: attempt to access beyond end of device
* /proc/partitions shows different block sizes (host vs. guest)
* the image size gets calculated on the fly by OE
* posted a patch that allows specifying a rootfs size alignment
* not seen on trunk as they use IDE
* Started to rebase the meta-linaro layer against current OE-core
* created https://wiki.linaro.org/KenWerner/Sandbox/OEMetaLinaroCard
based on the existing card of David R.
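A rough sketch of the arithmetic a rootfs size alignment option implies, assuming the image size is simply rounded up to the next multiple of the chosen alignment so that the guest never sees a partial block at the end of the device. `align_up` and the example values are illustrative only; they are not taken from the posted OE patch.

```python
def align_up(size: int, alignment: int) -> int:
    """Round `size` up to the next multiple of `alignment` (both in bytes)."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return -(-size // alignment) * alignment   # ceiling division, then scale

# A hypothetical on-the-fly image size that is not block aligned.
print(align_up(1_000_000, 1024))
```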
Regards,
Ken
== GCC ==
* Fixed mainline regression causing ICE in certain outer-loop
vectorization cases.
* Merged fwprop-subreg patch into Linaro GCC 4.7.
* Completed patch to generate usat/ssat instructions
where appropriate; checked into GCC mainline.
Merge requests to Linaro GCC 4.6 and 4.7 pending.
* Ongoing work on improving end-of-loop value computation.
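As a reminder of what the usat/ssat patterns buy: SSAT and USAT clamp a value to a signed or unsigned n-bit range in a single instruction, so an open-coded clamp against exactly those bounds can be replaced by one instruction. The Python model below only shows that equivalence; it does not reproduce which source idioms the GCC patch actually matches, and the function names are hypothetical.

```python
def ssat(value: int, n: int) -> int:
    """Model of SSAT: clamp to the signed n-bit range (1 <= n <= 32)."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    return max(lo, min(hi, value))

def usat(value: int, n: int) -> int:
    """Model of USAT: clamp to the unsigned n-bit range (0 <= n <= 31)."""
    return max(0, min((1 << n) - 1, value))

def clamp_idiom(x: int, lo: int, hi: int) -> int:
    """Open-coded clamp in the style of x < lo ? lo : (x > hi ? hi : x)."""
    return lo if x < lo else (hi if x > hi else x)

print(ssat(200, 8), ssat(-200, 8), usat(-5, 8), usat(300, 8))
```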
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
=== Progress ===
* Finished off PGO patch - sent upstream.
* Finished off the ABI tests - sent upstream.
* Investigated fixes for LP 942307 - a problem with kernel builds for
android. Backported a fix from Uli last year.
* Upstream patch review.
* Did some small configury work for SPEC2k for HC partitioning.
* Some Android benchmark investigations.
* Recovered from a broken upgrade from natty to oneiric on my laptop,
and then went all the way to Precise. It works reasonably!
=== Plans ===
* Commit all approved and tested patches.
* Check on HC partitioning results from SPEC2k and make sure there is
an improvement and the feature works!
* Investigate https://bugs.launchpad.net/gcc-linaro/+bug/924726 in a
little more detail.
* Get back to partial-partial PRE.
=== Absences ===
* 1 week holiday sometime before Linaro Connect - to be booked.
* Linaro Connect Q2.12 - May 28 - June 1 - travel booked - hotel to be booked.
Current Milestones:
|| || Planned || Estimate || Actual ||
||cp15-rework || 2012-01-06 || 2012-??-?? || ||
(new blueprints & reestimate for this one pending)
Historical Milestones:
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-12-15 || 2011-12-12 ||
||initial-a15-system-model || 2012-01-27 || 2012-01-27 || 2012-01-17 ||
||qemu-kvm-getting-started || 2012-03-04?|| 2012-03-04?|| 2012-02-01 ||
== cp15-rework ==
* ploughing through conversion of cp15 registers to new design:
patchset now 20 patches long, still TODO crn={0,1,6,7,9}
== other ==
* reviewed more Xilinx Zynq model patches
* looking at BE8 support: Paul Brook has posted some patches
to support this in user mode
* LP:944645: fixed bug where we weren't clearing the IT bits when
entering an M profile exception handler
* sent out an arm-devs.next pullreq
* trying to track down why linux-user is failing brk() and thus
causing bash segfaults
Hi All,
As you know, the compiler currently has difficulties choosing between
whether to do an operation in NEON or not.
As I see it there are three problems:
1. Simply, is it profitable?
NEON can do many DImode operations in one or two instructions
where 2 to 10 normal ARM/Thumb instructions would be required
(not to mention the added register pressure), but there is a
cost associated with moving the inputs to NEON, and the results
back.
If the data can stay in NEON for more than one operation,
then that's even better.
If the data must be loaded from memory, and the result stored back
to memory, then it's only a question of whether the register space
is available, or not.
Currently these decisions are made in the IRA/reload passes.
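The profitability question in problem 1 boils down to a break-even comparison. The toy model below is a sketch under stated assumptions: costs are counted in abstract instruction units, each cross-register-file move costs a fixed `move_cost`, and register pressure is ignored. `ChainCost` and all the numbers are hypothetical, not taken from GCC's cost tables.

```python
from dataclasses import dataclass

@dataclass
class ChainCost:
    """Hypothetical unit costs for one chain of DImode operations."""
    core_insns: int   # ARM/Thumb instructions for the whole chain
    neon_insns: int   # NEON instructions for the same chain
    moves_in: int     # values that must be copied core -> NEON
    moves_out: int    # results that must be copied NEON -> core
    move_cost: int    # cost of one cross-register-file move

    def neon_total(self) -> int:
        # NEON work plus the transfers in and out of the NEON register file.
        return self.neon_insns + (self.moves_in + self.moves_out) * self.move_cost

    def prefer_neon(self) -> bool:
        return self.neon_total() < self.core_insns

from_core = ChainCost(core_insns=10, neon_insns=2, moves_in=2, moves_out=1, move_cost=3)
from_memory = ChainCost(core_insns=10, neon_insns=2, moves_in=0, moves_out=0, move_cost=3)
print(from_core.prefer_neon(), from_memory.prefer_neon())
```

With data loaded from and stored to memory the transfer term vanishes, which matches the point above that the question then reduces to register availability.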
2. Values that originate in hard-registers stay there.
This applies to function parameters, mostly, but also in general
where the result of an operation is allocated first.
If there is no instruction that can use the value there then the
value is 'reloaded' to a more suitable register. If there is any
alternative that avoids the move then the register allocator will
use it, regardless of the relative costs of the other
alternatives.
This problem is reduced where an operation and move can happen in
one instruction, but NEON instructions do not do this much. We can
write insns that appear to do it, but these output multiple
instructions (see my recent core-SI=>NEON-DI extend patch).
3. It all happens too late.
The decision whether to use NEON or not is not made until register
allocation time. Naturally this means that most of the optimization
passes are already completed.
Part of the problem is that the operation almost certainly needs
splitting (into whatever form was chosen) and this might not be
straightforward, post-reload. (However, the split1 pass is
already quite late, so perhaps this isn't such a big deal.)
Another part of the problem is that passes such as the two
lower-subreg passes make assumptions about the register width which
are not accurate if the operation is to end up in NEON.
There are other, lesser problems, such as it being hard to adjust the
costs for different cores (A8 in particular) and the cost of generating
an immediate constant can't be known until it's known what instructions
will be used to generate it.
These problems are not specific to NEON, of course. I believe IWMMXT
suffers from the same issues. Likewise the C6X port, and also the i386
MMX to some degree. Anything that has instructions that only operate on
a subset of registers, basically.
So, Bernd has suggested an outline of a solution. I've quizzed him on
this, added a few of my own ideas, and probably a good selection of
misunderstandings, bad assumptions, and general cock ups, and come up
with something I can write here for comment. I can post something to
upstream later if it doesn't get totally shot down now.
The basic idea is that we add a new RTL optimization pass (or two) that
assesses the usage of pseudo registers, and makes recommendations about
what register class each should end up in, if there's a choice. These
recommendations would then be used by later passes to get a better use
of NEON. I might call this the "prealloc" pass, or something.
Firstly, for each pseudo-register in a function, the pass would look at
the insn constraints for each "def" and "use", and see how the registers
relate to one another. This might determine things like "if rN is in
class A, then rM must be also in class A".
E.g. if you have two registers with constraints like this:
"r,w"
"r,w"
.. (and 'r' and 'w' do not overlap) then you know that there is a choice
between one mode or another, whereas this:
"r,w,r,w"
"r,w,w,r"
.. would impose no restrictions and we can carry on as normal.
Having done that we'd end up with sets of pseudo-registers that must
make a decision one way or the other, and we'd know where the operations
are that would force a move from one class to the other.
There's a fair amount of handwavium in there at present, because I've
not worked out what to do with overlapping register classes (think
VFP_LO_REGS) and all the other complications.
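The first step can be sketched as follows, under heavy simplification: each two-operand insn is described by its list of (class, class) alternatives; an insn "ties" its operands when picking a class for one leaves only one choice for the other; and tied pseudos are merged with a union-find into the sets that must make one decision together. Everything here (the names, the two-operand restriction, treating 'r' and 'w' as non-overlapping, ignoring the overlapping classes mentioned above) is an assumption for illustration, not GCC code.

```python
def ties(alternatives):
    """alternatives: one (class_a, class_b) pair per constraint alternative,
    e.g. "r,w" / "r,w" -> [("r", "r"), ("w", "w")].
    True if choosing a class for operand A leaves exactly one class for B."""
    allowed = {}
    for a, b in alternatives:
        allowed.setdefault(a, set()).add(b)
    return all(len(bs) == 1 for bs in allowed.values())

class DisjointSets:
    """Minimal union-find over pseudo-register names."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def decision_groups(insns):
    """insns: (pseudo_a, pseudo_b, alternatives) triples.
    Returns the sets of pseudos that must share one register-class decision."""
    ds = DisjointSets()
    for a, b, alts in insns:
        ds.find(a)
        ds.find(b)
        if ties(alts):
            ds.union(a, b)
    groups = {}
    for p in list(ds.parent):
        groups.setdefault(ds.find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())

tied = [("r", "r"), ("w", "w")]                          # "r,w" / "r,w"
free = [("r", "r"), ("w", "w"), ("r", "w"), ("w", "r")]  # "r,w,r,w" / "r,w,w,r"
print(decision_groups([("r1", "r2", tied), ("r2", "r3", tied), ("r3", "r4", free)]))
```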
Secondly, the pass would consider the costs of each alternative, and
store a recommended register class for each pseudo-register in a table
somewhere. It would also create new pseudos and insert extra move
instructions at the register file boundaries where an existing register
would have had split recommendations (this would solve problem 2 above).
Again, there's handwavium in "consider the costs". This isn't too hard
for size-optimization (assuming the "length" attribute on each insn is
correct), but more difficult for speed optimization. Factors to include
would be the move costs (here the A8 issues would be addressed) and the
relative speeds of the operations in both alternatives. Also, the
various possible transition points between the two modes might need some
comparisons.
Thirdly, the subsequent passes would need to be modified, as would some
of the back-end bits and bobs.
1. Lower-subreg would need to detect 'word_mode' based on the
recommended register class, not the global value.
2. The many split patterns in the machine description could be adjusted
so that, instead of simply conditionalizing on "reload_completed", they
split at split1 if that's the best option. (Maybe it would be profitable
to insert a new, earlier split pass specifically for this case to take
advantage of the likes of combine? I mean, ideally this decision would
have been made at expand time, if it could have been?) It might be
useful to *not* split too soon, in some cases, so that the register
allocator can still make the final decision based on register pressure,
and whatever other factors it uses. Of course, the existing late-split
option would need to be retained in case the prealloc pass is disabled,
in any case.
3. Various passes would have to be taught not to remove seemingly
superfluous register moves where they actually move between register
classes.
4. Pretty much nothing would need doing to register allocation! The
extra moves should make allocation a register pressure management issue,
rather than a question of making it work. DImode operations preallocated
to core-registers may already have been lowered, one way or the other
(by split1) so there's no decision left there, and if no lowering was
necessary then that option ought to be obviously cheaper. If it insists
on making contrary decisions then it can be taught to use the
recommendation as a hint, perhaps? In specific problem cases it would
also be possible to use instruction attributes to disable (or strongly
discourage) certain alternatives based on the recommended class.
5. The existing 'onlya8'/'nota8' nonsense can be removed.
6. The register move cost can be set correctly for each core.
7. If a constant is destined for a NEON register, most likely,
arm_gen_constant can use the NEON immediate rules to determine the cost.
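To illustrate point 7, here are two encodability checks that are well defined in the architecture: an ARM-state data-processing immediate is an 8-bit value rotated right by an even amount, and one of the NEON VMOV immediate forms (I64) accepts a 64-bit constant whose bytes are each 0x00 or 0xFF. NEON has other immediate forms (replicated 8/16/32-bit patterns) that this sketch does not model, and it is not how arm_gen_constant works internally; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def is_arm_dp_immediate(value: int) -> bool:
    """ARM-state data-processing immediate: an 8-bit value rotated right
    by an even amount (0, 2, ..., 30) within 32 bits."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate-right by `rot`.
        unrotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if unrotated <= 0xFF:
            return True
    return False

def is_neon_vmov_i64_immediate(value: int) -> bool:
    """VMOV.I64 form only: every byte of the 64-bit constant is 0x00 or 0xFF."""
    return all(((value >> (8 * i)) & 0xFF) in (0x00, 0xFF) for i in range(8))

# A DImode constant whose 32-bit halves are not single ARM immediates,
# but which fits the VMOV.I64 byte-mask form.
c = 0x00FF00FF00FF00FF
print(is_arm_dp_immediate(c & MASK32), is_arm_dp_immediate(c >> 32),
      is_neon_vmov_i64_immediate(c))
```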
There's clearly a lot of thought that needs to go into the
pseudo-register scan and decision making logic, but the whole thing
doesn't look like it'll boil down to very much code in the end.
There's also the question of where to put the pass. Too early and you'd
need to put a second one in to reassess the much-changed RTL later; too
late and lower-subreg won't be able to use it.
It's possible that it might be better to treat it more like the
data-flow analysis where it's not actually a stand-alone pass, but
rather a tool other passes can use? That might depend on how
computationally expensive it is.
Any thoughts anyone? Might something like this actually work? Would it
be worth spending the time on this?
Andrew
Hi,
184603 fixes an ICE we're running into with Android test builds.
Please pull it in ASAP so I don't have to mess with the CFLAGS as a workaround.
ttyl
bero