Virtual Connect is up next week. We've got two sessions lined up: the
first on profile guided optimisation and link time optimisation, and
the second on next steps with the vectoriser. Some other highlights
are the ones on system trace, Dalvik, and Aarch64 via OpenEmbedded
bootstrap.
The schedule is up at:
http://www.linaro.org/linaro-blog/2012/08/07/linaro-announces-virtual-conne…
Our sessions are "Analyzing vectorizer performance regressions in GCC
4.7 and 4.8" at 1000 UTC on Monday and "Exploring The Performance
Impact of PGO and LTO on ARM" at 1000 UTC on Thursday.
You'll need Google Hangout set up to join. For those who can't make
it, the sessions are also recorded to the Linaro OnAir YouTube
channel.
I've cancelled the Monday regular and Thursday stand up calls. See
you next week!
-- Michael
Current Milestones:
|| || Planned || Estimate || Actual ||
||cp15-rework || 2012-01-06 || 2012-06-23 || 2012-06-24 ||
||a15-lpae-support || 2012-07-13 || 2012-07-20 || 2012-07-20 ||
||clean-up-kvm-patches || || || ||
||track-kvm-abi-changes || || || ||
||fake-trustzone || || || ||
Overall KVM plan for 'do by end August': QEMU parts of this are a mix
of clean-up-kvm-patches and track-kvm-abi-changes blueprints, mostly.
http://cards.linaro.org/browse/CARD-167
== clean-up-kvm-patches ==
* did enough cleanup to be able to send a coherent initial RFC
patchset to qemu-devel/kvmarm.
== other ==
* upstream patch review, in preparation for QEMU 1.2 freeze next week
* put together qemu-linaro 2012.08 tarball (release next week)
KVM blueprint progress tracker:
http://apus.seabright.co.nz/helpers/backlog?group_by=topic&colour_by=state&…
-- PMM
== GCC ==
* Back-ported patch to change vector alignment to 8
to FSF GCC 4.6 and 4.7 and Linaro GCC 4.6 and 4.7.
* Investigated mp3player benchmark regression with
Linaro GCC 4.7 backport of vector alignment patch.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== Progress ==
* Started Linaro ramp up process
* Lots of admin, paperwork, and PC setup
* Successfully built compilers from Linaro and upstream trees
* Out of office Friday 10
== Next Week ==
* Out of office Friday 17
* Attend appropriate Virtual Connect sessions
* Do the 2012.08 release of the Toolchain
* Start working on symbol_ref split benchmarking.
== Future ==
* Find a small patch to GCC to use to pipeclean the submission process
--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-dann(a)linaro.org
Hello,
I've had a look at the mp3player performance regressions (just with *some*
data sets) with the vector-alignment patch. Interestingly it turns out
that the patch basically does not change the generated code for the hot
spot (inv_mdct routine) at all. (The *only* change is which bits of the
incoming pointer the run-time alignment check generated by the vectorizer
tests for. But this has no practical consequences, since the check itself
is not hot, and the *decision* made by the check is the same anyway --
everything is in fact properly aligned at runtime.)
The other difference, outside of code, introduced by the vector-alignment
patch is that some global arrays used to be forcibly aligned to 16 bytes by
the vectorizer, and they are now only aligned to 8 bytes. To check whether
this makes a difference, I've modified the compiler as a hack to always
force all global arrays to be 16 byte aligned. And interestingly enough,
this appears to fix this particular performance regression ...
Any thoughts as to why this might be the case? What are the
recommendations on the ARM hardware side as to what alignment is preferred?
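For readers who want to reproduce the two alignments discussed above, here is a minimal sketch in C. It assumes C11 `_Alignas`; the array names `samples16` and `samples8` and the helper `is_aligned` are hypothetical and are not taken from the benchmark or from the vectorizer's generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffers standing in for the benchmark's global arrays. */
_Alignas(16) float samples16[64];  /* forced to 16-byte alignment */
_Alignas(8)  float samples8[64];   /* only guaranteed 8-byte alignment */

/* Return 1 if p is a multiple of align, which must be a power of two.
   This is the kind of low-bits test a run-time alignment check performs. */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Calling `is_aligned(samples8, 16)` may return either 0 or 1 depending on where the linker happens to place the array, which is one reason a change from 16-byte to 8-byte alignment can behave differently between data sets and builds.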
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
I have noticed gcc has a preference for generating UXTB instructions
when an AND with #255 would do the same thing. This is bad, because
on A9 UXTB has two cycles latency compared to one cycle for AND. On
A8 both instructions have one cycle latency.
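As a small test case, the two functions below compute the same value in C. This is only a sketch: which instruction the compiler emits for each depends on the GCC version and options, so compile with `-O2 -S` and inspect the assembly rather than assuming either form. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low byte with an explicit mask (AND with #255). */
uint32_t low_byte_mask(uint32_t x)
{
    return x & 255u;
}

/* Zero-extend the low byte through a narrowing cast; a compiler may
   choose to express this as UXTB on ARM. */
uint32_t low_byte_cast(uint32_t x)
{
    return (uint8_t)x;
}

/* Sanity check that both forms agree for a given input. */
void check_equivalent(uint32_t x)
{
    assert(low_byte_mask(x) == low_byte_cast(x));
}
```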
--
Mans Rullgard / mru
== GCC ==
* Checked in patch to change vector alignment to 8
to GCC mainline.
* Started investigating benchmark regressions with
Linaro GCC 4.7 backport of vector alignment patch.
== GDB ==
* Checked in patch to fix hardware breakpoints on
non-4-byte aligned (Thumb) instructions.
* Checked in patch to properly report unsupported
watchpoint address/length combinations in gdbserver.
* Checked in patch to fix regression accessing /proc
files on older Linux kernels.
* Checked in 5 more patches to fix miscellaneous
test suite regressions.
* Re-tested GDB 7.5 pre-release on multiple platforms.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
[ Also posted to debian-arm; not cross-posted to avoid subscription
complaints... ]
Hi folks,
We're currently carrying patches in glibc in Debian (and Ubuntu) that
I wrote which are used to work out whether an ELF binary is hard-float
or soft-float. We're using these to allow us to do the right thing on
a multi-arch system, which is to pick a consistent set of binaries
(programs and libraries) at runtime; if you try to mix binaries using
different ABIs, you're prone to all kinds of weird and wonderful
results but generally badness occurs.
Upstream glibc have generally not been welcoming of these patches, and
I understand this; the approach taken (reading ARM-specific build
attributes) is far from clean and doesn't fit well in the design of
ld.so in particular. So, I've been looking into alternative methods
for achieving the goal of identifying ABI. After a couple of false
starts and discussion with some of the helpful toolchain and ABI folks
in ARM, I think we have a solution that will work well in the long
term. I just wish we'd thought about this *way* back when we first
started the armhf port, as it would have been much easier to work on
and standardise this back then. Modulo availability of time machines,
there's not much we can do on that front... :-)
What I'm proposing is to use two new values in the OSABI field in the
ELF header:
#define ELFOSABI_LINUX_ARM_AEABI_SF 65
#define ELFOSABI_LINUX_ARM_AEABI_HF 66
and use these values in the future for soft- and hard-float binaries
so that we can unambiguously identify them.
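To illustrate how a loader or tool could consume such values, here is a rough sketch in C. It assumes the two proposed constants above and the standard position of the OSABI byte (offset 7 of `e_ident`); `classify_float_abi` and the `float_abi` enum are hypothetical names, not glibc interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Proposed values from this message; not part of any released ABI. */
#define ELFOSABI_LINUX_ARM_AEABI_SF 65
#define ELFOSABI_LINUX_ARM_AEABI_HF 66

#define EI_OSABI_INDEX 7    /* offset of the OSABI byte in e_ident */
#define EI_NIDENT_SIZE 16   /* size of e_ident */

enum float_abi { ABI_UNKNOWN, ABI_SOFT_FLOAT, ABI_HARD_FLOAT };

/* Classify a binary from the first 16 bytes of its ELF header. */
enum float_abi classify_float_abi(const unsigned char *ident, size_t len)
{
    assert(ident != NULL);
    if (len < EI_NIDENT_SIZE)
        return ABI_UNKNOWN;
    /* Check the ELF magic before trusting any other byte. */
    if (ident[0] != 0x7f || ident[1] != 'E' ||
        ident[2] != 'L' || ident[3] != 'F')
        return ABI_UNKNOWN;
    switch (ident[EI_OSABI_INDEX]) {
    case ELFOSABI_LINUX_ARM_AEABI_SF:
        return ABI_SOFT_FLOAT;
    case ELFOSABI_LINUX_ARM_AEABI_HF:
        return ABI_HARD_FLOAT;
    default:
        return ABI_UNKNOWN;   /* existing binaries with older OSABI values */
    }
}
```

Existing binaries fall into the `ABI_UNKNOWN` case, so a staged switch-over would still need a fallback for them.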
There's already precedent for binaries using different values in this
field, with support in glibc for parsing and understanding
them. Adding more possible values is quite easy, assuming that the
maintainers are amenable. I'm about to post a similar message there.
I have a plan of attack for how to make a staged switch over,
deliberately to minimise any potential compatibility problems. See the
attached doc for that. It's deliberately not very specific in terms of
timeline, as that's something I'm hoping to get feedback
about. Comments very welcome; please point out if you think there are
problems with this approach, or if there are any more implementations
of toolchain / linker that will need to be addressed.
Cheers,
--
Steve McIntyre steve.mcintyre(a)linaro.org
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
For reference, if you see link time errors about a missing
'__dso_handle' symbol when building Android, then check if you're
using any global class instances in your multimedia libraries.
Each shared library has a __dso_handle symbol which is filled in on
load by the dynamic loader. Global class instances use this unique
value to make sure the destructor is called when the library is
unloaded. The symbol itself is defined in crtbegin_so.o, but the
multimedia rules forbid using this for an unknown reason. Either
create your global instances in a different way or change the
multimedia rules :)
-- Michael
== Progress ==
* Fixed PR54051
* Improved neon intrinsics testsuite. While still not an execution
based testsuite, at least we get compile time tests that are sensible C.
Exposed issues and wrote patches that improve the vabal and vaba
intrinsics, fix an issue with costs, and fix an issue with splitters
for large mode moves for Neon with the hardfp port.
* Some upstream patch and bug review.
* Fixed a minor testism for vld1q_s64 tests.
== Plans ==
* Write a patch to check md5sums between local tarball and uploaded
tarball in the release script.
* Look at auto-inc-dec patches more and investigate benchmark results.
* Submit intrinsics work upstream and shepherd it through.
* Finish looking at PR53664 and clean up testsuite further.
* Follow-up on my intrinsics patches upstream.
== Absences ==
* 17th Sept - 5th Oct - Vacation approved.
== GCC ==
* Checked in fix for incorrect pool placement with -O0
by splitting all insns in machine-dependent reorg.
* Created blueprint to investigate -funroll-loops and
-fvariable-expansion-in-unroller.
* Took over patch to change vector alignment to 8 from
Richard; reworked according to review comments; found
and fixed two vectorizer bugs triggered by the change;
submitted for mainline approval.
* Continued investigation of reload bug reported by ARM.
Posted potential fix to gcc-patches for discussion.
== GDB ==
* Worked on fixing HW breakpoint/watchpoint regressions.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Current Milestones:
|| || Planned || Estimate || Actual ||
||cp15-rework || 2012-01-06 || 2012-06-23 || 2012-06-24 ||
||a15-lpae-support || 2012-07-13 || 2012-07-20 || 2012-07-20 ||
||clean-up-kvm-patches || || || ||
||track-kvm-abi-changes || || || ||
||fake-trustzone || || || ||
Overall KVM plan for 'do by end August': QEMU parts of this are a mix
of clean-up-kvm-patches and track-kvm-abi-changes blueprints, mostly.
http://cards.linaro.org/browse/CARD-167
== clean-up-kvm-patches ==
* sent patch series to try to clean up some QEMU kvm x86isms
that block cleanup of some of the ARM KVM support code;
dealt with review comments and sent v2
== other ==
* started on cleaning up the QEMU benchmarking setup so we
can put it on a server machine somewhere
* fixed a crash in the QEMU ARMv7M models which was introduced
by one of my earlier GIC/NVIC refactoring series
* upstream review/maintainer duties
KVM blueprint progress tracker:
http://apus.seabright.co.nz/helpers/backlog?group_by=topic&colour_by=state&…
-- PMM
FYI GCC trunk r189808 fails to build with a bootstrap comparison error:
Comparing stages 2 and 3
warning: gcc/cc1-checksum.o differs
warning: gcc/cc1plus-checksum.o differs
warning: gcc/cc1obj-checksum.o differs
warning: gcc/cc1objplus-checksum.o differs
Bootstrap comparison failure!
arm-linux-gnueabi/libgcc/unwind-arm.o differs
arm-linux-gnueabi/libgcc/unwind-arm_s.o differs
189575 was fine on hard float. 189745 is fine on softfp.
-- Michael
---------- Forwarded message ----------
From: Linaro Toolchain Builder <michael.hope+cbuild(a)linaro.org>
Date: 25 July 2012 15:59
Subject: [cbuild] gcc-4.8~svn189808 armv7l failed
To: "michael.hope+notify(a)linaro.org" <michael.hope+notify(a)linaro.org>
ursa3 finished running job gcc-4.8~svn189808 on
armv7l-precise-cbuild348-ursa3-cortexa9hfr1.
The results are here:
http://builds.linaro.org/toolchain/gcc-4.8~svn189808
This email is sent from a cbuild (https://launchpad.net/cbuild) based
bot which is administered by Michael Hope <michael.hope(a)linaro.org>.
Hello Ramana,
For your PGO list:
* please note that I've been working on PGO for switch code, and also
for chains of if-statements with a common condition variable (with Tom
de Vries)
* turning conditional execution off will not make a difference, your
profile information will be exactly the same. Profile instrumentation
happens very early in the pipeline (on purpose, PGO is more
accurately "coverage guided optimization", not profiling in the
prof/gprof/oprofile sense). And the parts of the CFG that have profile
instrumentation cannot be if-converted anyway.
* you can use the script "analyze_brprob" in contrib/ to measure the
accuracy of the branch predictors. The script needs some TLC, fixing
it is on my TODO list but let me know if Linaro folks are going to
take care of that. You'll find that the predictors are heavily tuned
towards the original Opteron, I'm not aware of much tuning for other
architectures.
* The heuristics for profile-guided optimizations are also not tuned
for ARM. In the past we found that some params have more influence
than others (the TRACER* parameters for example).
Hope this helps,
What do you mean by "Only conditionalise those parts that benefit"?
Ciao!
Steven
== Progress ==
* Looking at auto-inc-dec patches.
* sched-pressure now on by default in FSF 4.8
* Background look into neon costs and vdup improvements.
* Some upstream patch review.
* Discovered http://gcc.gnu.org/PR54051 while testing a neon
intrinsics patch and wrote a patch to fix it.
== Plans ==
* Write a patch to check md5sums between local tarball and uploaded
tarball in the release script.
* Look at auto-inc-dec patches more and investigate benchmark results.
* Finish submitting PR54051 patch upstream.
* Finish vdup folding patch.
The Linaro Toolchain Working Group is pleased to announce the 2012.07
release of the Linaro Toolchain Binaries, a pre-built version of
Linaro GCC and Linaro GDB that runs on generic Linux or Windows and
targets the glibc Linaro Evaluation Build.
Uses include:
* Cross compiling ARM applications from your laptop
* Remote debugging
* Building the Linux kernel for your board
What's included:
* Linaro GCC 4.7 2012.07
* Linaro GDB 7.4 2012.06
* A statically linked gdbserver
* A system root
* Manuals under share/doc/
The system root contains the basic header files and libraries to link
your programs against.
Interesting changes include:
* Change c++, gcc and ld to symlinks in Linux package
The Linux version is supported on Ubuntu 10.04.3 and 12.04, Debian
6.0.2, Fedora 16, openSUSE 12.1, Red Hat Enterprise Linux Workstation
5.7 and later, and should run on any Linux Standard Base 3.0
compatible distribution. Please see the README about running on
x86_64 hosts.
The Windows version is supported on Windows XP Pro SP3, Windows Vista
Business SP2, and Windows 7 Pro SP1.
The binaries and build scripts are available from:
https://launchpad.net/linaro-toolchain-binaries/trunk/2012.07
Need help? Ask a question on https://ask.linaro.org/
Already on Launchpad? Submit a bug at
https://bugs.launchpad.net/linaro-toolchain-binaries
On IRC? See us on #linaro on Freenode.
Other ways that you can contact us or get involved are listed at
https://wiki.linaro.org/GettingInvolved.
We've just started running a weekly benchmark of GCC trunk and Linaro
GCC tip. I've written a short script that compares against a baseline
and spits out a graph:
http://ex.seabright.co.nz/benchmarks/gcc-4.8~svn.png
http://ex.seabright.co.nz/benchmarks/gcc-linaro-4.7%2bbzr.png
I'll switch the baseline to GCC 4.7.0 once the build and benchmark run
completes. The gcc-linaro results need more data before they'll make
sense.
Part way there. An automatic email would be next. We should check
the graphs before each performance call.
-- Michael, who needs to get moving on LAVA
== GCC ==
* Checked in fix to LP bug 1020601 (missed optimization with
multiple __builtin_unreachable calls) to Linaro GCC 4.7.
* Implemented and tested alternative fix for incorrect pool
placement with -O0 by splitting all insns in machine-
dependent reorg.
* Continued investigation of reload bug reported by ARM.
== GDB ==
* Tested GDB 7.5 branch on ARM, found a couple of regressions.
Worked on fixing HW breakpoint/watchpoint regressions.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Current Milestones:
|| || Planned || Estimate || Actual ||
||cp15-rework || 2012-01-06 || 2012-06-23 || 2012-06-24 ||
||a15-lpae-support || 2012-07-13 || 2012-07-20 || 2012-07-20 ||
||clean-up-kvm-patches || || || ||
||track-kvm-abi-changes || || || ||
||fake-trustzone || || || ||
Overall KVM plan for 'do by end August': QEMU parts of this are a mix
of clean-up-kvm-patches and track-kvm-abi-changes blueprints, mostly.
http://cards.linaro.org/browse/CARD-167
== a15-lpae-support ==
* LPAE patches now merged upstream
* v2 of vexpress-large-ram-size sent upstream, code reviewed
and put into arm-devs pullreq. Hasn't hit master yet but
I expect that to happen over the next week.
== clean-up-kvm-patches ==
* squashed together some kvm patches in the qemu-linaro tree
* sent upstream a few patches where we can avoid an ARM-KVM
specific change by instead generalising the upstream code not
to have an explicit list of KVM supporting architectures
* started looking at how best to clean up some working-but-ugly
code handling interrupts in the QEMU KVM-ARM patchset. Among
other problems, this is messy to fix because at the moment
upstream is overloading "is there an in kernel irqchip?" to
mean both "should we use QEMU's irqchip model or not?" and
"is the interrupt injection model synchronous or asynchronous?"
because on x86 they are (for historical reasons) the same.
For ARM we only want to decide which irqchip model to use,
not anything else...
== other ==
* upstream review (various exynos patches, mostly)
* some patches fixing problems with compiler warnings in
configure test fragments
* arm-devs pullreq
KVM blueprint progress tracker:
http://apus.seabright.co.nz/helpers/backlog?group_by=topic&colour_by=state&…
-- PMM
Hi Ramana, Ulrich. Could I have some help with an unexpected
testsuite failure while backporting Carrot's adddi patch?
testsuite/gcc.misc-tests/gcov-7.c builds and runs but aborts during
leave() due to unexpected results.
The merge request is here:
https://code.launchpad.net/~michaelh1/gcc-linaro/core-adddi/+merge/113111
The testsuite diff is here:
http://ex.seabright.co.nz/build/gcc-linaro-4.7+bzr115001~michaelh1~core-add…
The build tree is at:
cbuild@tcpanda02.v:/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/build
The failing and working versions are on tcpanda02 as ~/gcov-7.exe and
~/gcov-7-ok.
Here are the details:
* The test is fine when built from the command line
* The test is fine on the hard float Precise build
* The failing binary works fine when run on Precise
* The disassembled body (not libraries) is identical modulo changes
in addresses
* The fault goes away with static linking, by adding "--tool_opts '-static'"
* The fault persists with binutils 2.22
* The fault persists with the eglibc 2.15 loader
I assume the testsuite picks up a different libgcc and libgcov somehow
which gives a different executable. It's strange that the static
linked version is fine, and that the failing binary works fine on a
different host.
Could you have a poke in the build tree?
-- Michael