[Short week: 3 days]
* looked at (but failed to reproduce) a hang in QEMU reported
by Christoffer when shutting down a KVM ARM guest using TUN/TAP
networking
* investigated LP:1084148 (segfault in qemu usermode) sufficiently
to diagnose it as probably another of qemu's "can't handle
multithreaded guest programs" bugs
* fixed some problems with QEMU's secondary CPU boot code which
were masked by errors in QEMU's GIC model but revealed by
real hardware (ie KVM); fixed the GIC model bugs as well
* investigated LP:955379 (cmake hangs under qemu-arm-static).
Tracked down to a race condition involving signal delivery,
the fix to which would require the significant redesign I
sketched out here a year or so ago:
http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg00384.html
KVM blueprint progress tracker:
http://ex.seabright.co.nz/helpers/backlog?group_by=topic&colour_by=state&pr…
-- PMM
== Blueprints ==
                             Initial      Current      Actual
initial-aarch64-backport     31 Oct 2012  7 Dec 2012*
aarch64-baremetal-testing    31 Oct 2012  7 Dec 2012*
fix-gcc-multiarch-testing    31 Dec 2012  31 Dec 2012
backport-fma-intrinsic       31 Dec 2012  31 Dec 2012
fused-multiply-add-support   31 Dec 2012  31 Dec 2012
gcc-investigate-lra-for-arm  31 Dec 2012  31 Dec 2012
== Progress ==
* Admin
* Interviewing
* Preparation for taking over from Michael
* Investigate patches for literal pool layout bug
* Applied
* PINGed triplet backport patches upstream
* Other bug issues
* Including an issue running SPEC2K on x86 with recent trunk
* And a 4.6 gcc-linaro only issue
== Next Week ==
* Start leading Toolchain team
* Run HOT/COLD partitioning benchmarks
* Analyse ARM results
* On x86_64 to see what the actual benefit we could get
* initial-aarch64-backport & aarch64-baremetal-testing
* Finish documentation
* gcc-investigate-lra-for-arm
* Analyse benchmarks
* fix-gcc-multiarch-testing
* Come up with strawman proposal for updating testsuite to handle
testing with varying command-line options.
== Future ==
* backport-fma-intrinsic & fused-multiply-add-support
* Backport patches once fix-gcc-multiarch-testing has been done.
== Planned Leave ==
* Monday 24 December - Monday 31 December
--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-dann(a)linaro.org
Hi,
I think I have identified some issues with the atomic builtins, but I would
like your advice.
For instance :
A: __atomic_store_n (addr, val, __ATOMIC_SEQ_CST);
gives the armv7 code:
DMB sy
STR r1, [r0]
DMB sy
but if I have understood correctly, the DMB instructions only ensure that the
code is sequentially consistent; they do not provide atomicity, for which we
have to use the LDREX/STREX instructions. Thus I think that the code should be:
DMB sy
1: LDREX r2, [r0]
STREX r1, r2, [r0]
TEQ r1, #0
BNE 1b
B: __atomic_load_n (addr, __ATOMIC_ACQUIRE);
gives the armv7 code:
DMB sy
LDR r0, [r0]
but the load-acquire semantics specify that all loads and stores appearing
in program order after the load-acquire will be observed after the
load-acquire, so shouldn't the DMB come after the LDR?
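
For reference, the two cases can be reduced to a minimal C sketch that is
easy to compile with -O2 -S in order to inspect what a given compiler emits.
The wrapper names store_seq_cst and load_acquire are hypothetical and only
serve to give each builtin its own function in the assembly output.

```c
#include <stdint.h>

/* Hypothetical wrappers; the names are illustrative only. */

/* Case A: sequentially consistent store. */
void store_seq_cst(uint32_t *addr, uint32_t val)
{
    __atomic_store_n(addr, val, __ATOMIC_SEQ_CST);
}

/* Case B: load with acquire ordering. */
uint32_t load_acquire(uint32_t *addr)
{
    return __atomic_load_n(addr, __ATOMIC_ACQUIRE);
}
```

Keeping each builtin in a separate non-inlined function makes the barrier
placement around the single STR or LDR easy to spot in the listing.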
--
Yvan
Hi,
I'm working on the libatomic-ops (part of the Boehm gc) AArch64 support,
I mainly use GCC's __atomic builtins to do this, but in our 4.7 version
they don't use the load acquire / store release instructions now available
in the ARMv8 ISA. These instructions are used in the mainline GCC
(in atomic.md) but not in their exclusive form. I understand that this is
probably due to the performance penalty, but I would like your opinion on that
point, as I don't find the ARMv8 ISA really clear.
If we want to implement an atomic load acquire, is
LDAR x1, [x0]
sufficient, or do we have to write it like this:
L: LDAXR x0, [x3]
STXR w1, x0, [x3]
CBNZ w1, L
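
As a sketch of where a plain (non-exclusive) acquire load would typically be
used, here is a small publish/consume pattern written with the __atomic
builtins. The names payload, ready, publish and try_consume are hypothetical;
which instructions are emitted depends on the compiler version, so the
generated assembly should be checked rather than assumed.

```c
#include <stdint.h>

/* Hypothetical shared state, for illustration only. */
static uint64_t payload;
static uint64_t ready;

/* Writer: store the data, then publish it with a release store. */
void publish(uint64_t value)
{
    payload = value;
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Reader: returns 1 and fills *out if the data was published, else 0.
 * The acquire load orders the later read of payload after it. */
int try_consume(uint64_t *out)
{
    if (__atomic_load_n(&ready, __ATOMIC_ACQUIRE) == 0)
        return 0;
    *out = payload;
    return 1;
}
```

No read-modify-write is involved here, so nothing in the C source asks for an
exclusive load/store pair.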
Thanks
Yvan
All,
[Editorial: Michael & I discussed making what we do as a working
group more visible at Connect. One thing we discussed was making our
meeting minutes more visible by emailing actions out after each
meeting. This will be part of the job of the 'minuter' - a job I plan
to spread around as I am useless at it whilst also running a call -
more info on the Wiki:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings]
The minutes of the performance call held on 27 November 2012 can be found at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2012-11-27
In summary, the actions from the meeting are:
* mgrettondann split LRA blueprint
* Christophe to update Hot/Cold partitioning bugzilla
* mgrettondann: benchmark on Hot/Cold partitioning
* Michael to log a ticket to improve reporting of benchmarks when the
run completes.
* Ramana to log EEMBC failure with Hot/Cold partitioning into bugzilla.
* Christophe to backport bswap16 builtin, except for the testcase
which fails in one of our configurations (Thumb1 + hard FP ABI)
The next performance call will be on 11 December 2012 and the agenda
can be found at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2012-12-11
Thanks,
Matt
--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-dann(a)linaro.org
The Linaro Toolchain Working Group is pleased to announce the 2012.11
release of the Linaro Toolchain Binaries, a pre-built version of
Linaro GCC and Linaro GDB that runs on generic Linux or Windows and
targets the glibc Linaro Evaluation Build.
Uses include:
* Cross compiling ARM applications from your laptop
* Remote debugging
* Build the Linux kernel for your board
What's included:
* Linaro GCC 4.7 2012.11
* Linaro GDB 7.5 2012.09
* A statically linked gdbserver
* A system root
* Manuals under share/doc/
The system root contains the basic header files and libraries to link
your programs against.
The Linux version is supported on Ubuntu 10.04.3 and 12.04, Debian
6.0.2, Fedora 16, openSUSE 12.1, Red Hat Enterprise Linux Workstation
5.7 and later, and should run on any Linux Standard Base 3.0
compatible distribution. Please see the README about running on
x86_64 hosts.
The Windows version is supported on Windows XP Pro SP3, Windows Vista
Business SP2, and Windows 7 Pro SP1.
The binaries and build scripts are available from:
https://launchpad.net/linaro-toolchain-binaries/trunk/2012.11
Need help? Ask a question on https://ask.linaro.org/
Already on Launchpad? Submit a bug at
https://bugs.launchpad.net/linaro-toolchain-binaries
On IRC? See us on #linaro on Freenode.
Other ways that you can contact us or get involved are listed at
https://wiki.linaro.org/GettingInvolved.
Summary:
* Investigate shrink-wrap result.
* Prepare for Linaro toolchain binary release, script merge and aarch64 test.
Details:
1. Investigated the shrink-wrap result for function Ray_In_Bound. By
default, none of the ARM/MIPS/PPC/X86 toolchains can shrink-wrap the
function. For ARM, there is a copy "r6 = r1" which blocks the optimization.
By hacking the assembly code, I got a ~3% performance improvement on the
453.povray benchmark.
2. Set up an AArch64 simulation environment by following
http://www.linaro.org/engineering/armv8.
3. Write scripts to collect branch cost performance. It will take
weeks to get all the benchmark results.
4. Smoke test Linaro toolchain binaries 2012.11 release.
5. Tried to export crosstool-ng trunk to a bzr project. bzr fast-import
always fails on Ubuntu 10.04, but it works on 12.04.
6. RM toolchain related work.
Plans:
* Collect performance data for branch cost tuning.
* Linaro binary toolchain 2012.11 release.
* Verify shrink-wrap bugs.
Best regards!
-Zhenqiang
== Progress ==
* Turn off 64-bits bitops in Neon: initial implementation under
benchmarking.
Currently it modifies the handling of: add, sub, and, or, xor, shifts,
not. In some cases the generated code is considerably larger, so it will
need careful benchmarking.
* Started looking at "disable peeling" blueprint. Reading GCC source code
to get more familiar with that area.
* Internal support
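
To make the kind of code affected by the 64-bit bitops change concrete, here
is a minimal C sketch that exercises the operations listed above on 64-bit
values. The function name mix64 is hypothetical. Compiling it for 32-bit ARM
with -mfpu=neon, with and without the change, is one way to compare whether
the operations end up in NEON registers or in core register pairs, and how
the code size differs.

```c
#include <stdint.h>

/* Hypothetical helper mixing the 64-bit operations under discussion. */
uint64_t mix64(uint64_t a, uint64_t b)
{
    uint64_t t = (a & b) | (a ^ b);   /* and, or, xor */
    t = ~t + (a - b);                 /* not, add, sub */
    return (t << 3) ^ (t >> 5);       /* shifts */
}
```

All arithmetic is on unsigned 64-bit values, so the result is the same on any
target and only the instruction selection differs.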
== Blueprints ==
                             Initial      Current      Actual
initial-aarch64-backport     31 Oct 2012  30 Nov 2012
aarch64-baremetal-testing    31 Oct 2012  30 Nov 2012
fix-gcc-multiarch-testing    31 Dec 2012  31 Dec 2012
backport-fma-intrinsic       31 Dec 2012  31 Dec 2012
fused-multiply-add-support   31 Dec 2012  31 Dec 2012
gcc-investigate-lra-for-arm  31 Dec 2012  31 Dec 2012
== Progress ==
* Admin
* Interviewing
* Investigate patches for literal pool layout bug
* Took longer than expected as the 'simple' fix is wrong due to GCC not
knowing how large instructions actually are.
* Patch posted upstream
* Post triplet backport patches upstream
* Other bug issues
* Including an issue running SPEC2K on x86 with recent trunk
== Next Week ==
* Run HOT/COLD partitioning benchmarks
* Analyse ARM results
* On x86_64 to see what the actual benefit we could get
* initial-aarch64-backport & aarch64-baremetal-testing
* Finish documentation
* gcc-investigate-lra-for-arm
* Analyse benchmarks
* fix-gcc-multiarch-testing
* Come up with strawman proposal for updating testsuite to handle
testing with varying command-line options.
== Future ==
* backport-fma-intrinsic & fused-multiply-add-support
* Backport patches once fix-gcc-multiarch-testing has been done.
--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-dann(a)linaro.org
Hi,
I tried ARM, MIPS, PowerPC and X86 on the povray benchmark. None of them can
shrink-wrap the function Ray_In_Bound.
Here it is:
bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
{
...
for (Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling)
{...}
return (true);
}
For ARM at -O2/-O3, "Bound" is allocated to "r6" during IRA, so there is a
copy r6 = r1 before the test Bound != NULL.
The copy (using r6) blocks the shrink-wrap optimization, since r6 is
callee-saved and must be saved first. Shrink-wrap needs to be enhanced to
handle this case.
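
The shape of the problem can be shown with a small self-contained C sketch.
This is not the povray source (its loop body is elided above); struct object,
its inside field and all_in_bound are hypothetical stand-ins that keep only
the loop structure. When the chain is NULL the loop body never runs, so that
path would ideally return without executing a prologue that saves
callee-saved registers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for OBJECT; only the sibling link matters. */
struct object {
    struct object *sibling;
    int inside;               /* hypothetical per-object test result */
};

/* Returns true if every object in the chain reports 'inside'.
 * An empty chain returns true without touching any object. */
bool all_in_bound(struct object *bounding)
{
    for (struct object *b = bounding; b != NULL; b = b->sibling) {
        if (!b->inside)
            return false;
    }
    return true;
}
```

Compiling a function of this shape with -O2 -S and looking at where the
register saves are placed relative to the first NULL test shows whether the
early-exit path was shrink-wrapped.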
Overall, for povray benchmark,
54 functions are shrink-wrapped for ARM;
59 functions are shrink-wrapped for X86;
25 functions are shrink-wrapped for MIPS;
26 functions are shrink-wrapped for PowerPC.
Thanks!
-Zhenqiang
On 15 November 2012 01:58, 남관우 <kw46.nam(a)samsung.com> wrote:
>
> Hi,
>
>
>
> As you suggested, I tried to build again.
>
>
>
> without : -mapcs -fno-common -fstack-protector --param=ssp-buffer-size=4
>
>
> and -fPIC instead of -fpic
>
>
>
> But it failed with the same message. (/usr/lib/libnfc-common-lib.so.1: unexpected reloc type 0x03)
>
>
>
> Thank you,
>
> Kwanwoo Nam.
>
>
>
> ------- Original Message -------
>
> Sender : 남관우<kw46.nam(a)samsung.com> S4 (Senior)/Senior/SLP Development Group (Wireless)/Samsung Electronics
>
> Date : 2012-11-14 21:45 (GMT+09:00)
>
> Title : Re: Re: Re: unexpected reloc type 0x03 error with gcc-4.6.4 (2012.10 version)
>
>
>
> Hi,
>
>
>
> Here is our LDFLAGS.
>
> -Wl,--rpath=/usr/lib -Wl,--as-needed
>
>
>
> And I tried to build as you suggested.
>
> without : -mapcs -fno-common
> and -fPIC instead of -fpic
>
>
>
> But it failed with the same message. (/usr/lib/libnfc-common-lib.so.1: unexpected reloc type 0x03)
Ta. I'm afraid we don't have enough information to solve this.
Could you please send a full build log and we can go from there.
gzipped on a public server is best.
-- Michael
== Blueprints ==
                             Initial      Current      Actual
initial-aarch64-backport     31 Oct 2012  30 Nov 2012
aarch64-baremetal-testing    31 Oct 2012  30 Nov 2012
fix-gcc-multiarch-testing    31 Dec 2012  31 Dec 2012
backport-fma-intrinsic       31 Dec 2012  31 Dec 2012
fused-multiply-add-support   31 Dec 2012  31 Dec 2012
gcc-investigate-lra-for-arm  31 Dec 2012  31 Dec 2012
== Progress ==
* Admin
* Interviewing
* Hand over prep with Michael
* Release Week
* Made 2012.11 releases of gcc-linaro 4.6 and 4.7.
* LEG interactions:
* Investigated CILK+ and how much work to port to AArch64.
* HOT/COLD partitioning
* Ran benchmarks on ARM
* LRA
* Ran x86-64 benchmarks
== Next Week ==
* Investigate patches for literal pool layout bug
* Post triplet backport patches upstream
* Run HOT/COLD partitioning benchmarks
* Analyse ARM results
* On x86_64 to see what the actual benefit we could get
* initial-aarch64-backport & aarch64-baremetal-testing
* Finish documentation
* gcc-investigate-lra-for-arm
* Analyse benchmarks
* fix-gcc-multiarch-testing
* Come up with strawman proposal for updating testsuite to handle
testing with varying command-line options.
== Future ==
* backport-fma-intrinsic & fused-multiply-add-support
* Backport patches once fix-gcc-multiarch-testing has been done.
--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-dann(a)linaro.org
== Progress ==
* Infrastructure:
- Managed to have my laptop re-installed by IT with a native Ubuntu 12.04,
(as a beta tester).
- Re-setup my working environment.
* GCC release process familiarization.
* Boehm GC AArch64 support:
- Resume libatomic-ops work.
* Some internal support
== Next ==
* Continue on the Boehm GC AArch64 support.
== Progress ==
* Started working on "Turn off 64 bits Bitops in Neon in GCC" blueprint.
* branch review for aarch64-4.7 merge.
A lot of time was wasted due to network instability, which made it difficult
to check out a GCC branch from launchpad/bzr.
* Internal support for infrastructure problems.
* Resumed discussions with our internal IT and Christian Bejram to try to
decrease our constraints.
The Linaro Toolchain Working Group is pleased to announce the 2012.11
release of both Linaro GCC 4.7 and Linaro GCC 4.6.
Linaro GCC 4.7 2012.11 is the eighth release in the 4.7 series. Based
off the latest GCC 4.7.2+svn193200 release, it includes ARM-focused
performance improvements and bug fixes.
Interesting changes include:
* Updates to GCC 4.7.2+svn193200
* Also includes arm/aarch64-4.7-branch up to svn revision 193328.
Fixes:
* LP #1065122
* LP #1065559
* LP #1067760
Linaro GCC 4.6 2012.11 is the 21st release in the 4.6 series. Based
off the latest GCC 4.6.3+svn193199 release, this is the eighth release
after entering maintenance.
Interesting changes include:
* Updates to 4.6.3+svn193199
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.7-2012.11
https://launchpad.net/gcc-linaro/+milestone/4.6-2012.11
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
More information on the features and issues are available from the
release pages:
https://launchpad.net/gcc-linaro/4.7/4.7-2012.11
https://launchpad.net/gcc-linaro/4.6/4.6-2012.11
Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Bugs: https://bugs.launchpad.net/gcc-linaro/
Questions? https://ask.linaro.org/
Interested in commercial support? Inquire at support(a)linaro.org
--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-dann(a)linaro.org
Hi,
I've encountered a case where gcc produces a broken program: a branch that should never be taken is taken, and wrong values are written to memory (and printed out).
The code is fairly ordinary and small. It can be seen here: http://pastebin.com/0Hspz8mw
This happens when -funroll-loops flag is used in conjunction with -O2 or -O3. It doesn't seem to happen when it is used with -O1.
A few other things that change the program from an incorrect run to a correct one (giving the expected output) are:
- Adding/removing printf's inside the inner loop
- Changing the order of the expressions in the "if" clause. i.e. from this:
if ((y < mu) || (y >= H - md) ||
(x < ml) || (x >= W - mr))
to this:
if ((x < ml) || (y >= H - md) ||
(y < mu) || (x >= W - mr))
- Assigning ml inside f() to the same value (3) it's getting from the function arguments.
None of these should change how the program behaves, but they do.
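
To illustrate why reordering the condition must not matter, here is a
hypothetical C sketch built only around the two "if" variants quoted above.
It is not the code from the pastebin: the function names, the counting loop
and the parameter values used to exercise it are assumptions made for
illustration. Both functions should return the same count at every
optimization level, with or without -funroll-loops; a mismatch would point at
the compiler rather than at the source.

```c
/* Hypothetical reconstruction; NOT the reporter's actual test case. */

/* Counts pixels in the border margins, original operand order. */
int count_border_a(int W, int H, int mu, int md, int ml, int mr)
{
    int n = 0;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            if ((y < mu) || (y >= H - md) ||
                (x < ml) || (x >= W - mr))
                n++;
    return n;
}

/* Same test with the operands reordered as in the second variant. */
int count_border_b(int W, int H, int mu, int md, int ml, int mr)
{
    int n = 0;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            if ((x < ml) || (y >= H - md) ||
                (y < mu) || (x >= W - mr))
                n++;
    return n;
}
```

Since "||" is commutative for side-effect-free operands, any difference
between the two results indicates a miscompilation.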
I compiled this with two different compilers/environments:
1. g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, running on 3.2.1-42-linaro-lt-mx6 (native compilation on the ARM board)
Compilation command:
g++ -march=armv7-a -mfpu=neon -mfloat-abi=hard -mtune=cortex-a9 -O3 -std=c++0x -funroll-loops -o test_bug_sa_loops_linaro test_bug.cxx
2. arm-fsl-linux-gnueabi-g++ (Freescale MAD -- Linaro 2011.07 -- Built at 2011/08/10 09:20) 4.6.2 20110630 (prerelease)
Running on a freescale LTIB built linux (3.0.15-1359-g1b64ead)
Compilation command:
arm-fsl-linux-gnueabi-g++ -march=armv7-a -mfpu=neon -mfloat-abi=hard -mtune=cortex-a9 -O3 -std=c++0x -funroll-loops -o test_bug_sa_loops test_bug.cxx
In all the variations I tried it seems that -funroll-loops is critical for this problem to appear.
I'd be glad to hear some comments on this.
Mickey.
On 10 November 2012 05:11, "Frank Müller" <franky1976(a)gmx.net> wrote:
> Michael Hope <michael.hope(a)linaro.org>:
>> My suspicion is that we/crosstool-NG enable extra features like
>> Graphite or GCC is built with a different level of checking. If you
>
> I suspected Graphite as well and removed it in my own builds without noticeable difference.
>
>> have the time, could you check the flags passed to GCCs configure?
>> You can do this on Ubuntu using:
>>
>> apt-get build-dep gcc
>> apt-get source gcc
>> dpkg-buildpackage -uc -us -b
>>
>> and compare the configure line with the one in crosstool-NG's build.log.
>
> Isn't this the same as gcc -v? I've posted the lines at http://lists.linaro.org/pipermail/linaro-toolchain/2012-October/002913.html
Good point. There's nothing obvious in the list. Ubuntu explicitly
adds --enable-checking=release but it's the default for release
branches like ours.
I can reproduce the slowdown in a smaller testcase. Compiling pcre
with -O3 -mfpu=neon -march=armv7-a -mtune=cortex-a8 takes 18.8 s for
the Ubuntu Precise 4.6 compiler, 17.8 s for the Ubuntu Quantal 4.7
compiler, and 41.2 s for the Linaro 4.7 2012.10 build. I've logged
LP: #1077739 to track. I'll spin a --enable-checking=release build
just to check.
> The above lines do not work for me, the last line misses a changelog file:
>
> # dpkg-buildpackage -uc -us -b
> tail: cannot open `debian/changelog' for reading: No such file or directory
> dpkg-buildpackage: error: tail of debian/changelog gave error exit status 1
Yip, you need to change to the just-extracted source directory first.
-- Michael
On 14 November 2012 00:48, 남관우 <kw46.nam(a)samsung.com> wrote:
>
> Hi,
>
>
>
> First, our CFLAGS is here.
>
>
>
> -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Wl,--as-needed
> -fmessage-length=0 -march=armv7-a -mtune=cortex-a8 -mfpu=vfpv3-d16 -mfloat-abi=hard -mthumb -Wa,-mimplicit-it=thumb
> -mapcs -mno-sched-prolog -mabi=aapcs-linux -Uarm -fno-common -fpic
>
>
>
> It failed with the message (/usr/lib/libnfc-common-lib.so.1: unexpected reloc type 0x03)
>
>
>
> Second,
>
> -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Wl,--as-needed
> -fmessage-length=0 -march=armv7-a -mtune=cortex-a8 -mfpu=vfpv3-d16 -mfloat-abi=hard -mthumb -Wa,-mimplicit-it=thumb
> -mapcs -mno-sched-prolog -fno-common -fpic
>
>
>
> It failed too. (/usr/lib/libnfc-common-lib.so.1: unexpected reloc type 0x03)
Hi there. I don't know the cause but I'm suspicious of a few things.
Could you try the following builds?
The most likely:
* Without -mapcs
* Without -fstack-protector --param=ssp-buffer-size=4
Less likely:
* Without -fno-common
* With -fPIC instead of -fpic (should make no difference on ARM)
Could you also send through the linker command line? It would be
great to get a full log up on pastebin or similar.
-- Michael