Adjusted my 64-bit shifts patch to address Richard Earnshaw's concerns,
tested it, and posted the new one upstream.
Continued trying to figure out how ira-costs.c works, and in particular,
why it doesn't choose to do 64-bit stuff in NEON when I think it should.
Basically, the problem seems to be that when hard regs are *already*
assigned, prior to IRA (say because they are function parameters or
return values), then the allocator does not even consider the
possibility of moving that value to another register unless it
absolutely has to. It will merrily choose the worst possible option just
because it's the easiest decision.
Merged FSF GCC 4.5 into Linaro GCC 4.5. Likewise for 4.6. Pushed the
branches to Launchpad for testing. The 4.5 testing did not come back
totally clear, so this may delay the release a little. Hmmm.
Updated my FSF GCC 4.7 checkout and rebuilt it. This time the build
succeeded, so I've used it as the basis for a shiny new launchpad branch
"lp:gcc-linaro/4.7". I've created the release series to go with it.
Applied my new 64-bit shifts patch to the new GCC 4.7 Linaro branch and
submitted a merge request. This is mostly for the purposes of getting
the test results at this point.
Hi guys,
I compiled a native gdb using Linaro 2011.10 with “./configure
--host=arm-none-linux-gnueabi --target=arm-none-linux-gnueabi”, and
gdb runs directly on ARM target boards.
# gdb
GNU gdb (Linaro GDB) 7.3-2011.10
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-none-linux-gnueabi".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>.
(gdb)
I can use it to debug native programs on target boards directly. For
example, I can attach to a process, set breakpoints, and check registers
and memory.
One issue is that I can't see multiple threads. For example,
according to ps, PID 646 is system_server:
"646 1000 159m S system_server"
Then I use gdb to attach it:
# gdb attach 646
(gdb) info threads
Id Target Id Frame
* 1 process 646 "system_server" __ioctl ()
at bionic/libc/arch-arm/syscalls/__ioctl.S:15
As you can see, “info threads” shows only one thread, although
system_server has several threads.
But if I compile a new program based on glibc and gnu libthread, gdb
shows all of its threads.
So my questions are:
1. Should I compile the native gdb using the Android toolchain and the
Android bionic/libthread libraries?
2. Why can't the current gdb see multiple threads in Android
processes? This question is really about how gdb discovers threads.
As far as I know, both GNU and Android use clone() to create threads,
all threads in one process share the same tgid in the kernel, and all
threads return the same getpid() value. Why doesn't gdb just walk the
process list to find the threads?
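
To make the "walk the process list" idea in question 2 concrete, here is a
minimal sketch. It assumes a Linux-style /proc, where each thread of a
process appears as a numeric directory under /proc/<pid>/task. The function
name and the proc_root parameter are made up for illustration, and this is
not a description of how gdb itself discovers threads.

```python
import os

def list_thread_ids(pid, proc_root="/proc"):
    """Return the sorted thread IDs (TIDs) that share the given PID/tgid.

    Assumption: the kernel exposes one numeric directory per thread
    under <proc_root>/<pid>/task, as Linux does.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())
```

On the board above, list_thread_ids(646) would list every thread of
system_server, independent of which C library created them.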
Thanks
Barry
Hi,
* Learned the basics of bzr and examined the gdb-linaro repository.
* Went through Michael Hope's steps to import upstream's 7.4 branch into
bzr.
* Explored gdb-linaro bugs and blueprints in Launchpad to familiarize
myself with what has been done
and is planned or proposed to be done.
* Went through the gdb-linaro/7.3 branch to verify what needs to be
forward-ported to gdb-linaro/7.4.
Forward-ported 10 patches.
* Checked which Linaro Connect sessions would be of interest for me to
attend remotely, but
found out that only one will be available for remote participation.
* Worked very little on Wednesday since my laptop refused to turn on
again after I hibernated it.
I found out on the next day that plugging in an external monitor makes
it happy again (I didn't
have a monitor on Wed to try this out so I was stuck). Apparently the
LCD screen died.
--
[]'s
Thiago Jung Bauermann
Linaro Toolchain Working Group
Hi,
libunwind
* reviewed small patch from T. R. of Nokia who provided a bugfix
when searching for unwind table entry for an IP
OpenEmbedded
* built the OE-core images (minimal, sato and qt4e) with -O1 and -O0
* collected the ELF size and memory footprint and updated the charts
* encountered an issue when compiling Qt 4.8.0 using -O0: qdbusviewer
fails to link because an .LTHUNK symbol survives
* tested various compilers and optimization levels and
noticed that the .LTHUNK symbols also survive at higher
optimization levels
* only the Linaro and ARM CSL toolchains seem to be affected
(FSF trunk, 46branch and 46release seem to work)
* provided a reduced testcase and opened lp #924726
* Linaro cc1 emits undefined label when using -fPIC -Os (lp #924889)
* already fixed upstream, Ramana is backporting to Linaro GCC
* looked into the external-toolchain branch from C. Larson:
https://github.com/kergoth/oe-core/tree/external-toolchain
and tested it against CSL 2011.03 -> works fine
* started to document:
https://wiki.linaro.org/KenWerner/Sandbox/OpenEmbedded-Core
Regards,
Ken
== GCC ==
* Benchmarking the 4.6 backport of subreg forward-propagation
confirmed that this is a net loss. On 4.7, microbenchmarks
suggest a different outcome (due to register allocator
enhancements), so I've created a 4.7 Bazaar branch including
the patch and submitted it for benchmarking.
* Implemented a patch to allow memory operands with vec_set and
vec_extract to avoid excessive vmov generation in the PR 51819
test case. Patch shows no regression on microbenchmarks; full
testing and benchmarking still outstanding.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
RAG:
Red:
Amber:
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||cp15-rework || 2012-01-06 || 2012-02-20 || ||
||qemu-kvm-getting-started || 2012-03-04?|| 2012-03-04?||2012-02-01 ||
(for blueprint definitions: https://wiki.linaro.org/PeterMaydell/QemuKVM)
Historical Milestones:
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-12-15 || 2011-12-12 ||
||initial-a15-system-model || 2012-01-27 || 2012-01-27 || 2012-01-17 ||
== cp15-rework ==
* Since this isn't in the critical path any more we'll work on it
next quarter, and adjust its priority/due dates post Connect.
== qemu-kvm-getting-started ==
* finished writing up HowTo documents for reproducing the KVM
prototype and running it on a Fast Model:
https://wiki.linaro.org/PeterMaydell/KVM/HowTo
* did the last bits of testing enough to be able to say we've done
the initial prototype work for TCWG2011-A15-KVM, which means we
can close out the qemu-kvm-getting-started blueprint.
== other ==
* more upstream patch review, etc
* LP:926012: patches to support prctl(PR_SET_NAME, ...) in linux-user
mode, for the benefit of perl 5.14.
Summary:
* Analyze PATH issues for win32 binary toolchain.
Details:
1. Analyze PATH issues for win32 binary toolchain.
* gcc cannot find the install dir if the user sets
PATH="INSTALL_DIR\bin" on the DOS command line, which makes compilation
fail since gcc cannot find ../lib, ../libexec, etc.
* Root cause: in DOS, " is taken as part of the directory name when
setting PATH, which is different from Cygwin or Linux. A directory name
containing " is invalid in DOS.
* Work out a patch to filter " in function make_relative_prefix_1
and discuss it with Michael.
* Create a Windows install package, so users do not need to set
PATH. Will discuss the details with Michael for the following
releases.
2. Read documentation to ramp up on gcc.
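
The quote filtering described in item 1 can be sketched as follows. This
is only an illustration of the idea in Python; the actual patch is against
the C function make_relative_prefix_1 and is not shown here. The function
name strip_quotes_from_path and the ";" separator default are assumptions
made for this example.

```python
def strip_quotes_from_path(path_value, sep=";"):
    """Split a PATH-style string and drop any double-quote characters.

    Hypothetical helper: it mirrors the idea of filtering '"' before the
    directories are used to locate ../lib, ../libexec, etc.
    Empty entries are discarded.
    """
    entries = [entry.replace('"', "") for entry in path_value.split(sep)]
    return [entry for entry in entries if entry]
```

With this, a PATH set as "C:\gcc\bin";C:\Windows yields the two usable
directories C:\gcc\bin and C:\Windows.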
Plans:
* Feb 6-10: Linaro Connect Q1.12
Best regards!
-Zhenqiang
I've set up a new user on my laptop that we can use for experimenting
with benchmarks during Connect. Here's what we've got:
* A user called 'connect'
* Ramana, Ulrich, Åsa, and myself can log in via SSH
* bzr repo for trunk, 4.6, and gcc-linaro 4.6
* Tarballs of all gcc-linaro 4.4, 4.5, and 4.6 releases
* Tarballs of the recent CSL, Google 4.6, and Android 4.4 compilers
* A sysroot in ~/sysroot
* Cross-binutils etc in ~/cross
* A shared ccache in ~/ccache primed with these
* A ~/env.sh that sets up the right paths etc
* A ~/builds/doconfigure.sh that configures for our standard ARMv7-A
C/C++/Fortran configuration
* EEMBC, DENBench, and SPEC 2000 under ~/benchmarks
The trees under build/ build into build/$ver/build and install into
build/$ver/install.
The benchmarks are set up to cross build and cross run. I've pulled
ursa2 out of the farm and nuked it, so we can do something like:
make CROSS_COMPILE=arm-linux-gnueabi- RUN="$PWD/sshrun ursa2"
to run the benchmarks.
I'll see about perf. We still need to re-baseline the benchmarks and
get perf traces.
Anything else needed?
You can log in and try things if you want. Go through the bounce host
and try connecting as connect@crucis.
-- Michael
Hi Ramana,
as you pointed out, in the gcc.dg/vect/vect-double-reduc-6.c test case,
using compiler options as described in PR 51819, we see the following
inefficient code generation:
vmov.32 r2, d28[0] @ 57 vec_extractv4si [length = 4]
vmov.32 r1, d22[0] @ 84 vec_extractv4si [length = 4]
str r2, [r0, #4] @ 58 *thumb2_movsi_vfp/7 [length = 4]
vmov.32 r3, d0[0] @ 111 vec_extractv4si [length = 4]
str r1, [r0, #8] @ 85 *thumb2_movsi_vfp/7 [length = 4]
vst1.32 {d2[0]}, [r0:64] @ 31 neon_vst1_lanev4si [length = 4]
str r3, [r0, #12] @ 112 *thumb2_movsi_vfp/7 [length = 4]
bx lr @ 120 *thumb2_return [length = 12]
(The :64 alignment in vst1.32 is incorrect; that is the actual problem in
PR 51819, which is now fixed.)
The reason for this particular code sequence turns out to be as follows:
The middle end tries to store the LSB vector lane to memory, and uses the
vec_extract named pattern to do so. This pattern currently only supports
an "s_register_operand" destination, and is implemented via vmov to a core
register. The contents of that register are then stored to memory. Now
why does any vst1 instruction show up? This is because combine is able to
merge the vec_extract back into the store pattern and ends up with a
pattern that matches neon_vst1_lanev4si. Note that the latter pattern is
actually intended to implement NEON built-ins (vst1_lane_... etc).
Now there seem to be two problems with this scenario:
First of all, the neon_vst1_lane<mode> patterns seem to be actually
incorrect on big-endian systems due to lane-numbering problems. As I
understand it, all NEON intrinsics are supposed to take lane numbers
according to the NEON ordering scheme, while the vec_select RTX pattern is
defined to take lane numbers according to the in-memory order. Those
disagree in the big-endian case. All other patterns implementing NEON
intrinsics therefore avoid using vec_select, and instead resort to using
special UNSPEC codes -- the sole exception to this happens to be
neon_vst1_lane<mode>. It would appear that this is actually incorrect, and
the pattern ought to use a UNSPEC_VST1_LANE unspec instead (that UNSPEC
code is already defined, but nowhere used).
Now if we make that change, then the above code sequence will contain no
vst1 any more. But in any case, expanding first into a vec_extract
followed by a store pattern, only to rely on combine to merge them back
together, is a suboptimal approach. One obvious drawback is that the
auto-inc-dec pass runs before reload, and therefore only sees plain stores
-- with no reason whatsoever to attempt to introduce post-inc operations.
Also, just in general it normally works out best to allow the final choice
between register and memory operands to be made in reload ...
Therefore, I think the vec_extract patterns ought to support *both*
register and memory destination operands, and implement those via vmov or
vst1 in final code generation, as appropriate. This means that we can make
the choice in reload, guided as usual by alternative ordering and/or
penalties -- for example, we can choose to reload the address and still use
vst1 over reloading the contents to a core register and then using an
offsetted store.
Finally, this sequence will also allow the auto-inc-dec pass to do a better
job. The current in-tree pass doesn't manage unfortunately, but with
Richard's proposed replacement, assisted by a tweak to the cost function to
make sure the (future) address reload is "priced in" correctly, I'm now
seeing what appears to be the optimal code sequence:
vst1.32 {d6[0]}, [r0:64]! @ 30 vec_extractv4si/1 [length = 4]
vst1.32 {d22[0]}, [r0]! @ 56 vec_extractv4si/1 [length = 4]
vst1.32 {d2[0]}, [r0:64]! @ 82 vec_extractv4si/1 [length = 4]
vst1.32 {d4[0]}, [r0] @ 108 vec_extractv4si/1 [length = 4]
bx lr @ 116 *thumb2_return [length = 12]
(Again the :64 is wrong; it's already fixed on mainline but I haven't
pulled that change in yet.)
The attached patch implements the above-mentioned changes. Any comments?
I'll try to get some performance numbers as well before moving forward with
the patch ...
(As an aside, it might likewise be helpful to update the vec_set patterns
to allow for memory operands, implemented via vld1.)
(See attached file: diff-gcc-arm-vecextractmem)
B.t.w. I'm wondering how I can properly test:
- that the NEON intrinsics still work
- that everything works on big-endian
Any suggestions would be appreciated!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Hi Andrew. gcc-4.7~svn183693 just finished running through the auto
builders and builds and tests in the ARM, Cortex-A9, i686, and x86_64
configurations.
You could base the Linaro 4.7 branch off that. It's r114897 in bzr
and, suitably, Sandra is the author.
-- Michael
For your records:
Loïc wrote a script that pushes the revisions from a GCC branch into
the current development focus. This makes branching on a shared
repository much faster. The script is here:
http://bazaar.launchpad.net/~linaro-toolchain-dev/cbuild/tools/view/head:/e…
and it runs daily on a cronjob as the cbuild user on the 'apus' EC2
instance (phew).
We'll need to update this when 4.7 comes along.
-- Michael
I've gone through the documentation on the wiki about how to get
going with KVM on Fast Models, to clean it up and reorganise it
and add some of the missing bits (notably how to set up a KVM
guest kernel and filesystem). It's now at:
https://wiki.linaro.org/PeterMaydell/KVM/HowTo
The only minor point I'd still like to address is that at the
moment we document the old-style "build your kernel and arguments
into an .axf file" boot-wrapper, because my changes to support
specifying them at model runtime haven't yet gone into the
boot-wrapper git repo. When they do land I'll update the wiki.
(Yes, technically TCWG2011-A15-KVM says "one page summary" but
I thought splitting it into four pages was much clearer :-))
-- PMM
Ken, Åsa: could you add a -O0 and -O1 build to the size and benchmark
results? I'm looking at the writeup and it would be interesting to
contrast the speed/size of -O0 with -O2.
Ta,
-- Michael
Continued work on 64-bit shifts in core registers. This has now been
posted to gcc-patches, and is awaiting review.
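
For readers unfamiliar with the problem, here is a rough sketch of what a
64-bit left shift looks like when only 32-bit core registers are
available. It is the generic textbook decomposition written in Python for
illustration, not the code the patch generates; the names shl64_via_32 and
MASK32 are made up for this example.

```python
MASK32 = 0xFFFFFFFF

def shl64_via_32(lo, hi, n):
    """Shift the 64-bit value hi:lo left by n bits using 32-bit pieces.

    lo and hi are the low and high 32-bit halves; returns (new_lo, new_hi).
    """
    if not 0 <= n < 64:
        raise ValueError("shift amount must be in 0..63")
    if n == 0:
        # Handled separately: a 32-bit right shift by 32 is an edge case
        # on real hardware.
        return lo, hi
    if n < 32:
        # Bits leaving the top of the low half enter the bottom of the high half.
        new_hi = ((hi << n) | (lo >> (32 - n))) & MASK32
        new_lo = (lo << n) & MASK32
    else:
        # The whole low half moves into the high half; the low half becomes zero.
        new_hi = (lo << (n - 32)) & MASK32
        new_lo = 0
    return new_lo, new_hi
```

For example, shl64_via_32(0x80000000, 0, 1) returns (0, 1): the top bit of
the low word carries into the high word.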
64-bit shifts in NEON are also working correctly, but the register
allocator chooses not to use them most of the time. I've begun trying to
work out why, but it's quite involved in ira-costs.c and will take some
unpicking, I think.
Attempted to create a Linaro GCC 4.7 branch, but my test build failed,
so that'll have to wait until it's stabilized a little.
Hi Marcin, Ricardo. How is the work on pre-built sysroots coming
along? I'd like to use/reference them in the next binary toolchain
release.
I'm looking for:
* Scripts that produce the sysroots
* A README that covers what they contain and how to reproduce them
* Test plan
* An official tagged branch holding the above
* A tarball release of the above
* A tarball release of the different sysroots
* Done in a way so they easily integrate with the binary builds[1]
and are useful to others as a sysroot
* Relocatable
* Usable on win32 and Linux
all hosted somewhere. Zhenqiang and I can test and give you feedback.
For reference, here's the README for the binary builds:
http://launchpadlibrarian.net/90998258/README.txt
Here's the simple script I used to make a libc only sysroot:
http://bazaar.launchpad.net/~linaro-toolchain-dev/crosstool-ng/linaro/view/…
I used chdist as I don't know multistrap. I guessed and added
build-dep support to download the build dependencies. It also fixes
up the absolute symlinks to relative.
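
The symlink fix-up step can be sketched like this. It is a minimal Python
illustration, not the linked script: the function name
relativize_symlinks is hypothetical, and it assumes that an absolute link
target such as /lib/libc.so.6 is meant to resolve inside the sysroot.

```python
import os

def relativize_symlinks(sysroot):
    """Rewrite absolute symlinks under sysroot as relative ones.

    Returns the list of links that were rewritten.
    """
    sysroot = os.path.abspath(sysroot)
    changed = []
    for dirpath, dirnames, filenames in os.walk(sysroot):
        for name in dirnames + filenames:
            link = os.path.join(dirpath, name)
            if not os.path.islink(link):
                continue
            target = os.readlink(link)
            if not os.path.isabs(target):
                continue  # already relative, leave it alone
            # Interpret the absolute target relative to the sysroot root.
            real_target = os.path.join(sysroot, target.lstrip("/"))
            relative = os.path.relpath(real_target, dirpath)
            os.remove(link)
            os.symlink(relative, link)
            changed.append(link)
    return changed
```

Relative links keep working wherever the sysroot is unpacked, which is
what makes the result relocatable.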
-- Michael
[1] https://launchpad.net/linaro-toolchain-binaries
=== Progress ===
* Fixed PR48308 on FSF trunk. Needs backporting to FSF GCC 4.6 branch
* Fixed a number of failing testcases on trunk.
* Read up on partial-partial PRE. Slow progress, but getting a handle
on the theory now. A couple of approaches are being benchmarked. Still
slow progress.
* Debugged Andrew's issues with 64-bit shifts. Nice that Skype screen
sharing works well on Ubuntu.
* Started notes for Connect 2012.q1.
* Looked into the strd / strexd failure on the testcase in trunk.
Looked at a small patch to implement sync_lock_releasedi for ARM but
needs some more time and effort. Filed issue
https://bugs.launchpad.net/gcc-linaro/+bug/922474
=== Plans ===
* Finish 1x AFDS
* Continue with partial-partial PRE.
* Finish backport of fix for PR50313 to appropriate branches
* Start preparing for Connect 2012.q1
* Do something about the PGO and ABI patches next week.
Absences.
* Feb 6-10 : Linaro Connect Q1.12.
* Feb 13-18 : Holiday.
Hi,
* libunwind
* reviewed small patch from T. R. of Nokia who provided a
bugfix in case unwind instructions are popping VFP registers
* exchanged mails with P. W. from Bosch who encountered a crash
in case DWARF info is involved
* OpenEmbedded
* changed Qt build to respect the optimization flags
* wrote a script around QEMU to measure the memory footprint
using named pipes to communicate with the serial console of the guest
* rebuilt the minimal, sato and qt4e images using the
gcc-linaro-arm-linux-gnueabi-2012.01-20120125_linux
(optimization levels: -O2, -O3 -fno-tree-vectorize and -O3)
* collected/updated the ELF size and memory footprint results:
*
https://docs.google.com/spreadsheet/ccc?key=0AmsCLxCMnnISdDNQSEM2ZHIxd3dVNj…
*
https://docs.google.com/spreadsheet/ccc?key=0AmsCLxCMnnISdHFtZll0OWdiTlhpdj…
* uploaded the image packs to: http://people.linaro.org/~kwerner/oe-core/
* prepared the environment to build the images using -O1
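
For anyone wanting to reproduce the footprint numbers, here is a hedged
sketch of the post-processing side only. It assumes the guest's
/proc/meminfo text has already been captured from the serial console, and
it assumes "used" memory is defined as MemTotal minus MemFree, Buffers and
Cached. Both the definition and the function names are assumptions for
this example, not a description of the actual script.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values

def used_kib(meminfo):
    """Used memory in kB under the assumed definition (see lead-in)."""
    return (meminfo["MemTotal"] - meminfo["MemFree"]
            - meminfo.get("Buffers", 0) - meminfo.get("Cached", 0))
```

Comparing used_kib() across images built at different optimization levels
gives one possible footprint metric.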
Regards,
Ken
== GDB ==
* Committed follow-on patch to fix cosmetic issues resulting from
the remote "info proc" patch set.
== GCC ==
* Created 4.6 backport branch including Richard's subreg forward-
propagation branch, the modes-tieable patch, and the combine.c
regression fix, and evaluated for correctness and performance.
Investigation of performance regressions uncovered a problem in
the register allocator: tied subregs (validly) cause somewhat
larger register pressure in certain cases, and this caused the
4.6 IRA to generate spills. Note that the 4.7 IRA is still
able to allocate every pseudo in the very same code; this is
a result of Vladimir's significant IRA rewrite in 4.7 ...
* Investigated Ramana's vld1 patches, and a problem with excessive
vmov's in the PR 51819 test case pointed out by Ramana. It turns
out that when we extract a lane from a vector to memory using an
offsetted address, we move the value through a core register
instead of simply computing the address and using vst1.
I'm working on a patch to address this.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
RAG:
Red:
Amber:
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||cp15-rework || 2012-01-06 || 2012-02-20 || ||
||qemu-kvm-getting-started || 2012-03-04?|| 2012-03-04?|| ||
(for blueprint definitions: https://wiki.linaro.org/PeterMaydell/QemuKVM)
Historical Milestones:
||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-12-15 || 2011-12-12 ||
||initial-a15-system-model || 2012-01-27 || 2012-01-27 || 2012-01-17 ||
== cp15-rework ==
* discussions with Rusty
== qemu-kvm-getting-started ==
* wrote patches for the A15 Fast Models boot-wrapper that allow you
to pass the kernel/initrd/kernel command line via a parameter to
the model rather than having to hard code them into an ELF file
along with the boot-wrapper. This should make life a lot easier for
the validation folk.
* various discussions with validation/etc about what you can and
can't do with a model and what the best setup is going to be
* sorted out a Debian fs to use for KVM guest : this is needed because
the kernel module doesn't do Thumb or VFP yet.
* started on rearranging and improving the writeup/HOWTO on the wiki
== other ==
* target-arm/arm-devs queues pushed upstream [Calxeda Highbank model
is now in upstream QEMU]
* helping Will Deacon track down what looks like a race condition in
Linux's handling of VFP on SMP where signal handlers are involved
* a QEMU Object Model patch series has landed upstream -- this will
be a nasty rebase for qemu-linaro next week I think
Hi!
* -O2/-O3 benchmarking:
Worked on and shared charts for the -O2/-O3 benchmark runs with
gcc-linaro-4.6-2012.01. I have created charts for both the score of the
benchmarks and the sizes (text segment) of the executables. Making
improvements continuously as I get feedback from the team.
The charts are stored in GoogleDocs:
https://docs.google.com/a/linaro.org/leaf?id=0B1IP4dZWVaxZOGZlM2UwMGQtOGQyZ…
* gcc4.7 benchmarking:
Did benchmark runs on a late revision of gcc4.7.
Next week I will process the data and list regressions/improvements against
gcc4.6.2 and gcc-linaro-4.6-2012.01.
* v8:
Investigated possibility of running SunSpider stand-alone on the v8 engine.
Found out that there is a stand-alone driver included in WebKit.
Have built v8 on x86 and ARM (on ARM it takes ~18 min) and run SunSpider
and the v8 benchmark suite.
As a next step I will try to build with another toolchain. The build is
controlled by SCons rather than Make, so I am not sure how to do that.
https://wiki.linaro.org/AsaSandahl/Sandbox/JavaScriptBenchmarks
* Bug triaging:
I have triaged all "easy" ones in the list of new tickets. Michael will
help me look through the rest of new and incomplete bugs which are mostly
old ones (waiting for someone to take some action) or very tricky ones.
Regards
Åsa
On Wed, Jan 25, 2012 at 7:02 AM, Ulrich Weigand
<Ulrich.Weigand(a)de.ibm.com> wrote:
>
> Hi Michael,
>
> GDB 7.4 has finally been released. What do we need to do to get the
> 7.4-branch imported into Launchpad? I think last time (for 7.3) you did
> that somehow ...
Done as:
lp:~linaro-toolchain-dev/gdb/7.4
and documented at:
https://wiki.linaro.org/MichaelHope/Sandbox/GDBImport
-- Michael
Peter, could you review:
https://wiki.linaro.org/WorkingGroups/ToolChain/QEMUARMGuestQ411
please? I'm interested in anything I've missed or exaggerated. The
audience is the TSC or people outside Linaro who are interested in a
summary of the changes.
-- Michael