Public bug reported:
FTBFS on armel
https://launchpadlibrarian.net/68239668/buildlog_ubuntu-natty-armel.augeas_…
It is not apparent from the log, but test-interpreter.sh fails because of a core dump.
Starting program: /home/jani/work/ftbfs/aug/augeas-0.8.0/src/.libs/lt-augparse --nostdinc -I . fail_let_no_exp.aug
Program received signal SIGSEGV, Segmentation fault.
strlen () at ../ports/sysdeps/arm/strlen.S:29
29 ../ports/sysdeps/arm/strlen.S: No such file or directory.
in ../ports/sysdeps/arm/strlen.S
(gdb) bt
#0 strlen () at ../ports/sysdeps/arm/strlen.S:29
#1 0x4016c050 in _IO_vfprintf_internal (s=<value optimized out>, format=<value optimized out>, ap=<value optimized out>) at vfprintf.c:1620
#2 0x401d7b66 in __vasprintf_chk (result_ptr=0xbee5097c, flags=1, format=0x400d961c "%s", args=...) at vasprintf_chk.c:68
#3 0x400bfad6 in vasprintf (info=<value optimized out>, code=<value optimized out>, format=0x400d961c "%s", ap=...) at /usr/include/bits/stdio2.h:199
#4 format_error (info=<value optimized out>, code=<value optimized out>, format=0x400d961c "%s", ap=...) at syntax.c:96
#5 0x400bfd98 in syntax_error (info=0x1, format=0x400d961c "%s") at syntax.c:124
#6 0x400c3e96 in augl_error (locp=<value optimized out>, term=<value optimized out>, scanner=<value optimized out>, s=0x400d7abc "syntax error") at parser.y:628
#7 0x400c54f8 in augl_parse_file (aug=0x1ef1878, name=<value optimized out>, term=0xbee50a64) at parser.y:362
#8 0x400c153a in load_module_file (aug=0x1ef1878, filename=0xbee50ddb "fail_let_no_exp.aug") at syntax.c:1951
#9 0x400bbf0a in __aug_load_module_file (aug=0x1ef1878, filename=0xbee50ddb "fail_let_no_exp.aug") at augeas.c:1447
#10 0x00008b04 in main (argc=<value optimized out>, argv=0xbee50c84) at augparse.c:131
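For readers following the trace: frames #5 down to #0 show a variadic function (syntax_error) handing its argument list to a helper (format_error), which formats it and ends up in strlen on the "%s" argument. The sketch below shows that call shape only. It is not the augeas source: the names report_error and format_message are hypothetical, and standard C99 vsnprintf is used in place of the GNU vasprintf seen in the trace.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for format_error(): format into a freshly
 * allocated buffer.  Uses C99 vsnprintf() rather than GNU vasprintf(). */
char *format_message(const char *format, va_list ap)
{
    va_list ap2;
    char *msg;
    int len;

    assert(format != NULL);
    va_copy(ap2, ap);                     /* the first pass consumes ap */
    len = vsnprintf(NULL, 0, format, ap); /* first pass: measure only */
    if (len < 0) {
        va_end(ap2);
        return NULL;
    }
    msg = malloc((size_t)len + 1);
    if (msg != NULL)
        vsnprintf(msg, (size_t)len + 1, format, ap2); /* second pass: write */
    va_end(ap2);
    return msg;
}

/* Hypothetical stand-in for syntax_error(): the variadic front end.
 * Every "%s" in format must be matched by a valid string argument;
 * otherwise the formatter calls strlen() on a garbage pointer. */
char *report_error(const char *format, ...)
{
    va_list ap;
    char *msg;

    va_start(ap, format);
    msg = format_message(format, ap);
    va_end(ap);
    return msg;
}
```

Calling report_error("%s", "syntax error") returns a heap copy of the message that the caller frees. If the variadic arguments are lost or misplaced between the two functions, the crash would look like the one in frame #0.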
** Affects: gcc-linaro
Importance: Undecided
Status: New
** Affects: augeas (Ubuntu)
Importance: Undecided
Status: New
** Tags: arm-porting-queue
** Also affects: gcc-linaro
Importance: Undecided
Status: New
** Summary changed:
- segfaults in make check pass when built with optimization
+ [armel] segfaults in make check pass when built with optimization
** Tags added: arm-porting-queue
--
You received this bug notification because you are a member of Linaro
Toolchain Developers, which is subscribed to Linaro GCC.
https://bugs.launchpad.net/bugs/758082
Title:
[armel] segfaults in make check pass when built with optimization
Status in Linaro GCC:
New
Status in “augeas” package in Ubuntu:
New
Public bug reported:
The 2.32.2 upload of gconf is likely miscompiled and segfaults. This
leads to other armel FTBFSs in the archive when calling gconftool-2 as
part of the install phase.
** Affects: gcc-linaro
Importance: Undecided
Status: New
** Affects: gconf (Ubuntu)
Importance: Undecided
Status: New
** Tags: arm-porting-queue
** Package changed: ubuntu => gconf (Ubuntu)
** Also affects: gcc-linaro
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/757427
Title:
gconftool-2 segfaults on arm
Status in Linaro GCC:
New
Status in “gconf” package in Ubuntu:
New
== Last week ==
* Sent a fix for PR target/46329 upstream.
* Discussed with Richard Guenther how to represent the interleaved
load/store "functions" that we're adding to gimple. Sent a patch
upstream for comments. Richard confirmed on IRC that he was happy
with it, and no-one else has objected.
* Spent most of the week on the vectorisation itself, and on the
testsuite.
== This week ==
* Finish work on vectorisation testsuite and submit.
Richard
== Last week ==
* Mon/Tue (Apr.4--5): Tomb-sweeping Day, public holiday.
* PR48250 / CS Issue #9845 / Launchpad #723185. Unaligned DImode reload
under NEON. Worked on new patch, submitted to gcc-patches after testing
on Friday. Awaiting review.
== This week ==
* CoreMark ARMv6/v7 regressions: working on new combine patch.
The test results for the patch for lp:675347 on GCC 4.6 came back clean,
so I merged it to Linaro GCC 4.6.
The test results for lp:675347 on 4.5 had problems though, but they
might be unrelated to the patch. The test results for the "discourage
NEON on A8" patch had similar failures, and that's a 4.6 testsuite.
Richard Earnshaw approved the Thumb register allocation patch. I've
committed it upstream, and updated the patch trackers. It was already on
the Linaro 4.6 branch.
Now that GCC 4.6 is released, switched all the Linaro tracking tickets
from 'Fix committed' to 'Fix released'.
Merged from FSF 4.5 to Linaro 4.5 and submitted the patch for test. The
tests came back clean, so I pushed it to the 4.5 branch. (Yay for
Michael's new test service!)
Merged more patches from SG++ to Linaro. Or, at least considered them
for merge. Mostly I decided that they were not appropriate for Linaro,
at least, not just yet. I have yet to push these patches to Launchpad.
Reviewed Richard Sandiford's patch for LP:714921.
Retried the Android build with a view to integrating Android support in
Linaro GCC 4.5 (4.6 should already support it). Eventually, after
downloading many different git repositories and branches, and maxing out
the memory on my machine a few times, I managed a successful build using
the toolchain the Android team are using. I then backported Maxim's
patches to Linaro GCC 4.5, and built and tested that, and got another
successful Android build. I've pushed the patched toolchain to Launchpad
at lp:~ams-codesourcery/gcc-linaro/android for testing. All being well,
I'll merge Android support into the 4.5 trunk in time for the next release.
----
Upstream patches requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* ARM EABI half-precision functions
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
- Back from holiday, short week.
== Porting jam ==
* We seem to have picked up a lot of FTBFS in the last couple of
weeks, which is unfortunate because it may well be too close to the
Natty release to do anything about them.
* Bug 745843 is a repeatable segfault in part of the build process
of a package called vtk that is used by a few other things; I've got
this down to a particular call of one function, although gdb is getting
rather confused (r0 & r1 changing as I single step across a branch).
* Bug 745861, petsc build failure; I'm getting one of two different
link errors depending on which mood it is in. MPI related?
* Bug 745873, a meta package that just didn't have a list of
packages to build with for armel; a simple fix is easy (I provided a
branch that built), but the maintainer says it's too late for natty
anyway and some more thought is needed.
== Other ==
* Reading over some optimisation documents
* Tested weekly release on Beagle-c4 (still no OTG usb and hence no
networking for me)
* Also simple boot test on panda; not much time for more thorough
test. (seems to work)
Dave
Hi,
== libunwind ==
* created a generic and local variant of the extbl parser
* continued to look into testsuite failures
* down to 12 failures: https://wiki.linaro.org/KenWerner/Sandbox/libunwind
* continue to post patches upstream
Note: I'll be out of office to attend a class starting from Wednesday till
Friday next week.
Regards
Ken
RAG:
Red:
Amber:
Green:
Current Milestones:
                             | Planned    | Estimate   | Actual     |
qemu-linaro 2011-04          | 2011-04-21 | 2011-04-21 |            |
Historical Milestones:
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release    | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03          | 2011-03-08 | 2011-03-08 | 2011-03-08 |
== maintain-beagle-models ==
* I spent a couple of days on initial cleanup of the omap3 patchstack
in qemu-linaro. It's still some way from being upstreamable but at
least now every patch in the stack compiles; this should make
rebasing on upstream a bit less painful.
* the board-ram-limits patchset is still stalled with upstream :-(
== merge-correctness-fixes ==
* Aurelien applied lots of patches so the pipeline has drained again
* cleaning up/reworking patches which fix handling of Neon UNDEF cases.
Not very exciting but it will get a large set of patches out of the
qemu-linaro patchstack.
== other ==
* meetings: toolchain, standup
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) 15-16 August: QEMU/KVM strand at LinuxCon NA, Vancouver
[LinuxCon proper follows on 17-19th]
Hello
The Launchpad user named 'Michael Hope (michaelh1)' requested the
registration of 'linaro-toolchain(a)lists.linaro.org' as the contact email address
of team 'Linaro Toolchain Developers'. This request can only be made by a team
owner/administrator, so if this change request was unexpected or was
not requested by one of the team's administrators, please contact
system-error(a)launchpad.net.
If you want to make this email address the contact email of
'Linaro Toolchain Developers', please click on the link below and follow the
instructions.
https://launchpad.net/token/6sQQKQ6kx3XP9MlWMwmX
Thanks,
The Launchpad Team
Hi there. The new porter boxes are now available for use. See:
https://wiki.linaro.org/WorkingGroups/ToolChain/Hardware
for details.
These are PandaBoards with 768 MB of memory, a USB HDD, and a good
internet connection. They can be used for day to day jobs like
building programs, triaging bugs, and running benchmarks.
Use dchroot natty to switch into the chroot. Use sudo apt-get install
yyy to install packages. The build dependences for GCC, GDB, and
binutils should already be installed.
-- Michael
Hi,
* continued bringing patches upstream
- changing default vector size to 128 - resubmitted with changes
according to comments, awaiting review
- if-conversion improvement - committed
* PR 48252 - bug in vzip/vuzp/vtrn implementation - patch submitted
* opened PR 48454 - a test failure with -mvectorize-with-neon-quad
Next week - vacation.
April 18-27 - Passover Holiday, I'll only work half days on April 18
and April 24. And possibly half days on April 20 and 21.
Ira
== GDB ==
* Ongoing work to fix single-stepping over signal handlers (bug #615978).
* Posted patch to support NEON registers in core files (bug #615972).
* Failure to disable address space randomization (bug #616001) is shown
to be a kernel problem; created stand-alone test case and opened bug
against kernel team.
== Schedule ==
* On vacation 04/07 - 04/15.
Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter | Management: Dirk Wittkopp
Registered office: Böblingen | Registration court: Amtsgericht Stuttgart, HRB 243294
On 25/03/11 21:48, Diane Holt wrote:
> I hope you don't mind me sending you mail, but I'm a bit stuck...I've
> been told I need the Linaro 4.5.2 toolchain because it has some "neon
> optimizations" that the CS 4.5.1 doesn't have.
In general, you'd be better addressing these questions on the Linaro
Toolchain mailing list: linaro-toolchain(a)lists.linaro.org (I've copied
it in).
Not least because I'm on vacation for the next week. :)
> Unfortunately, the Linaro
> 4.5.2 that's available for download (already built) won't work in my
> Scratchbox environment, since it was compiled against a glibc that's too
> new. The CS 4.5.1 works fine -- but I'm not allowed to use it, because
> of the neon stuff.
The CS and Linaro compilers are really very similar, but CodeSourcery
has not made a release since the autumn, so Linaro will have some extra
features.
> Do you know whether CS actually does have (or will have) the same neon
> optimizations Linaro has?
It depends which optimizations you are referring to. The existing CS
release had the latest improvements at the time it was released, and I
believe that the upcoming release will probably be very similar to
Linaro (at least, with respect to ARMv7 - there'll be many differences
for other architecture variants), but I'm not promising that.
Sorry if that's a bit vague, but the contents of the next CS release
are still not finalised.
> If it doesn't (and won't), then I'm going to have to build the Linaro
> one from source. Unfortunately, I've not been able to find any detailed
> information on how to go about doing that. Do you know if that's
> documented anywhere?
Are you talking about building a native compiler, or a cross-compiler? The
former is very simple (provided you have all the dependencies), while
the latter is more involved.
Here's the recipe to build a native compiler:
tar xf gcc-linaro.....tar.bz2
mkdir objdir
cd objdir
../gcc-linaro....../configure --prefix=<your-install-path> <opts>
make bootstrap
make install
You can copy the configure <opts> from another compiler using 'gcc -v',
and './configure --help' in the source tree should tell you what they mean.
If you want to build a cross compiler, I suggest you look at crosstool
or crosstool-ng, or OpenEmbedded. Building cross-toolchains is non-trivial.
Hope that helps.
Andrew
== Last week ==
* Finished the patch that I was working on last week to use memory operands
rather than register operands in neon.md. Submitted upstream:
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html
Among other things, this allows the intrinsics to use post-modified
addresses.
* Submitted patches to make the number of rtl generator arguments
(as opposed to insn operands) available to the expand-time code:
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02227.html
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02228.html
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02229.html
This is part of the tree-rtl expansion "cleanups" that I've been
doing in preparation for the vectoriser work.
* More discussion about the handling of type modes vs. per-function
target switching. I think we've agreed what the right approach is,
although it's probably outside the scope of this project. The discussion
was still useful because it meant I could submit & defend the next patch.
* Submitted a patch to use non-BLK modes for arrays of vectors
(like uint32x2x2_t & co. in arm_neon.h):
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02192.html
This avoids the stack spilling that was discussed during the week.
Richard Guenther seemed happy with the patch in principle, but
understandably wanted to see how the optabs stuff worked out first.
Also, the testcase he asked me to try exposed another instance of:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46329
so that needs to be fixed first.
* Started writing & testing a fix for that PR (46329).
== This week ==
* Finish fix for PR46329.
* More vld & vst stuff.
Richard
== Last week ==
* PR48250 / CS Issue #9845 / Launchpad #723185. Unaligned DImode reload
under NEON. Went back and forth with Richard Earnshaw on gcc-patches for
most of the week. The issues should finally be clear, and I think it
would be better to modify the significant parts of
arm_legitimize_reload_address() to do the right thing rather than just
fixing the bug. I have a new patch done over the weekend, though it
still shows a few regressions after some testing. I hope this gets done
by this week...
* PR48325 / Launchpad #744754, another NEON ICE in postreload. This
appears to be because the IA/DB modes of VLDM/VSTM were not enabled for
NEON struct modes. This ICE does not actually happen on current upstream
trunk, but I sent a patch anyway. Pending review.
* Spent some time on Launchpad #736661 (C++ ICE in expr.c), and looked
at upstream testsuite regressions of gcc.dg/pr17957.c and
gcc.dg/torture/pr47975.c under -mfpu=neon (ICE on OImode const0_rtx
assignment).
* Call with Ramana on ARM optimization work.
== This week ==
* Get PR48250/Launchpad #723185 nailed.
* Other pending GCC issues.
* TW Public Holiday, Mon. and Tue. (Apr.4-5)
The Bazaar team have been working on improving the performance of bzr
on the gcc-linaro tree. Here's how long the steps take on my machine
with the current 2.4 development version:
Update tip before branching:
  bzr pull                                                 20.4 s  (no revisions)
Make the branch:
  bzr branch --hardlink 4.5 optspace                       26.8 s
Do some work and commit it:
  ...change two files
  bzr status                                               1:05
  (again) bzr status                                       1.7 s
  bzr commit .                                             3.6 s
Push the changes up:
  bzr push lp:~michaelh1/gcc-linaro/optspace               3:47    ~9 MB (~40 kB/s, which is saturating my uplink)
Later, the merge master pulls the branch down and merges:
  bzr branch --no-tree lp:~michaelh1/gcc-linaro/optspace   36 s    ~900 k
  bzr merge ../optspace                                    3:26
The bzr status and bzr commit are quite good. I've asked them to look
into bzr merge.
-- Michael
Hi All,
After installing the Linaro toolchain with apt-get on Ubuntu, I compiled
U-Boot for an ARM1136 SoC with the -march=armv5 option. It compiled
successfully, but when I ran U-Boot on the target boards the system
failed with "undefined instructions". I checked how the Linaro toolchain
was configured:
#arm-linux-gnueabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabi/4.5.2/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro
4.5.2-5ubuntu2~ppa1'
--with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.5 --enable-shared --enable-multiarch
--enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/arm-linux-gnueabi/include/c++/4.5.2
--libdir=/usr/lib --enable-nls --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin
--enable-gold --enable-ld=default --with-plugin-ld=ld.gold
--enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a
--with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb
--disable-werror --enable-checking=release
--program-prefix=arm-linux-gnueabi-
--includedir=/usr/arm-linux-gnueabi/include --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=arm-linux-gnueabi
--with-headers=/usr/arm-linux-gnueabi/include
--with-libs=/usr/arm-linux-gnueabi/lib
Thread model: posix
gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-5ubuntu2~ppa1)
The important options are "--with-arch=armv7-a --with-float=softfp
--with-fpu=vfpv3-d16". I just want to ask whether these options stop
arm-linux-gnueabi-gcc from supporting older architectures. According to
the GCC documentation at http://gcc.gnu.org/install/configure.html:
"
--with-cpu=cpu
--with-cpu-32=cpu
--with-cpu-64=cpu
Specify which cpu variant the compiler should generate code for by
default. cpu will be used as the default value of the -mcpu= switch.
This option is only supported on some targets, including ARM, i386,
M68k, PowerPC, and SPARC. The --with-cpu-32 and --with-cpu-64 options
specify separate default CPUs for 32-bit and 64-bit modes; these
options are only supported for i386, x86-64 and PowerPC.
--with-schedule=cpu
--with-arch=cpu
--with-arch-32=cpu
--with-arch-64=cpu
--with-tune=cpu
--with-tune-32=cpu
--with-tune-64=cpu
--with-abi=abi
--with-fpu=type
--with-float=type
These configure options provide default values for the
-mschedule=, -march=, -mtune=, -mabi=, and -mfpu= options and for
-mhard-float or -msoft-float. As with --with-cpu, which switches will
be accepted and acceptable values of the arguments depend on the
target.
"
These are only default values for later compilation. Users should be
able to switch to other values by passing other options. So why did
arm-linux-gnueabi-gcc still emit "undefined instructions" for the arm1136
with "-march=armv5"? In fact the arm1136 is ARMv6.
I then built a toolchain myself from the Linaro gcc-linaro-4.4-2011.02-0
sources, with simple options:
#arm-none-linux-gnueabi-gcc -v
Using built-in specs.
Target: arm-none-linux-gnueabi
Configured with: ../gcc-linaro-4.4-2011.02-0/configure
--target=arm-none-linux-gnueabi
--prefix=/home/vmuser/development/toolchain/build-toolchain/tools
--enable-languages=c,c++ --disable-libgomp
Thread model: posix
gcc version 4.4.5 (Linaro GCC 4.4-2011.02-0)
I then compiled U-Boot again with this toolchain, and it works.
Why does the toolchain I built myself support more architectures? And
what performance do I lose with my build?
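One way to narrow down a question like the one above is to check what the compiler itself believes it is targeting for a given set of flags. The sketch below is an illustration only, assuming GCC's usual ARM predefined macros (__ARM_ARCH_7A__, __ARM_ARCH_6__, __thumb__ and so on); the function names are hypothetical. It reports only how this one file was compiled. It says nothing about prebuilt runtime libraries such as libgcc, which are typically built with the toolchain's configured defaults.

```c
/* Report the architecture this translation unit was compiled for,
 * based on GCC's predefined ARM macros.  Function names are made up
 * for this sketch. */
const char *target_arch_name(void)
{
#if defined(__ARM_ARCH_7A__)
    return "ARMv7-A";
#elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || \
      defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) || \
      defined(__ARM_ARCH_6ZK__)
    return "ARMv6";
#elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5T__) || \
      defined(__ARM_ARCH_5E__) || defined(__ARM_ARCH_5TE__) || \
      defined(__ARM_ARCH_5TEJ__)
    return "ARMv5";
#elif defined(__arm__)
    return "ARM (other)";
#else
    return "not ARM";
#endif
}

/* Report the instruction set selected for this translation unit. */
const char *target_insn_set(void)
{
#if defined(__thumb2__)
    return "Thumb-2";
#elif defined(__thumb__)
    return "Thumb";
#elif defined(__arm__)
    return "ARM";
#else
    return "n/a";
#endif
}
```

Compiling this once with the default flags and once with -march=armv5 and printing both results shows whether the command-line option really overrides the configured --with-arch and --with-mode defaults for the user's own code.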
Thanks
Barry
== GCC ==
Progress:
* Investigated excessive VFP moves. Partially investigating ways forward.
* Polished up my divmodsi4 patch. Discussed it during the call.
Looking for ways to do it properly at the tree level.
* Got Panda board on Friday.
* Off on Wednesday.
* Conversations with Revital and Chung-Lin. Need to sync up with
Andrew next week.
* Found an issue with binutils and Neon and this is now LP:747837
Plans:
* Continue looking at excessive VFP moves.
* Finish working through Thumb2 speed tickets.
* Set up new Panda board.
* Conversation with Andrew sometime this week.
Meetings:
* 1-1s
* Linaro toolchain meeting
Absences:
* April 15 – 26 -> Booked Holiday.
* May 9-14 - LDS Budapest
== GDB ==
* Committed patch to fix single-stepping across bad ARM/Thumb boundary
(bug #667309) to mainline and Linaro GDB.
* Committed patch to fix accessing "fpscr" register to mainline.
* Ongoing work to fix single-stepping over signal handlers (bug #615978).
Posted yet another updated patch to gdb-patches for comments.
* Implemented patch to support NEON registers in core files (bug #615972).
* Investigated failure to disable address space randomization
(bug #616001).
Best Regards
Ulrich Weigand
Hi,
== pandaboard ==
* noticed that hw perf events are not working on 2.6.38-1001-linaro-omap
* it seems that the omap kernel has not configured its PMU properly
* perf_event_open syscall returns ENODEV
* started discussion with agreen (#744458)
* noticed that natty puts its glibc into a multilib path
* prevents linaro gcc (and upstream) from being built
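The ENODEV symptom in the pandaboard notes can be checked with a few lines of C. This is a minimal sketch, assuming a Linux system with the kernel's perf_event.h header installed; the helper name open_cycle_counter is hypothetical. It asks for a hardware cycle counter on the calling process and returns either a file descriptor or the negated errno.

```c
#include <errno.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Try to open a hardware cycle counter for the calling process.
 * Returns a file descriptor (>= 0) on success, or -errno on failure;
 * -ENODEV is the failure reported in the notes above. */
int open_cycle_counter(void)
{
    struct perf_event_attr attr;
    long fd;

    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;        /* created stopped; nothing is counted here */
    attr.exclude_kernel = 1;  /* user-space cycles only */

    /* pid = 0: this process; cpu = -1: any CPU; no group; no flags */
    fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -errno;
    return (int)fd;
}
```

A negative return value other than -ENODEV (for example a permission error) would point at something other than the PMU configuration.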
== libunwind ==
* created a generic and local variant of the extbl parser
* ran the test suite a few times using different unwind methods
* started to look into the test suite failures
* started to fix a couple of the failures on ARM
Regards
Ken
RAG:
Red:
Amber:
Green: the aircon has been fixed; blessed quiet again
Current Milestones:
                             | Planned    | Estimate   | Actual     |
qemu-linaro 2011-04          | 2011-04-21 | 2011-04-21 |            |
Historical Milestones:
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release    | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03          | 2011-03-08 | 2011-03-08 | 2011-03-08 |
== maintain-beagle-models ==
* the board-ram-limits patchset has been expanded significantly to
address upstream suggestions; it now includes a lot of refactoring
of sun4m (sparc) board code to use the new generic max-ram
functionality instead of a sun4m-specific bit of code. Unfortunately
there is still some pushback upstream on the grounds that a simple
max-ram limit doesn't cater for complicated NUMA situations :-(
== merge-correctness-fixes ==
* working on moving implementation of VLD/VST "multiple structures" forms
into qemu helper functions; the current implementation is correct but
can expand to hundreds of TCG ops which is well beyond the maximum
permitted value, so could potentially overrun a TCG buffer
== other ==
* wrote up some technical/engineering input into what we ought to be
doing with qemu next cycle
* review of a patch by Dmitry Eremin-Solenikov adding ARMv4/v4T support
* some review of s390 TCG patches (not because we have a direct interest
in s390 but as part of being a good citizen upstream)
* sent a pull request for some neon patches that had been on the list
a few weeks; hopefully this will help drain the patch pipeline
* meetings: toolchain, standup, pdsw-tools
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
Hello,
* Submitted merge requests for SMS patch to gcc-linaro and gcc-linaro/4.6.
* Testing SMS patch which extends the current implementation to
consider loops that contain instructions with REG_INC_NOTE.
* Filed PRs 48336 48380 for recent fails of trunk on ARM.
* Had a chat with Ramana about the DENbench benchmarks, directions and findings.
* Filed PR 745743 in linaro gcc-bugzilla
Thanks,
Revital
Hi,
* continued bringing patches upstream
- auto-detection of vector size - committed
- changing default vector size to 128 - submitted and testing the
final version
- if-conversion improvement - submitted and now testing the final version
* gcc-linaro-4.6
- submitted a merge request for store sink patch (this patch is
already upstream)
Ira
For reference. We know that the NEON intrinsics in GCC have issues.
I came across this page:
http://hilbert-space.de/?p=22
which has a colour to greyscale conversion done using intrinsics.
gcc-linaro-4.5-2011.03-0 does poorly because it saves intermediate
values on the stack. The core of the loop is:
.L3:
mov ip, r4
vld3.8 {d16-d18}, [r6]
vstmia r4, {d16-d18}
ldmia ip!, {r0, r1, r2, r3}
mov sl, r9
adds r7, r7, #1
adds r6, r6, #24
stmia sl!, {r0, r1, r2, r3}
fldd d16, [sp, #24]
fldd d18, [sp, #32]
ldmia ip, {r0, r1}
vmull.u8 q8, d16, d19
stmia sl, {r0, r1}
vmlal.u8 q8, d18, d20
fldd d18, [sp, #40]
vmlal.u8 q8, d18, d21
vshrn.i16 d16, q8, #8
vst1.8 {d16}, [r5]
adds r5, r5, #8
cmp r8, r7
bgt .L3
llvm-2.9~svn128540 does much better:
vld3.8 {d20, d21, d22}, [r1]!
add r3, r3, #1
cmp r3, r2
vmull.u8 q12, d21, d16
vmlal.u8 q12, d20, d17
vmlal.u8 q12, d22, d18
vshrn.i16 d19, q12, #8
vst1.8 {d19}, [r0]!
blt .LBB0_1
and may actually be better than the hand-written assembler on Nils's
page due to scheduling the loop comparison earlier.
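For comparison, here is a scalar C sketch of what both loops compute per pixel, read off the instruction sequence above: three widening 8-bit multiplies accumulated into 16 bits (vmull.u8, vmlal.u8), then a narrowing shift right by 8 (vshrn.i16). The function name and the weight parameters are illustrative. The listings do not show the weight values, so they are passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the vectorised loop: src holds n_pixels
 * interleaved 3-byte pixels, dst receives one grey byte per pixel.
 * w0..w2 are the per-channel weights (values not given in the
 * listings; supplied by the caller). */
void grey_scalar(uint8_t *dst, const uint8_t *src, size_t n_pixels,
                 uint8_t w0, uint8_t w1, uint8_t w2)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < n_pixels; i++) {
        /* vmull.u8 / vmlal.u8: 8x8 products summed in a 16-bit lane,
         * wrapping modulo 2^16 as the non-saturating NEON ops do */
        uint16_t acc = (uint16_t)(src[3 * i] * w0);
        acc = (uint16_t)(acc + src[3 * i + 1] * w1);
        acc = (uint16_t)(acc + src[3 * i + 2] * w2);
        /* vshrn.i16 #8: keep the high byte of the accumulator */
        dst[i] = (uint8_t)(acc >> 8);
    }
}
```

With weights that sum to 256, such as 77, 151 and 28 (an assumption here, not taken from the listings), a white pixel maps to 255 and a black pixel to 0. A reference like this is handy for checking the output of either compiler's loop.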
Richard S, were you looking into this?
-- Michael
Hi there. A reminder that today's call has shifted due to the
European daylight savings change. It's now at 0800 UTC which is 9 am
in the UK, 10 am in central Europe, and 10 am in Israel.
-- Michael
== Last week ==
* PR46934: Thumb-1 ICE, small fix in the "casesi" jump-table expand
code. Quickly approved and committed upstream.
* Enhance XOR patch for gcc/simplify-rtx.c. Updated comments and
committed upstream.
* PR48250 / CS Issue #9845 / Launchpad #723185. Unaligned DImode reload
under NEON. Submitted patch upstream, but still need to do some more
verification that older pre-ARMv5TE cases are safe. Should complete this
week.
* Working on a type of ICE seen currently on upstream trunk, a few
testcases failing under '-O3 -g'. It seems VTA related, but also might
have something to do with register elimination not fully done for
(var_location (entry_value ...)) expressions, leaving [afp+#num] memory
addresses existing in debug insns after reload. Still investigating.
* Launchpad #689887, ICE in get_arm_condition_code(). Pushed a merge
request to Linaro 4.5 for this patch. Also another LP#742961 appeared as
another case of this ICE...
* Still working on (what I think should be) the last of the CoreMark
ARMv6 regressions. The problem is to combine uxtb+cmp into ands #255.
This could be done by adding (set (cc) (compare (zero_extend...)))
patterns, implemented by ands assembly, but still looking if this can be
done (probably more elegantly) by something like CANONICALIZE_COMPARISON
(replacing compare operands) in the ARM backend.
* Launchpad #736007, ICE immed_double_const under -mfpu=neon -g. Some
discussion on gcc-patches about this, still unclear on what should be
done...
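The uxtb+cmp item above comes down to two equivalent ways of testing the low byte of a register. The sketch below shows both at source level; the function names are illustrative, and which instructions a given compiler emits for each is not guaranteed. The first form reads naturally as a zero-extend followed by a compare, the second as a single flag-setting AND with #255.

```c
#include <stdint.h>

/* Zero-extend the low byte, then compare with zero
 * (the uxtb + cmp shape). */
int low_byte_is_zero_extend(uint32_t x)
{
    uint8_t low = (uint8_t)x;
    return low == 0;
}

/* Mask the low byte and test the result directly
 * (the ands #255 shape). */
int low_byte_is_zero_mask(uint32_t x)
{
    return (x & 255u) == 0;
}
```

Both functions return the same value for every input, which is why the combine step can legitimately replace the two-instruction sequence with one.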
== This week ==
* Push forward on above issues.
Committed Dan's RVCT interoperation patch, both upstream and to Linaro
GCC 4.6.
Adjusted Bernd's "Discourage NEON on Cortex-A8" patch following Richard
Earnshaw's comments, and reposted upstream. The new version was
approved, and committed. I've also submitted a merge proposal to Linaro
GCC 4.6.
Dropped Tom's patch for marking small strings read-only. This
optimization seems to have no visible effect for ARM in GCC 4.6. I'll
leave it to Tom to forward-port, if it's still meaningful for MIPS.
Julian has committed the patch for lp:675347, so I've submitted merge
requests to both Linaro GCC 4.5 and 4.6.
Bernd has posted the shrink wrapping patches upstream. I've posted this
info in all the relevant Linaro tracking tickets.
Talked Revital Eres through the Bazaar/Launchpad merge request system.
Tried to understand why GCC 4.6 does not use multiply-and-accumulate
efficiently, when used with 64-bit values. It seems that the compiler
sometimes uses (subreg:SI (reg:DI ...)) and sometimes just uses a plain
(reg:SI ..) and those don't combine to give useful patterns, but I
haven't got to the bottom of it yet.
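A small test case of the kind that exercises this is a dot product with a 64-bit accumulator. This is a sketch, not the code that was actually investigated, and the function name is illustrative. Each iteration is a 32x32->64 multiply followed by a 64-bit add, which on ARM is a candidate for a single smlal instruction if the compiler combines the two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 32-bit vectors with a 64-bit accumulator. */
int64_t dot_product(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];  /* widening multiply, 64-bit accumulate */
    return acc;
}
```

Inspecting the assembly for the loop body (for example with -S) shows whether the multiply and the add were merged or left as separate instructions.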
Tested an FSF GCC 4.6 snapshot from the 23rd. All well, so I've merged
it to the Linaro GCC 4.6 branch.
* Future Absence
Away Monday 28th to Friday 1st April.
----
Upstream patches requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* ARM EABI half-precision functions
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
Hi,
== libunwind ==
* modified the extbl parser to operate on the DWARF model directly
* this adds support for unwinding call stacks with mixed (DWARF and extbl)
frames on ARM
* did a few other fixes and cleanups
* posted the patches on the libunwind ml
* set up a tree on git.linaro.org
* attended a class on friday
Regards
Ken
== GDB ==
* Completed glibc patch to add ARM unwind tables to system call stubs
(bug #684218), patch committed upstream and backported to Ubuntu glibc.
* Posted kernel patch to fix GDB inferior calls while stopped in a
restartable system call (bug #615974); waiting for review.
* Ongoing work to fix single-stepping over signal handlers (bug #615978).
* Implemented patch to fix single-stepping across bad ARM/Thumb boundary
(bug #667309); posted to mailing list for comments.
* Contributed two fixes for valgrind on ARM (to enable running GDB under
valgrind); both now accepted into mainline.
Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter | Management: Dirk
Wittkopp
Registered office: Böblingen | Court of registration: Amtsgericht
Stuttgart, HRB 243294
== This week ==
* Moved the discussion about the RTL and gimple representation of
strided loads/stores to the gcc@ list. Got some good feedback:
http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html
* Started a subdiscussion about the handling of modes:
http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html
This is a tricky one. I'll add more fuel to the fire next week.
* Committed two GCC patches to clean up the expand interface.
Dealt with the fallout (some expected, but unfortunately some not).
* Submitted two of the patches to improve code generation for
strided load/store intrinsics:
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01631.html
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01634.html
* Spent a lot of the week reworking the way the load/store intrinsics
are handled, to fix both correctness and performance bugs. The new
rtl patterns should have the right form for the vectoriser.
Made what feels like good progress, but it's not complete yet.
* Sent separate R_ARM_IRELATIVE patch to glibc, after feedback from
glibc-ports.
* Booked flight and hotel for Budapest summit.
* Pinged unreviewed patches.
== Next week ==
* More intrinsics improvements. I think these are necessary to get good
code out of the vectoriser too.
Richard
== String routines ==
* Wrote a Thumb-optimised strchr
- As expected it's got nice performance for longer runs but at
sizes <16 bytes it's slower, and a lot of the strchr
calls are very short, so it's probably not of benefit in most cases
( https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialStrchr?ac…
)
* Wrote a NEON memcpy
- As previously found with memset, it performs well on A8 but
poorly on A9. It does, however, handle the case where
the source/destination isn't aligned quite well even on A9; the vld1
unaligned case works with relatively little penalty.
(It performs comparably to the Bionic implementation: mine is a
bit faster on shorter calls, Bionic is better
on longer ones. I think that's because they've got some careful use
of preloads where I have so far got none.)
I'm on holiday up to and including 5th April.
Dave
== GCC ==
Progress:
* Investigated excessive VFP moves. Investigating ways forward.
* Went through some of the upstream test results for 4.6 RC2.
* Set up SPEC2k6 cross on my Linaro machine.
* Waiting for my new Panda board sometime next week.
* Some small bug fixes upstream. Need to rework a couple of
documentation patches after review.
Plans:
* Continue looking at excessive VFP moves.
* Continue to look at some patches upstream.
* Finish working through Thumb2 speed tickets.
* Set up new Panda board.
* Start looking at DENBench results and identify
potential speed up areas.
Meetings:
* 1-1s
* Linaro toolchain meeting
Absences:
* March 30th (maybe): WC Cricket Semi-final. (Ind v Pak)
* April 15 – 26 -> Booked Holiday.
* May 9-14 - LDS Budapest
RAG:
Red:
Amber:
Green:
Current Milestones:
|                              | Planned    | Estimate   | Actual     |
| qemu-linaro 2011-04          | 2011-04-21 | 2011-04-21 |            |
Historical Milestones:
| finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
| first qemu-linaro release    | 2011-02-08 | 2011-02-08 | 2011-02-08 |
| qemu-linaro 2011-03          | 2011-03-08 | 2011-03-08 | 2011-03-08 |
== maintain-beagle-models ==
* benchmarking/testing of the TCG locking fix: oddly benchmarks
seem to come out with less slowdown (1% or less) than a system
mode bootup/shutdown. (I used scimark and dhrystone. scimark
is the same speed, which is to be expected because we spend all
our time doing floating point emulation. I was expecting a bigger
perf hit on dhrystone, though.)
* submitted patches to make qemu fail cleanly if you ask for more RAM
than a board supports
== merge-correctness-fixes ==
* tested the Neon element load/store instructions; wrote patches to
fix UNDEF handling (which are blocked waiting for the patch pipeline
to be drained) and confirmed there aren't any other bugs.
There is a meego patch to use helper functions for multi-element
load/store which is apparently to avoid overflowing a TCG buffer:
need to test and upstream this.
* investigated android qemu tree for any missing correctness fixes:
looking through the changelog I think we have fixes upstream for
everything that was fixed in the android tree.
== other ==
* patch: fix versatilepb/realview handling of multiple nic options
* patch: better diagnosis of more nics requested than board supports
[this is needed to get the vexpress patch committed]
* reviewed a patch to add ARMv4/v4T support to qemu
(mostly consists of making sure we UNDEF in the right places)
* meetings: toolchain, standup, pdsw-tools, 1-2-1
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
Hello,
Implemented a patch to apply SMS in the presence of instructions with
REG_INC_NOTE. (This occurs in telecom/autocor, so at present SMS has to be run
with the -fno-auto-inc-dec flag to be applied there.)
Sent a merge request to gcc-linaro for the SMS patches.
Thanks to Andrew Stubbs for his help.
https://code.launchpad.net/~eres/gcc-linaro/SMS_doloop_for_ARM
I intend to send a request to gcc-linaro.4.6 as well.
Thanks,
Revital
Hi,
* resubmitted and committed store sink patch to trunk, I'll commit it
to gcc-linaro-4.6 next week
* submitted autodetection of vector size patch to gcc-patches, I'll
commit it next week
* started testing a patch that makes -mvectorize-with-neon-quad the default
* DenBench: found some more cases where vectorization of strided
accesses using vzip/vuzp causes degradation. Since Richard is making a
lot of progress with vld/vst, I think it doesn't make sense to spend
too much time on vzip/vuzp, and I am going to run DenBench without
this patch.
Ira
Philipp Kern <trash(a)philkern.de> writes:
> On 2011-03-23, Goswin von Brederlow <goswin-v-b(a)web.de> wrote:
>> Also does the testing transition consider the Built-Using? If I specify
>> 'Built-Using: gcc-4.5 (= 4.5.2-5)' will the package be blocked from
>> entering testing until gcc-4.5 (= 4.5.2-5) has entered and block gcc-4.5
>> (= 4.5.2-5) from being replaced from testing?
>
> It doesn't need to. All we want is compliance on the archive side so that the
> sources are not expired away, as long as that binary is carried in a suite.
> No need to involve britney at that point.
>
> Kind regards
> Philipp Kern
Not quite. For ia32-libs it would be nice if ia32-libs could be blocked
from testing as long as the source packages it includes aren't in
testing. Currently that is solved by building against testing in the
first place. But that is something we can live with.
As a side note the debian-cd package needs to also consider Built-Using
when creating source images. Will the Sources.gz file list multiple
entries for a source if multiple versions are relevant?
MfG
Goswin
Hi,
2009/11/2 Mark Hymers <mhy(a)debian.org>:
> On Mon, 02, Nov, 2009 at 12:43:42PM +0000, Philipp Kern spoke thus..
>> Of course it is a sane approach but very special care needs to be taken when
>> releasing to ensure GPL compliance. So what we should get is support in the
>> toolchain to declare against what source package the upload was built to
>> keep that around.
> We haven't implemented that yet for the archive software but it's on the
> TODO list (and not that difficult). None of us have had time to do the
> d-d-a post from the ftpteam meeting yet, but I'll make sure information
> is in there about it.
>
> I'm hoping to get the archive-side support done in the next week or so.
Squeeze has already been released; cross toolchains were not released
alongside Debian main, but can be found in the Emdebian repository.
Marcin Juszkiewicz has been working on cross compiler packages for
armel as part of his work for Linaro, which I am attempting to include in
the Debian main archive. As a result of the work done, linux-armel,
binutils-armel and eglibc-armel are merged into a single source package
named `cross-toolchain-base'. The package is not optimal, but once we
get multiarch support, it should be renamed to `binutils-armel' (or a
similar name) and use the linux and eglibc libraries and headers provided
by multiarch.
Along with this package I also plan to upload `gcc-4.5-cross' (#590465).
At the moment we are targeting one target architecture on two build
hosts ('{amd64,i386}->armel'); I am not sure whether support for more
build hosts is desired. Target architecture support might grow
in future, but right now it is not a priority.
Is that an issue for anyone? Comments?
Best regards,
--
Héctor Orón
"Our Sun unleashes tremendous flares expelling hot gas into the Solar
System, which one day will disconnect us."
-- Day DVB-T stop working nicely
Video flare: http://antwrp.gsfc.nasa.gov/apod/ap100510.html
== Last week ==
* Committed STT_GNU_IFUNC changes to binutils.
* Submitted the STT_GNU_IFUNC changes to GLIBC ports. Got feedback
on Friday, which I'll deal with this week.
* Worked on the expand and rtl-level parts of the load/store lane
representation, with new optabs for each operation. This seems
to be working pretty well, but I still need to make some changes
to the way the existing intrinsics work.
* Wrote a patch to clean up the way we handle optabs during expand,
so that the new optabs mentioned above will need a bit less
cut-&-paste. Submitted upstream. Got some positive feedback.
* Committed testcase for PR rtl-optimization/47166 upstream.
== This week ==
* Deal with GLIBC feedback.
* More load/store lanes.
Richard
* Linaro GCC
Tested and merged both the latest Linaro merge requests, and various bug
fixes to the Shrink Wrap optimization from CS, into Linaro GCC 4.5.
Merged and tested from FSF GCC 4.6.
Richard and Ramana have approved some of my upstream patches! I just
need to wait for stage one so I can commit them upstream. I'll commit
them internally when I get time to do the final integration test.
Continued benchmarking GCC 4.6 with the patches merged from GCC 4.5.
Decided to discard a couple of extra patches since they don't appear to
be of any value.
* Other
On leave Wednesday to Friday playing daddy. :)
* Future Absence
Away Monday 28th to Friday 1st April.
----
Upstream patches requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* ARM EABI half-precision functions
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
Hey
I'm trying to extend the *link: specs to pass a different
-dynamic-linker depending on the float ABI. But I didn't manage to
build a construct which would preserve the order of the flags; if I do
something like:
%{msoft-float:-dynamic-linker V1} %{mfloat-abi=softfp:-dynamic-linker V2}
Then I get V2 for "-mfloat-abi=softfp -msoft-float" instead of V1.
In gcc/gcc.c I found some docs on spec file syntax; I see one can use
%{S*&T*} and %{S*:X}, but apparently %{S*&T*:X} isn't allowed, so I
can't manipulate the value. I tried to use
%{msoft-float*:-dynamic-linker V1} %{mfloat-abi=softfp*:-dynamic-linker V2}
but that gives the same effect (the msoft-float flags are
grouped together in the original order and put first, then the
mfloat-abi=softfp are grouped together in the original order and put
second).
I didn't manage to get %{msoft-float*:%<msoft-float -dynamic-linker V1}
to work; in fact I didn't get suppressions to work.
Any idea?
Thanks!
PS: float-abi=softfp/soft-float are just convenient examples; the
actual target is to use different -dynamic-linker for hard vs soft
float-abi
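To make the intended behaviour concrete, here is a minimal Python sketch of the "last option on the command line wins" selection that the spec strings above fail to express. It is not spec-file syntax and not part of GCC; the helper name `pick_dynamic_linker` is hypothetical, and V1/V2 are the same placeholders used above.

```python
def pick_dynamic_linker(args, default=None):
    """Return the dynamic linker selected by the LAST float-ABI option.

    Hypothetical helper for illustration only: "V1" and "V2" are the
    placeholder values from the spec strings, not real linker paths.
    """
    choice = default
    for arg in args:
        # Walk the options in command-line order so a later option
        # overrides an earlier one.
        if arg == "-msoft-float":
            choice = "V1"
        elif arg == "-mfloat-abi=softfp":
            choice = "V2"
    return choice
```

With this ordering rule, "-mfloat-abi=softfp -msoft-float" selects V1 and the reverse order selects V2, which is the result the specs should produce.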
--
Loïc Minier
I went to the first QEMU Users Forum in Grenoble last week;
these are my impressions and a summary of what happened. Sorry if
it's a bit TLDR...
== Summary and general observations ==
This was a day long set of talks tacked onto the end of the DATE
conference. There were about 40 attendees; the focus of the talks was
mostly industrial and academic research QEMU users/hackers (a set of
people who use and modify QEMU but who aren't very well represented on
the qemu-devel list).
A lot of the talks related to SystemC; at the moment people are
rolling their own SystemC<->QEMU bridges. In addition to the usual
problems when you try to put two simulation engines together (each of
which thinks it should be in control of the world) QEMU doesn't make
this easy because it is not very modular and makes the assumption that
only one QEMU exists in a process (lots of global variables, no
locking, etc).
There was a general perception from attendees that the QEMU "development
community" is biased towards KVM rather than TCG. I tend to agree with
this, but think this is simply because (a) that's where the bulk of
the contributors are and (b) people doing TCG related work don't
always appear on the mailing list. (The "quick throwaway prototype"
approach often used for research doesn't really mesh well with
upstream's desire for solid long-term maintainable code, I guess.)
QEMU could certainly be made more convenient for this group of users:
greater modularisation and provision of "just the instruction set
simulator" as a pluggable library, for instance. Also the work by
STMicroelectronics on tracing/instrumentation plugins looks like
it should be useful to reduce the need to hack extra instrumentation
directly into QEMU's frontends.
People generally seemed to think the forum was useful, but it hasn't
been decided yet whether to repeat it next year, or perhaps to have
some sort of joint event with the open-source qemu community.
More detailed notes on each of the talks are below;
the proceedings/slides should also appear at http://adt.cs.upb.de/quf
within a few weeks. Of particular Linaro/ARM interest are:
* the STMicroelectronics plugin framework so your DLL can get
callbacks on interesting events and/or insert tracing or
instrumentation into generated code
* Nokia's work on getting useful timing/power type estimates out of
QEMU by measuring key events (insn exec, cache miss, TLB miss, etc)
and calibrating against real hardware to see how to weight these
* a talk on parallelising QEMU, ie "multicore on multicore"
* speeding up Neon by adding SIMD IR ops and translating to SSE
The forum started with a brief introduction by the organiser, followed
by an informal Q&A session with Nathan Froyd from CodeSourcery
(...since his laptop with his presentation slides had died on the
journey over from the US...)
== Talk 1: QEMU and SystemC ==
M. Monton from GreenSocs presented a couple of approaches to using
QEMU with SystemC. "QEMU-SC" is for systems which are mostly QEMU
based with one or two SystemC devices -- QEMU is the master. Speed
penalty is 8-14% over implementing the device natively. "QBox" makes
the SystemC simulation the master, and QEMU is implemented as a TLM2
Initiator; this works for systems which are almost all SystemC and
which you just want to add a QEMU core to. Speed penalty 100% (!)
although they suspect this is an artifact of the current
implementation and could be reduced to more like 25-30%. They'd like
to see a unified effort to do SystemC and QEMU integration (you'll
note that there are several talks here where the presenters had rolled
their own integration). Source available from www.greensocs.com.
== Talk 2: Combined Use of Dynamic Binary Translation and
SystemC for Fast and Accurate MPSoc Simulation ==
Description of a system where QEMU is used as the core model in a
SystemC simulation of a multiprocessor ARM system. The SystemC side
includes models of caches, write buffers and so on; this looked like
quite a low level detailed (high overhead) simulation. They simulate
multiple clusters of multiple cores, which is tricky with QEMU because
it has a design assumption of only one QEMU per process address space
(lots of global variables, no locking, etc); they handle this by
saving and restoring globals at SystemC synchronisation points, which
sounded rather hacky to me. They get timing information out of their
model by annotating the TCG intermediate representation ops with new
ops indicating number of cycles used, whether to check for
Icache/Dcache hit/miss, and so on. Clearly they've put a lot of work
into this. They'd like a standalone, reentrant ISS, basically so it's
easier to plug into other frameworks like SystemC.
== Talk 3: QEMU/SystemC Cosimulation at Different Abstraction Levels ==
This talk was about modelling an RTOS in SystemC; I have to say I
didn't really understand the motivation for doing this. Rather than
running an RTOS under emulation, they have a SystemC component which
provides the scheduler/mutex type APIs an RTOS would, and then model
RTOS tasks as other SystemC components. Some of these SystemC
components embed user-mode QEMU, so you can have a combination of
native and target-binary RTOS tasks. They're estimating time usage by
annotating QEMU translation blocks (but not doing any accounting for
cache effects).
== Talk 4: Timing Aspects in QEMU/SystemC Synchronisation ==
Slightly academic-feeling talk about how to handle the problem of
trying to run several separate simulations in parallel and keep their
timing in sync. (In particular, QEMU and a SystemC world.) If you just
alternate running each simulation there is no problem but it's not
making best use of the host CPU. If you run them in parallel you can
have the problem that sim A wants to send an event to sim B at time T,
but sim B has already run past time T. He described a couple of
possible approaches, but they were all "if you do this you might still
hit the problem but there's a tunable parameter to reduce the
probability of something going wrong"; also they only actually
implemented the simplest one. In some sense this is really all
workarounds for the fact that SystemC is being retrofitted/bolted
onto the outside of a QEMU simulation.
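The trade-off can be illustrated with a small Python sketch. It assumes a simple quantum-based scheme, which is not necessarily one of the approaches presented in the talk: both simulations run freely for one quantum and only exchange events at the quantum boundary, so an event may arrive later than the time it was stamped for. The function names and the event format are hypothetical.

```python
def run_lockstep(events, end_time, quantum):
    """Deliver events from sim A to sim B under quantum-based syncing.

    events: list of (send_time, wanted_time) pairs; A sends the event at
    send_time and wants B to see it at wanted_time.
    Returns a list of (wanted_time, actual_time) pairs.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    deliveries = []
    pending = sorted(events)
    now = 0
    while now < end_time:
        window_end = min(now + quantum, end_time)
        # Both sims run [now, window_end) in parallel without talking.
        sent = [e for e in pending if now <= e[0] < window_end]
        pending = [e for e in pending if not (now <= e[0] < window_end)]
        for _send_time, wanted in sent:
            # B has already reached window_end, so it cannot honour an
            # earlier timestamp; the event simply arrives late.
            deliveries.append((wanted, max(wanted, window_end)))
        now = window_end
    return deliveries


def worst_lateness(deliveries):
    """Largest gap between the requested and the actual delivery time."""
    return max((actual - wanted for wanted, actual in deliveries), default=0)
```

Here the quantum plays the role of the tunable parameter: a smaller quantum bounds the lateness more tightly but forces the two simulations to synchronise more often, which reduces the benefit of running them in parallel.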
== Talk 5: Program Instrumentation with QEMU ==
Presentation by STMicroelectronics, about work they'd done adding
instrumentation to QEMU so you can use it for execution trace
generation, performance analysis, and profiling-driven optimisation
when compiling. It's basically a plugin architecture so you can
register hooks to be called at various interesting points (eg every
time a TB is executed); there are also translation time hooks so
plugins can insert extra code into the IR stream. Because it works at
the IR level it's CPU-agnostic. They've used this to do real work
like optimising/debugging of the Adobe Flash JIT for ARM. They're
hoping to be able to submit this upstream.
I liked this; I think it's a reasonably maintainable approach, and it
ought to alleviate the need for hacking extra ops directly into QEMU
for instrumentation (which is the approach you see in some of the
other presentations). In particular it ought to work well with the
Nokia work described in the next talk...
== Talk 6: Using QEMU in Timing Estimation for Mobile Software
Development ==
Work by Nokia's research division and Aalto university. This was
about getting useful timing estimates out of a QEMU model by adding
some instrumentation (instructions executed, cache misses, etc) and
then calibrating against real hardware to identify what weightings to
apply to each of these (weightings differ for different cores/devices;
eg on A8 your estimates are very poor if you don't account for L2
cache misses, but for some other cores TLB misses are more important
and adding L2 cache miss instrumentation gives only a small
improvement in accuracy.) The cache model is not a proper functional
cache model, it's just enough to be able to give cache hit/miss stats.
They reckon that three or four key statistics (cache miss, TLB miss, a
basic classification of insns into slow or fast) give estimated
execution times with about 10% level of inaccuracy; the claim was that
this is "feasible for practical usage". Git tree available.
This would be useful in conjunction with the STMicroelectronics
instrumentation plugin work; alternatively it might be interesting
to do this as a Valgrind plugin, since Valgrind has much more
mature support for arbitrary plugins. (Of course as a Valgrind
plugin you'd be restricted to running on an ARM host, and you're
only measuring one process, not whole-system effects.)
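A minimal Python sketch of the weighted-event idea follows. The event names and weights are invented for illustration and are not Nokia's calibrated values; in practice the weights would come from calibrating against measurements on the real core.

```python
def estimate_cycles(counts, weights):
    """Estimate execution cost as a weighted sum of event counts.

    counts:  {event_name: number of occurrences seen in the emulator}
    weights: {event_name: assumed cost per occurrence}
    Events missing from counts are treated as zero occurrences.
    """
    return sum(weights[name] * counts.get(name, 0) for name in weights)


def relative_error(estimated, measured):
    """Fractional error of the estimate against a hardware measurement."""
    return abs(estimated - measured) / measured


# Hypothetical per-event weights for one core (illustrative only).
EXAMPLE_WEIGHTS = {
    "fast_insn": 1,
    "slow_insn": 4,
    "cache_miss": 50,
    "tlb_miss": 30,
}
```

For example, `estimate_cycles({"fast_insn": 1000, "slow_insn": 100, "cache_miss": 10, "tlb_miss": 2}, EXAMPLE_WEIGHTS)` gives 1960, and comparing that against a measured 2000 cycles with `relative_error` gives 2%.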
== Talk 7: QEMU in Digital Preservation Strategies ==
A less technical talk from a researcher who's working on the problems
of how museums should deal with preserving and conserving "digital
artifacts" (OSes, applications, games). There are a lot of reasons
why "just run natively" becomes infeasible: media decay, the connector
conspiracy, old and dying hardware, APIs and environments becoming
unsupported, proprietary file formats and on and on. If you emulate
hardware (with QEMU) then you only have to deal with emulating a few
(tens of) hardware platforms, rather than hundreds of operating
systems or thousands of file formats, so it's the most practical
approach. They're working on web interfaces for non-technical users.
Most interesting for the QEMU dev community is that they're
effectively building up a large set of regression tests (ie images of
old OSes and applications) which they are going to be able to run
automatic testing on.
== Talk 8: MARSS-x86: QEMU-based Micro-Architectural and Systems
Simulator for x86 Multicore Processors ==
This is about using QEMU for microarchitectural level modelling
(branch predictor, load/store unit, etc); their target audience is
academic researchers. There's an existing x86 pipeline level simulator
(PTLsim) but it has problems: it uses Xen for its system simulation so
it's hard to get installed (need a custom kernel on the host!), and it
doesn't cope with multicore. So they've basically taken PTLsim's
pipeline model and ported it into the QEMU system emulation
environment. When enabled it replaces the TCG dynamic translation
implementation; since the core state is stored in the same structures
it is possible to "fast forward" a simulation running under TCG and
then switch to "full microarchitecture simulation" for the interesting
parts of a benchmark. They get 200-400KIPS.
== Talk 9: Showing and Debugging Haiku with QEMU ==
Haiku is an x86 OS inspired by BeOS. The speaker talked about how they
use QEMU for demos and also for kernel and bootloader debugging.
== Talk 10: PQEMU : A parallel system emulator based on QEMU ==
This was a group from a Taiwan university who were essentially
claiming to have solved the "multicore on multicore" problem, so you
can run a simulated MPx4 ARM core on a quad-core x86 box and have it
actually use all the cores. They had some benchmarking graphs which
indicated that you do indeed get ~3.x times speedup over emulated
single-core, ie your scaling gain isn't swamped in locking overhead.
However, the presentation concentrated on the locking required for
code generation (which is in my opinion the easy part) and I wasn't really
convinced that they'd actually solved all the hard problems in getting
the whole system to be multithreaded. ("It only crashes once every
hundred runs...") Also their work is based on QEMU 0.12, which is now
quite old. We should definitely have a look at the source which they
hope to make available in a few months.
== Talk 11: PRoot: A Step Forward for QEMU User-Mode ==
STMicroelectronics again, presenting an alternative to the usual
"chroot plus binfmt_misc" approach for running target binaries
seamlessly under qemu's linux-user mode. It's a wrapper around qemu
which uses ptrace to intercept the syscalls qemu makes to the host; in
particular it can add the target-directory prefix to all filesystem
access syscalls, and can turn an attempt to exec "/bin/ls" into an
exec of "qemu-linux-arm /bin/ls". The advantage over chroot is that
it's more flexible and doesn't need root access to set up. They didn't
give figures for how much overhead the syscall interception adds,
though.
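The two rewrites described above can be sketched in Python. This only illustrates the path and argv translation, not the ptrace interception, and it is not PRoot's actual code: the function names are hypothetical, and symlinks inside the target directory are ignored.

```python
import posixpath


def translate_path(guest_path, rootfs):
    """Map an absolute guest path to a host path under rootfs."""
    # Normalise first so that "/../.." cannot climb out of the root.
    clean = posixpath.normpath("/" + guest_path)
    return posixpath.join(rootfs, clean.lstrip("/"))


def translate_exec(argv, emulator="qemu-linux-arm"):
    """Turn an exec of a target binary into an exec of the emulator.

    The binary path is left untouched: the emulator's own filesystem
    accesses are intercepted and go through translate_path.
    """
    return [emulator] + list(argv)
```

For instance, `translate_path("/bin/ls", "/srv/rootfs")` gives "/srv/rootfs/bin/ls", and `translate_exec(["/bin/ls", "-l"])` gives `["qemu-linux-arm", "/bin/ls", "-l"]`; the rootfs location is an assumed example.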
== Talk 12: QEMU TCG Enhancements for Speeding up Emulation of SIMD ==
Simple idea -- make emulation of Neon instructions faster by adding
some new SIMD IR ops and then implementing them with SSE instructions
in the x86 backend. Some basic benchmarking shows that they can be ten
times faster this way. Issues:
* what is the best set of "generic" SIMD ops to add to the QEMU IR?
* is making Neon faster the best use of resource for speeding up
QEMU overall, or should we be looking at parallelism or other
problems first?
* are there nasty edge cases (flags, corner case input values etc)
which would be a pain to handle?
Interesting, though, and I think it takes the right general approach
(ie not horrifically Neon specific). My feeling is that for this to go
upstream it would need uses in two different QEMU front ends (to
demonstrate that the ops are generic) and implementations in at least
the x86 backend, plus fallback code so backends need not implement the
ops; that's a fair bit of work beyond what they've currently
implemented.
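The fallback requirement can be sketched as follows, in Python rather than QEMU's C, and with invented names (`add_u32x4`, `emit_vec_add`) that are not real TCG ops or APIs. A generic "add four 32-bit lanes" op uses the backend's native implementation when one is registered and otherwise expands to per-lane scalar adds.

```python
MASK32 = 0xFFFFFFFF


def vec_add_u32x4_fallback(a, b):
    """Lane-by-lane fallback: four independent wrapping 32-bit adds."""
    return tuple((x + y) & MASK32 for x, y in zip(a, b))


def emit_vec_add(backend_ops, a, b):
    """Use the backend's native vector add if present, else fall back.

    backend_ops: {op_name: callable} table of ops a backend implements
    (a hypothetical stand-in for a backend's capability list).
    """
    native = backend_ops.get("add_u32x4")
    if native is not None:
        return native(a, b)
    return vec_add_u32x4_fallback(a, b)
```

A backend with no SIMD support passes an empty table and still gets correct results, just without the speedup; an x86 backend would register an SSE-based implementation under the same op name.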
== Talk 13: A SysML-based Framework with QEMU-SystemC Code Generation ==
This was the last talk, and the speaker ran through it very fast as we
were running out of time. They have a code generator for taking a UML
description of a device and turning it into SystemC (for VHDL) and C++
(for a QEMU device) and then cosimulating them for verification.
-- PMM
Hello list,
Recently, the Android team has been working on integrating the Linaro toolchain for
Android and the NDK. According to the initial benchmark results[1],
Linaro GCC is competitive compared to the Google toolchain. In the
meantime, we are trying to enable gcc-4.5 specific features such as
Graphite and LTO (Link Time Optimization) in order to make the best
choice for the Android build system and NDK usage. However, I encountered
a problem with LTO and would like to ask for help from the toolchain WG.
Assuming Linaro Toolchain for Android is installed in directory
/tmp/android-toolchain-eabi, you can obtain Google's toolchain
benchmark suite by git:
# git clone git://android.git.kernel.org/toolchain/benchmark.git
You have to apply the attached patch in order to make benchmark suite
work[2]. Then, change directory to skia:
# cd benchmark/skia
And build skia bench with LTO enabled:
# ../scripts/bench.py --action=build
--toolchain=/tmp/android-toolchain-eabi --add_cflags="-flto
-user-linker-plugin"
The build process would be interrupted by gcc:
make -j4 --warn-undefined-variables -f ../scripts/build/main.mk
TOOLCHAIN=/tmp/android-toolchain-eabi ADD_CFLAGS="-flto
-user-linker-plugin" build
CPP ARM obj/src/core/Sk64.o <= src/src/core/Sk64.cpp
CPP ARM obj/src/core/SkAlphaRuns.o <= src/src/core/SkAlphaRuns.cpp
CPP ARM obj/src/core/SkBitmap.o <= src/src/core/SkBitmap.cpp
CPP ARM obj/src/core/SkBitmapProcShader.o <= src/src/core/SkBitmapProcShader.cpp
CPP ARM obj/src/core/SkBitmapProcState.o <= src/src/core/SkBitmapProcState.cpp
CPP ARM obj/src/core/SkBitmapProcState_matrixProcs.o <=
src/src/core/SkBitmapProcState_matrixProcs.cpp
src/src/core/SkBitmapProcShader.cpp: In function
'SkShader::CreateBitmapShader(SkBitmap const&, SkShader::TileMode,
SkShader::TileMode, void*, unsigned int)':
src/src/core/SkBitmapProcShader.cpp:243:13: warning: 'color' may be
used uninitialized in this function
CPP ARM obj/src/core/SkBitmapSampler.o <= src/src/core/SkBitmapSampler.cpp
src/src/core/SkBitmapProcState_matrixProcs.cpp:530:1: sorry,
unimplemented: gimple bytecode streams do not support machine specific
builtin functions on this target
...
However, I can get other bench items passed such as cximage, gcstone,
gnugo, mpeg4, webkit, and python.
Can anyone give me some hints to resolve the LTO problem? Thanks in advance.
Sincerely,
-jserv
[1] https://wiki.linaro.org/Platform/Android/Toolchain#Reference%20Benchmark
We use the same toolchain benchmark suite that the Google compiler team uses.
[2] https://wiki.linaro.org/Platform/Android/UpstreamToolchain
== Last week ==
* CoreMark ARMv6/v7 regressions: posted another combine patch upstream,
which was quickly approved and committed. The XOR simplification one is
now approved too, but needs a little more revising of comments before
committing.
* The above two patches now bring CoreMark under -march=armv7-a very
close to the performance of -march=armv5te. However, a regression where
uxtb+cmp cannot be combined into 'ands ... #255' still causes v7 to lose
slightly. This should be the final issue to solve...
* Launchpad #736007/GCC Bugzilla PR48183: NEON ICE in
emit-rtl.c:immed_double_const() under -g. Posted patch upstream, but
looks like more discussion is needed before we know if this is the
"right" way to do it.
* Launchpad #736661, armel FTBFS (G++ ICE in expand_expr_real_1()).
Looking at this.
* Pinged a few upstream patch submissions.
== This week ==
* Launchpad #723185/CS issue #9845 now assigned to me, start looking at
this.
* Get the XOR patch committed upstream, and the above described uxtb+cmp
issue solved.
* Work on other GCC issues.