== GDB ==
* Completed glibc patch to add ARM unwind tables to system call stubs
(bug #684218), patch committed upstream and backported to Ubuntu glibc.
* Posted kernel patch to fixes GDB inferior calls while stopped in a
restartable system call (bug #615974); waiting for review.
* Ongoing work to fix single-stepping over signal handlers (bug #615978).
* Implemented patch to fix single-stepping across bad ARM/Thumb boundary
(bug #667309); posted to mailing list for comments.
* Contributed two fixes for valgrind on ARM (to enable running GDB under
valgrind); both now accepted mainline.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== This week ==
* Moved the discussion about the RTL and gimple representation of
strided loads/stores to the gcc@ list. Got some good feedback:
http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html
* Started a subdiscussion about the handling of modes:
http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html
This is a tricky one. I'll add more fuel to the fire next week.
* Committed two GCC patches to clean up the expand interface.
Dealt with the fallout (some expected, but unfortunately some not).
* Submitted two of the patches to improve code generation for
strided load/store intrinsics:
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01631.htmlhttp://gcc.gnu.org/ml/gcc-patches/2011-03/msg01634.html
* Spent a lot of the week reworking the way the load/store intrinsics
are handled, to fix both correctness and performance bugs. The new
rtl patterns should have the right form for the vectoriser.
Made what feels like good progress, but it's not complete yet.
* Sent separate R_ARM_IRELATIVE patch to glibc, after feedback from
glibc-ports.
* Booked flight and hotel for Budapest summit.
* Pinged unreviewed patches.
== Next week ==
* More intrinsics improvements. I think these are necessary to get good
code out of the vectoriser too.
Richard
== String routines ==
* Wrote a thumb optimised strchr
- As expected it's got nice performance for longer runs but at
sizes <16 bytes it's slower, and a lot of the strchr
calls are very short, so it's probably not of benefit in most cases
( https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialStrchr?ac…
)
* Wrote a neon-memcpy
- As previously found with memset, it performs well on A8 but
poorly on A9 - it does however do the case where
the source/destination isn't aligned quite well even on A9 ; the vld1
unaligned case works with relatively little penalty.
(it performs comparably to the Bionic implementation - mine is a
bit faster on shorter calls, Bionic is better
on longer uses - I think that's because they've got some careful use
of preloads where I have so far got none).
I'm on holiday up to and including 5th April.
Dave
== GCC ==
Progress:
* Investigated excessive VFP moves . Investigating ways forward.
* Went through some of the test results with 4.6 RC2 upstream - looking
through test results etc.
* Setup SPEC2k6 cross on my Linaro machine.
* Waiting for my new Panda board sometime next week.
* Some small bug fixes upstream. Need to rework a couple of
documentation patches after review.
Plans:
* Continue looking at excessive VFP moves.
* Continue to look at some patches upstream.
* Finish working through Thumb2 speed tickets.
* Set up new Panda board.
* Start looking at DENBench results and identify
potential speed up areas.
Meetings:
* 1-1s
* Linaro toolchain meeting
Absences:
* March 30th (maybe): WC Cricket Semi-final. (Ind v Pak)
* April 15 – 26 -> Booked Holiday.
* May 9-14 - LDS Budapest
RAG:
Red:
Amber:
Green:
Current Milestones:
| Planned | Estimate | Actual |
qemu-linaro 2011-04 | 2011-04-21 | 2011-04-21 | |
Historical Milestones:
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03 | 2011-03-08 | 2011-03-08 | 2011-03-08 |
== maintain-beagle-models ==
* benchmarking/testing of the TCG locking fix: oddly benchmarks
seem to come out with less slowdown (1% or less) than a system
mode bootup/shutdown. (I used scimark and dhrystone. scimark
is the same speed, which is to be expected because we spend all
our time doing floating point emulation. I was expecting a bigger
perf hit on dhrystone, though.)
* submitted patches to make qemu fail cleanly if you ask for more RAM
than a board supports
== merge-correctness-fixes ==
* tested the Neon element load/store instructions; wrote patches to
fix UNDEF handling (which are blocked waiting for the patch pipeline
to be drained) and confirmed there aren't any other bugs.
There is a meego patch to use helper functions for multi-element
load/store which is apparently to avoid overflowing a TCG buffer:
need to test and upstream this.
* investigated android qemu tree for any missing correctness fixes:
looking through the changelog I think we have fixes upstream for
everything that was fixed in the android tree.
== other ==
* patch: fix versatilepb/realview handling of multiple nic options
* patch: better diagnosis of more nics requested than board supports
[this is needed to get the vexpress patch committed]
* reviewed a patch to add ARMv4/v4T support to qemu
(mostly consists of making sure we UNDEF in the right places)
* meetings: toolchain, standup, pdsw-tools, 1-2-1
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
Hello,
Implemented a patch to apply SMS in the presence of instructions with
REG_INC_NOTE. (this occurs in telecom/autocor thus SMS needs to be run
with -fno-auto-inc-dec
flag to be applied)
Sent a merge request to gcc-linaro for the SMS patches.
Thanks to Andrew Stubbs for his help.
https://code.launchpad.net/~eres/gcc-linaro/SMS_doloop_for_ARM
I intend to send a request to gcc-linaro.4.6 as well.
Thanks,
Revital
Hi,
* resubmitted and committed store sink patch to trunk, I'll commit it
to gcc-linaro-4.6 next week
* submitted autodetection of vector size patch to gcc-patches, I'l
commit it next week
* started testing a patch that makes mvectorize-with-neon-quad the default
* DenBench: found some more cases where vectorization of strided
accesses using vzip/vuzp causes degradation. Since Richard is making a
lot of progress with vlsd/vst, I think it doesn't make sense to spend
too much time on vzip/vuzp, and I am going to run DenBench without
this patch.
Ira
Philipp Kern <trash(a)philkern.de> writes:
> On 2011-03-23, Goswin von Brederlow <goswin-v-b(a)web.de> wrote:
>> Also does the testing transition consider the Built-Using? If I specify
>> 'Built-Using: gcc-4.5 (= 4.5.2-5)' will the package be blocked from
>> entering testing until gcc-4.5 (= 4.5.2-5) has entered and block gcc-4.5
>> (= 4.5.2-5) from being replaced from testing?
>
> It doesn't need to. All we want is compliance on the archive side so that the
> sources are not expired away, as long as that binary is carried in a suite.
> No need to involve britney at that point.
>
> Kind regards
> Philipp Kern
Not quite. For ia32-libs it would be nice if ia32-libs could be blocked
from testing as long as the source packages it includes aren't in
testing. Currently that is solved by building against testing in the
first palce. But that is something we can live with.
As a side note the debian-cd package needs to also consider Built-Using
when creating source images. Will the Sources.gz file list multiple
entries for a source if multiple versions are relevant?
MfG
Goswin
Hi,
2009/11/2 Mark Hymers <mhy(a)debian.org>:
> On Mon, 02, Nov, 2009 at 12:43:42PM +0000, Philipp Kern spoke thus..
>> Of course it is a sane approach but very special care needs to be taken when
>> releasing to ensure GPL compliance. So what we should get is support in the
>> toolchain to declare against what source package the upload was built to
>> keep that around.
> We haven't implemented that yet for the archive software but it's on the
> TODO list (and not that difficult). None of us have had time to do the
> d-d-a post from the ftpteam meeting yet, but I'll make sure information
> is in there about it.
>
> I'm hoping to the archive-side support done in the next week or so.
Squeeze has already been released, cross toolchains were not released
along Debian main, but found at Emdebian repository.
Marcin Juszkiewicz has been working out cross compiler packages for
armel as part of his work for Linaro, which I attempt to include into
Debian main archive. As a result of the work done, linux-armel,
binutils-armel, eglibc-armel are merged into a single source package
named `cross-toolchain-base', the package is not optimal, but once we
got multiarch support, it should be renamed to `binutils-armel' (or
similar name) and use linux and eglibc libraries and headers provided
by multiarch.
Along this package I also plan to upload `gcc-4.5-cross' (#590465).
At the moment we are targeting one target architecture on two build
hosts ('{amd64,i386}->armel'), not sure if it is desired to be
supported on more build hosts. Target architecture support might grow
up in future, but right now it is not a priority.
Not sure if that is an issue for someone? Comments?
Best regards,
--
Héctor Orón
"Our Sun unleashes tremendous flares expelling hot gas into the Solar
System, which one day will disconnect us."
-- Day DVB-T stop working nicely
Video flare: http://antwrp.gsfc.nasa.gov/apod/ap100510.html
== Last week ==
* Committed STT_GNU_IFUNC changes to binutils.
* Submitted the STT_GNU_IFUNC changes to GLIBC ports. Got feedback
on Friday, which I'll deal with this week.
* Worked on the expand and rtl-level parts of the load/store lane
representation, with new optabs for each operation. This seems
to be working pretty well, but I still need to make some changes
to the way the existing intrinsics work.
* Wrote a patch to clean up the way we handle optabs during expand,
so that the new optabs mentioned above will need a bit less
cut-&-paste. Submitted upstream. Got some positive feedback.
* Committed testcase for PR rtl-optimization/47166 upstream.
== This week ==
* Deal with GLIBC feedback.
* More load/store lanes.
Richard
* Linaro GCC
Tested and merged both the latest Linaro merge requests, and various bug
fixes to the Shrink Wrap optimization from CS, into Linaro GCC 4.5.
Merged and tested from FSF GCC 4.6.
Richard and Ramana have approved some of my upstream patches! I just
need to wait for stage one so I can commit them upstream. I'll commit
them internally when I get time to do the final integration test.
Continued benchmarking GCC 4.6 with the patches merged from GCC 4.5.
Decided to discard a couple of extra patches since they don't appear to
be of any value.
* Other
On leave Wednesday to Friday playing daddy. :)
* Future Absence
Away Monday 28th to Friday 1st April.
----
Upstream patched requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* ARM EABI half-precision functions
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
Hey
I'm trying to extend the *link: specs to pass a different
-dynamic-linker depending on the float ABI. But I didn't manage to
build a construct which would preserve the order of the flags; if I do
something like:
%{msoft-float:-dynamic-linker V1} %{mfloat-abi=softfp:-dynamic-linker V2}
Then I get V2 for "-mfloat-abi=softfp -msoft-float" instead of V1.
In gcc/gcc.c I found some docs on spec file syntax; I see one can use
%{S*&T*} and %{S*:X}, but apparently %{S*&T*:X} isn't allowed, so I
can't manipulate the value. I tried to use
%{msoft-float*:-dynamic-linker V1} %{mfloat-abi=softfp*:-dynamic-linker V2}
but that gives the same effect (the msoft-float flags are
grouped together in the original order and put first, then the
mfloat-abi=softfp are grouped together in the original order and put
second).
I didn't manage to get %{msoft-float*:%<msoft-float -dynamic-linker V1}
to work; in fact I didn't get supressions to work.
Any idea?
Thanks!
PS: float-abit=softfp/soft-float are just convenient examples; the
actual target is to use different -dynamic-linker for hard vs soft
float-abi
--
Loïc Minier
I went to the first QEMU Users Forum in Grenoble last week;
this is my impressions and summary of what happened. Sorry if
it's a bit TLDR...
== Summary and general observations ==
This was a day long set of talks tacked onto the end of the DATE
conference. There were about 40 attendees; the focus of the talks was
mostly industrial and academic research QEMU users/hackers (a set of
people who use and modify QEMU but who aren't very well represented on
the qemu-devel list).
A lot of the talks related to SystemC; at the moment people are
rolling their own SystemC<->QEMU bridges. In addition to the usual
problems when you try to put two simulation engines together (each of
which thinks it should be in control of the world) QEMU doesn't make
this easy because it is not very modular and makes the assumption that
only one QEMU exists in a process (lots of global variables, no
locking, etc).
There was a general perception from attendees that QEMU "development
community" is biased towards KVM rather than TCG. I tend to agree with
this, but think this is simply because (a) that's where the bulk of
the contributors are and (b) people doing TCG related work don't
always appear on the mailing list. (The "quick throwaway prototype"
approach often used for research doesn't really mesh well with
upstream's desire for solid long-term maintainable code, I guess.)
QEMU could certainly be made more convenient for this group of users:
greater modularisation and provision of "just the instruction set
simulator" as a pluggable library, for instance. Also the work by
STMicroelectronics on tracing/instrumentation plugins looks like
it should be useful to reduce the need to hack extra instrumentation
directly into QEMU's frontends.
People generally seemed to think the forum was useful, but it hasn't
been decided yet whether to repeat it next year, or perhaps to have
some sort of joint event with the open-source qemu community.
More detailed notes on each of the talks are below;
the proceedings/slides should also appear at http://adt.cs.upb.de/quf
within a few weeks. Of particular Linaro/ARM interest are:
* the STMicroelectronics plugin framework so your DLL can get
callbacks on interesting events and/or insert tracing or
instrumentation into generated code
* Nokia's work on getting useful timing/power type estimates out of
QEMU by measuring key events (insn exec, cache miss, TLB miss, etc)
and calibrating against real hardware to see how to weight these
* a talk on parallelising QEMU, ie "multicore on multicore"
* speeding up Neon by adding SIMD IR ops and translating to SSE
The forum started with a brief introduction by the organiser, followed
by an informal Q&A session with Nathan Froyd from CodeSourcery
(...since his laptop with his presentation slides had died on the
journey over from the US...)
== Talk 1: QEMU and SystemC ==
M. Monton from GreenSocs presented a couple of approaches to using
QEMU with SystemC. "QEMU-SC" is for systems which are mostly QEMU
based with one or two SystemC devices -- QEMU is the master. Speed
penalty is 8-14% over implementing the device natively. "QBox" makes
the SystemC simulation the master, and QEMU is implemented as a TLM2
Initiator; this works for systems which are almost all SystemC and
which you just want to add a QEMU core to. Speed penalty 100% (!)
although they suspect this is an artifact of the current
implementation and could be reduced to more like 25-30%. They'd like
to see a unified effort to do SystemC and QEMU integration (you'll
note that there are several talks here where the presenters had rolled
their own integration). Source available from www.greensocs.com.
== Talk 2: Combined Use of Dynamic Binary Translation and
SystemC for Fast and Accurate MPSoc Simulation ==
Description of a system where QEMU is used as the core model in a
SystemC simulation of a multiprocessor ARM system. The SystemC side
includes models of caches, write buffers and so on; this looked like
quite a low level detailed (high overhead) simulation. They simulate
multiple clusters of multiple cores, which is tricky with QEMU because
it has a design assumption of only one QEMU per process address space
(lots of global variables, no locking, etc); they handle this by
saving and restoring globals at SystemC synchronisation points, which
sounded rather hacky to me. They get timing information out of their
model by annotating the TCG intermediate representation ops with new
ops indicating number of cycles used, whether to check for
Icache/Dcache hit/miss, and so on. Clearly they've put a lot of work
into this. They'd like a standalone, reentrant ISS, basically so it's
easier to plug into other frameworks like SystemC.
== Talk 3: QEMU/SystemC Cosimulation at Different Abstraction Levels ==
This talk was about modelling an RTOS in SystemC; I have to say I
didn't really understand the motivation for doing this. Rather than
running an RTOS under emulation, they have a SystemC component which
provides the scheduler/mutex type APIs an RTOS would, and then model
RTOS tasks as other SystemC components. Some of these SystemC
components embed user-mode QEMU, so you can have a combination of
native and target-binare RTOS tasks. They're estimating time usage by
annotating QEMU translation blocks (but not doing any accounting for
cache effects).
== Talk 4: Timing Aspects in QEMU/SystemC Synchronisation ==
Slightly academic-feeling talk about how to handle the problem of
trying to run several separate simulations in parallel and keep their
timing in sync. (In particular, QEMU and a SystemC world.) If you just
alternate running each simulation there is no problem but it's not
making best use of the host CPU. If you run them in parallel you can
have the problem that sim A wants to send an event to sim B at time T,
but sim B has already run past time T. He described a couple of
possible approaches, but they were all "if you do this you might still
hit the problem but there's a tunable parameter to reduce the
probability of something going wrong"; also they only actually
implemented the simplest one. In some sense this is really all
workarounds for the fact that SystemC is being retrofitted/bolted
onto the outside of a QEMU simulation.
== Talk 5: Program Instrumentation with QEMU ==
Presentation by STMicroelectronics, about work they'd done adding
instrumentation to QEMU so you can use it for execution trace
generation, performance analysis, and profiling-driven optimisation
when compiling. It's basically a plugin architecture so you can
register hooks to be called at various interesting points (eg every
time a TB is executed); there are also translation time hooks so
plugins can insert extra code into the IR stream. Because it works at
the IR level it's CPU-agnostic. They've used this to do real work
like optimising/debugging of the Adobe Flash JIT for ARM. They're
hoping to be able to submit this upstream.
I liked this; I think it's a reasonably maintainable approach, and it
ought to alleviate the need for hacking extra ops directly into QEMU
for instrumentation (which is the approach you see in some of the
other presentations). In particular it ought to work well with the
Nokia work described in the next talk...
== Talk 6: Using QEMU in Timing Estimation for Mobile Software
Development ==
Work by Nokia's research division and Aalto university. This was
about getting useful timing estimates out of a QEMU model by adding
some instrumentation (instructions executed, cache misses, etc) and
then calibrating against real hardware to identify what weightings to
apply to each of these (weightings differ for different cores/devices;
eg on A8 your estimates are very poor if you don't account for L2
cache misses, but for some other cores TLB misses are more important
and adding L2 cache miss instrumentation gives only a small
improvement in accuracy.) The cache model is not a proper functional
cache model, it's just enough to be able to give cache hit/miss stats.
They reckon that three or four key statistics (cache miss, TLB miss, a
basic classification of insns into slow or fast) give estimated
execution times with about 10% level of inaccuracy; the claim was that
this is "feasible for practical usage". Git tree available.
This would be useful in conjunction with the STMicroelectronics
instrumentation plugin work; alternatively it might be interesting
to do this as a Valgrind plugin, since Valgrind has much more
mature support for arbitrary plugins. (Of course as a Valgrind
plugin you'd be restricted to running on an ARM host, and you're
only measuring one process, not whole-system effects.)
== Talk 7: QEMU in Digital Preservation Strategies ==
A less technical talk from a researcher who's working on the problems
of how museums should deal with preserving and conserving "digital
artifacts" (OSes, applications, games). There are a lot of reasons
why "just run natively" becomes infeasible: media decay, the connector
conspiracy, old and dying hardware, APIs and environments becoming
unsupported, proprietary file formats and on and on. If you emulate
hardware (with QEMU) then you only have to deal with emulating a few
(tens of) hardware platforms, rather than hundreds of operating
systems or thousands of file formats, so it's the most practical
approach. They're working on web interfaces for non-technical users.
Most interesting for the QEMU dev community is that they're
effectively building up a large set of regression tests (ie images of
old OSes and applications) which they are going to be able to run
automatic testing on.
== Talk 8: MARSS-x86: QEMU-based Micro-Architectural and Systems
Simulator for x86 Multicore Processors ==
This is about using QEMU for microarchitectural level modelling
(branch predictor, load/store unit, etc); their target audience is
academic researchers. There's an existing x86 pipeline level simulator
(PLTsim) but it has problems: it uses Xen for its system simulation so
it's hard to get installed (need a custom kernel on the host!), and it
doesn't cope with multicore. So they've basically taken PLTsim's
pipeline model and ported it into the QEMU system emulation
environment. When enabled it replaces the TCG dynamic translation
implementation; since the core state is stored in the same structures
it is possible to "fast forward" a simulation running under TCG and
then switch to "full microarchitecture simulation" for the interesting
parts of a benchmark. They get 200-400KIPS.
== Talk 9: Showing and Debugging Haiku with QEMU ==
Haiku is an x86 OS inspired by BeOS. The speaker talked about how they
use QEMU for demos and also for kernel and bootloader debugging.
== Talk 10: PQEMU : A parallel system emulator based on QEMU ==
This was a group from a Taiwan university who were essentially
claiming to have solved the "multicore on multicore" problem, so you
can run a simulated MPx4 ARM core on a quad-core x86 box and have it
actually use all the cores. They had some benchmarking graphs which
indicated that you do indeed get ~3.x times speedup over emulated
single-core, ie your scaling gain isn't swamped in locking overhead.
However, the presentation concentrated on the locking required for
code generation (which is in my opinion the easy part) and I wasn't really
convinced that they'd actually solved all the hard problems in getting
the whole system to be multithreaded. ("It only crashes once every
hundred runs...") Also their work is based on QEMU 0.12, which is now
quite old. We should definitely have a look at the source which they
hope to make available in a few months.
== Talk 11: PRoot: A Step Forward for QEMU User-Mode ==
STMicroelectronics again, presenting an alternative to the usual
"chroot plus binfmt_misc" approach for running target binaries
seamlessly under qemu's linux-user mode. It's a wrapper around qemu
which uses ptrace to intercept the syscalls qemu makes to the host; in
particular it can add the target-directory prefix to all filesystem
access syscalls, and can turn an attempt to exec "/bin/ls" into an
exec of "qemu-linux-arm /bin/ls". The advantage over chroot is that
it's more flexible and doesn't need root access to set up. They didn't
give figures for how much overhead the syscall interception adds,
though.
== Talk 12: QEMU TCG Enhancements for Speeding up Emulation of SIMD ==
Simple idea -- make emulation of Neon instructions faster by adding
some new SIMD IR ops and then implementing them with SSE instructions
in the x86 backend. Some basic benchmarking shows that they can be ten
times faster this way. Issues:
* what is the best set of "generic" SIMD ops to add to the QEMU IR?
* is making Neon faster the best use of resource for speeding up
QEMU overall, or should we be looking at parallelism or other
problems first?
* are there nasty edge cases (flags, corner case input values etc)
which would be a pain to handle?
Interesting, though, and I think it takes the right general approach
(ie not horrifically Neon specific). My feeling is that for this to go
upstream it would need uses in two different QEMU front ends (to
demonstrate that the ops are generic) and implementations in at least
the x86 backend, plus fallback code so backends need not implement the
ops; that's a fair bit of work beyond what they've currently
implemented.
== Talk 13: A SysML-based Framework with QEMU-SystemC Code Generation ==
This was the last talk, and the speaker ran through it very fast as we
were running out of time. They have a code generator for taking a UML
description of a device and turning it into SystemC (for VHDL) and C++
(for a QEMU device) and then cosimulating them for verification.
-- PMM
Hello list,
Recently, Android team is working on integrating Linaro toolchain for
Android and NDK. According to the initial benchmark results[1],
Linaro GCC is competitive comparing to Google toolchain. In the
meanwhile, we are trying to enable gcc-4.5 specific features such as
Graphite and LTO (Link Time Optimization) in order to make the best
choice for Android build system and NDK usage. However, I encountered
a problem about LTO and would like to ask help from toolchain WG.
Assuming Linaro Toolchain for Android is installed in directory
/tmp/android-toolchain-eabi, you can obtain Google's toolchain
benchmark suite by git:
# git clone git://android.git.kernel.org/toolchain/benchmark.git
You have to apply the attached patch in order to make benchmark suite
work[2]. Then, change directory to skia:
# cd benchmark/skia
And build skia bench with LTO enabled:
# ../scripts/bench.py --action=build
--toolchain=/tmp/android-toolchain-eabi --add_cflags="-flto
-user-linker-plugin"
The build process would be interrupted by gcc:
make -j4 --warn-undefined-variables -f ../scripts/build/main.mk
TOOLCHAIN=/tmp/android-toolchain-eabi ADD_CFLAGS="-flto
-user-linker-plugin" build
CPP ARM obj/src/core/Sk64.o <= src/src/core/Sk64.cpp
CPP ARM obj/src/core/SkAlphaRuns.o <= src/src/core/SkAlphaRuns.cpp
CPP ARM obj/src/core/SkBitmap.o <= src/src/core/SkBitmap.cpp
CPP ARM obj/src/core/SkBitmapProcShader.o <= src/src/core/SkBitmapProcShader.cpp
CPP ARM obj/src/core/SkBitmapProcState.o <= src/src/core/SkBitmapProcState.cpp
CPP ARM obj/src/core/SkBitmapProcState_matrixProcs.o <=
src/src/core/SkBitmapProcState_matrixProcs.cpp
src/src/core/SkBitmapProcShader.cpp: In function
'SkShader::CreateBitmapShader(SkBitmap const&, SkShader::TileMode,
SkShader::TileMode, void*, unsigned int)':
src/src/core/SkBitmapProcShader.cpp:243:13: warning: 'color' may be
used uninitialized in this function
CPP ARM obj/src/core/SkBitmapSampler.o <= src/src/core/SkBitmapSampler.cpp
src/src/core/SkBitmapProcState_matrixProcs.cpp:530:1: sorry,
unimplemented: gimple bytecode streams do not support machine specific
builtin functions on this target
...
However, I can get other bench items passed such as cximage, gcstone,
gnugo, mpeg4, webkit, and python.
Can anyone give me some hints to resolve LTO problem? Thanks in advance.
Sincerely,
-jserv
[1] https://wiki.linaro.org/Platform/Android/Toolchain#Reference%20Benchmark
We use the same toolchain benchmark suite as Google compiler team took.
[2] https://wiki.linaro.org/Platform/Android/UpstreamToolchain
== Last week ==
* CoreMark ARMv6/v7 regressions: posted another combine patch upstream,
which was quickly approved and committed. The XOR simplification one is
now approved too, but needs a little more revising of comments before
committing.
* The above two patches now bring CoreMark under -march=armv7-a to very
close of the performance of -march=armv5te. However, a regression where
uxtb+cmp cannot be combined into 'ands ... #255' still causes v7 to lose
slightly. This should be the final issue to solve...
* Launchpad #736007/GCC Bugzilla PR48183: NEON ICE in
emit-rtl.c:immed_double_const() under -g. Posted patch upstream, but
looks like more discussion is needed before we know if this is the
"right" way to do it.
* Launchpad #736661, armel FTBFS (G++ ICE in expand_expr_real_1()).
Looking at this.
* Pinged a few upstream patch submissions.
== This week ==
* Launchpad #723185/CS issue #9845 now assigned to me, start looking at
this.
* Get the XOR patch committed upstream, and the above described uxtb+cmp
issue solved.
* Work on other GCC issues.
Hi there. I have a custom report on top of the Launchpad tickets that
shows how old they are and if they need attention:
http://ex.seabright.co.nz/helpers/tickets/gcc-linaro?group_by=lint
I check this once a day to see how we're doing. It's useful when
deciding which bug to attack next.
-- Michael
== libunwind ==
* Had few discussions with Uli with regard to unwinding.
* Continued to learn about libunwind internals.
* The .ARM.exidx and .ARM.extbl section parser is functional but the
integration into libunwind needs to be improved. Currently there are two
seperate models that hold the informations of the current frame. Since they
are not synchronized the behavior of libunwind is quite unexpected to the
user.
* I started on eliminating the redundancy by removing the model that was
introduced for the extbl support. My goal is to have the parser operate on the
DWARF model directly. In theory this should also allow to mix DWARF- and
extable-frames.
Regards
Ken
== GCC ==
* Started looking at performance regressions. Setting up builds with
EEMBC Denbench and other benchmarks.
* Looked at PR47719 in some detail this week.
* Set up environment on laptop . Fixed PR46788 in 4.6 branch and trunk.
* Discussions regarding armhf, how to maintain Linaro branches -
upstreaming patches etc.
* Looked at a case of performance improvements with VFP stores. I think
it's because we end up allowing PRE_INC and POST_DEC for floating point
mode values because of which there end up being more transfers to and
from the integer core registers.
* Off sick on Monday 14th March 2011.
== Misc ==
* Sorted out travel arrangements for LDS. Waiting for visa now.
== GDB ==
* Ongoing work on glibc patch to add ARM unwind tables to system
call stubs (bug #684218).
* Implemented initial version of a kernel patch that fixes GDB
inferior calls while stopped in a restartable system call
(bug #615974); started discussion with kernel folks.
* Implemented new version of patch to fix single-stepping over
signal handlers (bug #615978) that addresses review comments;
posted to mailing list.
* Verified Linaro GDB patch set can be applied to Ubuntu package.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
(sent early this week since I'll be in the conference all Friday)
RAG:
Red:
Amber:
Green:
Current Milestones:
| Planned | Estimate | Actual |
Historical Milestones: [trimmed the 2010 ones]
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03 | 2011-03-08 | 2011-03-08 | 2011-03-08 |
== maintain-beagle-models ==
* wrote most of a patch to properly implement prlimit64 syscall
in usermode -- needs checking/testing
* wrote most of a patch to allow boards to specify their min/max/default
RAM size rather than having a single qemu-wide default; this would
let us specify RAM size on the beagle model (currently hardwired)
* retested the patch I wrote late last year which fixes TCG's locking
problems by having interrupting signals/threads just set a flag which
we check in every TB (rather than trying to mess with the TCG graph
in a totally unsafe and crashprone way). Slowdown: about 3.5% on
"boot Linaro nano on vexpress to rootprompt and shutdown again".
I shall see if I can persuade people that this is a price worth
paying for not randomly crashing in thread-heavy code :-)
== merge-correctness-fixes ==
* Wrote/submitted patchset which fixes VRECPS edge case handling
(mostly NaN related)
* Wrote/submitted patchset which fixes Neon VLD of single
element to all lanes
* Wrote/submitted patch which fixes qemu to work on an ARM host
where the host C code has been built in Thumb mode
== other ==
* attended QEMU Users Forum in Grenoble
* meetings: toolchain, standup, pdsw-tools, arch q&a
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
Short week
* libffi patch accepted upstream
* eglibc integration of string routine changes
- I have something that works but it's more complex than I'd like
(to get it to fall
back to the C code on stuff I haven't optimised for).
* Trying a neon memchr; tried a really simple 8 byte a loop version - it's
quite slow on both A8 and A9; branching on the result of comparisons
done in the neon is not simple.
* Porting jam bug 735877 chromium using d32 float; it was passing
vpfpv3 rather than using the default when configured without neon.
On holiday tomorrow (Friday).
Dave
Hello,
Experiment with aes benchmark from DENbench.
Continue my experiments with SMS which includes re-implementing an old
patch to insert reg-moves in free slots rather than greedily before the
definition as is done in the current implementation.
Thanks,
Revital
Hi,
* submitted store sinking patch to mainline
* started testing auto-detection of vector size patch
* DENBench - some benchmarks are still unstable, I am looking into
stable regressions, adjusting and fixing the cost model for them
Next week:
Sunday and Monday - holidays
Ira
Dave did an investigation earlier in the year into Cortex-A9 and
RealView PBX support in QEMU. The write-up is available here:
https://wiki.linaro.org/WorkingGroups/ToolChain/Outputs/QEMURealViewPBX
Dave and Peter: could you please review it?
I've now closed out the blueprint. I'd like to do similar reports on
other outputs and will attack vexpress next.
-- Michael
Hi Michael, Andrew,
Mounir just pointed out that our non-Ubuntu LP projects (like gcc-linaro,
gdb-linaro etc.) are now also included in the LP work-item tracking
statistics (http://status.linaro.org/linaro-toolchain-wg.html). This
didn't happen in the past due to a Launchpad issue that has now been fixed.
This seems to be working out nicely, except for one issue: what about the
gcc-linaro-tracking project? I have a couple of bugs that are fixed in
Linaro GCC, and are also fixed in mainline GCC, but they still show up as
an "in-progress" work-item in the status tracker (there are a whole bunch
more of those assigned to Andrew as well). The reason for this is the LP
records have an associated gcc-linaro-toolchain project entry, and this is
set to "Fix Committed", but not "Fix Released" ... probably because GCC
4.6.0 is not yet released?
Now, on the one hand it does make sense to include the -tracking project in
the work-item statistics, because they *do* reflect important tasks:
namely, to make sure that the changes indeed land in the upstream
repository. However, having them all show up as "in progress" until the
community makes a new GCC release does not seem very helpful: this is not
in our control, and our work is in fact done once the patch is committed
upstream.
Therefore my suggestion: we should immediately mark -tracking bugs as "Fix
Released" (not "Fix Committed"), as soon as the corresponding patch is
committed upstream (and thus our work on the problem is completed).
Thoughts? Does this make sense? Will this mess up any of the other
purposes for which we currently use the -tracking project?
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi Dave. I had a little play with cortex-strings and did some
benchmarks on my Tegra 2. Images are attached.
I've added two scripts to cortex-strings:
scripts/bench-all.sh runs all the routines on all variants and records them
scripts/plot.py plots the results from above
ploy.py corrects for the benchmark overhead by doing a linear fit to
the null 'bounce' results and subtracting this fit.
You should be able to a autogen; configure; make; bash
scripts/bench-all.sh | tee log.txt; python scripts/plot.py log.txt.
I'm sure you have your own favourite tools though.
The string routines look good. Lumpy in funny ways though...
-- Michael
[Sorry, forgot to CC: the list]
Hi Ira,
Thanks for the feedback.
On 6 March 2011 09:20, Ira Rosen <IRAR(a)il.ibm.com> wrote:
> > So how about the following functions? (Forgive the pascally syntax.)
> >
> > __builtin_load_lanes (REF : array N*M of X)
> > returns array N of vector M of X
> > maps to vldN
> > in practice, the result would be used in assignments of the form:
> > vectorX = ARRAY_REF <result, X>
> >
> > __builtin_store_lanes (VECTORS : array N of vector M of X)
> > returns array N*M of X
> > maps to vstN
> > in practice, the argument would be populated by assignments ofthe
> form:
> > vectorX = ARRAY_REF <result, X>
> >
> > __builtin_load_lane (REF : array N of X,
> > VECTORS : array N of vector M of X,
> > LANE : integer)
> > returns array N of vector M of X
> > maps to vldN_lane
> >
> > __builtin_store_lane (VECTORS : array N of vector M of X,
> > LANE : integer)
> > returns array N of X
> > maps to vstN_lane
> >
>
> How do you distinguish between "multiple structures" and "single structure
> to all lanes"?
Sorry, I'm not sure I understand the question. Could you give a couple
of examples?
The idea is that the arrays above really are array types, regardless of the
actual type of the thing we're accessing (which might be a larger array
than the bounds above say, or which might be an array of structures
or a structure of arrays). That should be OK because arrays alias
their elements.
Richard
Hi Matthias,
in last week's meeting you raised the question what, if any, code from the
Linaro GDB repository could be useful for inclusion into the natty GDB
package. I've now reviewed the contents of the repository, and my
suggestion would be to use everything in Linaro GDB 7.2, except for this
commit (which changes the branding to "Linaro GDB"):
revno: 32969
committer: Ulrich Weigand <uweigand(a)de.ibm.com>
branch nick: 7.2
timestamp: Wed 2010-09-22 19:18:38 +0200
message:
2010-09-22 Ulrich Weigand <uweigand(a)de.ibm.com>
* src-release: Support gdb-linaro packages.
gdb/
* version.in: Set to Linaro GDB version number.
* configure.ac (PKGVERSION, BUGURL): Refer to Linaro.
* configure: Regenerate.
gdb/gdbserver/
* configure.ac (PKGVERSION, BUGURL): Refer to Linaro.
* configure: Regenerate.
gdb/doc/
* configure.ac (PKGVERSION, BUGURL): Refer to Linaro.
* configure: Regenerate.
(Instead, the branding ought to be set as appropriate for the Ubuntu
package. Maybe with an additional reference to Linaro, just as with GCC?)
I've created a snapshot of the Linaro GDB 7.2 branch using the command
bzr diff --prefix a/:b/ -r32965..
and then manually removed changes to
src-release
gdb/version.in
gdb/configure.ac
gdb/configure
gdb/gdbserver/configure.ac
gdb/gdbserver/configure
gdb/doc/configure.ac
gdb/doc/configure
I've left in the new file ChangeLog.linaro for documentation purposes, but
if you prefer this could of course be removed as well.
The resulting patch is appended here. (Note that I'd recommend to continue
updating the patch from Linaro GDB as further changes make it in.)
(See attached file: linaro-gdb.patch)
I've then added the patch to the natty GDB package. Since it touches a
completely distinct set of files compared to the existing list of patches
in the package, it can be added to the series file at any arbitrary point.
I've built the resulting compiler on i386, arm, and ppc64, and it strictly
improved the test results on all three platforms:
i386 without patch:
# of expected passes 16161
# of unexpected failures 114
# of expected failures 72
# of untested testcases 9
# of unresolved testcases 1
# of unsupported tests 69
i386 with patch:
# of expected passes 16331
# of unexpected failures 24
# of expected failures 72
# of untested testcases 9
# of unresolved testcases 1
# of unsupported tests 69
Fixed test case failures are from:
gdb.base/break-interp.exp
gdb.base/foll-fork.exp
gdb.base/printcmds.exp
(These are just test suite cleanups, no actual code changes.)
ppc without patch:
# of expected passes 15350
# of unexpected failures 74
# of expected failures 53
# of untested testcases 15
# of unresolved testcases 1
# of unsupported tests 63
ppc with patch:
# of expected passes 15350
# of unexpected failures 55
# of expected failures 53
# of untested testcases 15
# of unresolved testcases 1
# of unsupported tests 63
Fixed test case failures are from:
gdb.base/printcmds.exp
gdb.threads/local-watch-wrong-thread.exp
gdb.threads/watchthreads.exp
(These are just test suite cleanups, no actual code changes.)
arm without patch:
# of expected passes 15343
# of unexpected failures 270
# of unexpected successes 1
# of expected failures 65
# of untested testcases 11
# of unresolved testcases 2
# of unsupported tests 70
arm with patch:
# of expected passes 15686
# of unexpected failures 46
# of unexpected successes 3
# of expected failures 63
# of untested testcases 11
# of unresolved testcases 1
# of unsupported tests 69
Fixed test case failures are from:
gdb.base/break-interp.exp
gdb.base/corefile.exp
gdb.base/foll-fork.exp
gdb.base/gcore.exp
gdb.base/gdb1555.exp
gdb.base/pr11022.exp
gdb.base/printcmds.exp
gdb.base/recurse.exp
gdb.base/relativedebug.exp
gdb.base/step-test.exp
gdb.base/watch-cond.exp
gdb.base/watch-read.exp
gdb.base/watch_thread_num.exp
gdb.base/watch-vfork.exp
gdb.gdb/selftest.exp
gdb.mi/gdb792.exp
gdb.mi/mi2-syn-frame.exp
gdb.mi/mi2-var-display.exp
gdb.mi/mi2-watch.exp
gdb.mi/mi-syn-frame.exp
gdb.mi/mi-var-display.exp
gdb.mi/mi-watch.exp
gdb.pie/corefile.exp
gdb.server/ext-attach.exp
gdb.threads/attachstop-mt.exp
gdb.threads/attach-stopped.exp
gdb.threads/linux-dp.exp
gdb.threads/local-watch-wrong-thread.exp
gdb.threads/pthread_cond_wait.exp
(This represents much of the bug fix work that went into Linaro GDB.)
Let me know if there's any further information you need, or anything else I
can do to help get the Linaro changes into natty GDB.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Merged fixes for several bug into Linaro GCC 4.5. Both from Linaro
(Richard, Matthias and Ramana), and from CS (the shrink wrap problems).
Continued working on benchmarking the patches I've merged to 4.6. Spent
quite some time trying to figure out why EEMBC and the Spec2K weren't
working properly. I've got this sorted now.
Confirmed that the patch to discourage NEON use for integer operations
is still profitable on Cortex-A8. Posted the patch upstream.
Merged upstream GCC 4.6 into Linaro GCC 4.6.
Booked travel to Budapest for Linaro @ UDS.
Followed up on Ramana's questions about the RVCT interoperability patch.
Paul Brook helped explain what it was about, and pointed me at the
proper section in the proper ARM manual.
Continued forward porting patches to 4.6. Mostly I need to convince
myself that they still do something useful. I have posted one new patch
to upstream - the "Discourage A8 NEON" patch.
* Future Absence
Away Wednesday 16th to Friday 18th.
Away Monday 28th to Friday 1st April.
----
Upstream patched requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* ARM EABI half-precision functions
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
* RVCT Interoperability patch
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg00059.html
* Discourage NEON on A8
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg00576.html
== Last week ==
* Working on Coremark ARMv6 regressions. Identified a major cause being
RTL ifcvt failing on one of the crc routines, due to combine pass
failing to optimize a particular sequence, causing the if-conversion
estimates to give up on conditional executing (too many insns). The
combine pass failed on ARMv6 and above, due to the existence of true
zero_extend insns. On ARMv5, the use of two shifts actually allowed
combine to phase reduce the shifts one by one, thus producing better
code. On ARMv6, combine produced a (xor (and ...) <mask>) which did not
match any insn. Analyzed and sent a patch upstream which should work on
such XOR cases. Patch is due for upstream commit for 4.7-stage1.
(http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00609.html)
* Another situation of un-optimized uxth insns still exists; trying
to solve this by another combine patch I am currently testing, will send
upstream later.
== This week ==
* verify the improvements the above patches should have on Coremark for
ARMv6/v7.
* Work on sending them to Linaro and SG++ branches.
* Other bug issues.
== GDB ==
* Ongoing work on glibc patch to add ARM unwind tables to system
call stubs; ran into design problems that look difficult to fix.
* As an alternative, started work on a GDB patch to recognize glibc
system call assembler stubs via code-scanning; this should allow
alloc unwinding in the absence of debug info for current libc code.
* Analyzed bug #728216 (GDB fails to get a valid backtrace while
debugging a Webkit SIGSEGV) and resolved as invalid; the fault
occurs within JIT-generated code where unwinding is impossible.
== Misc ==
* Made travel arrangements for Linaro Summit in Budapest
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
RAG:
Red:
Amber:
Green: another qemu-linaro release out the door on time
Current Milestones:
| Planned | Estimate | Actual |
qemu-linaro 2011-03 | 2011-03-08 | 2011-03-08 | 2011-03-08 |
Historical Milestones:
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
== maintain-beagle-models ==
* released qemu-linaro 2011-03
* had to do a 2011-03-1 reroll of the tarball on day of release to
fix a "versatilepb model crashes on startup" bug found at last minute
* Paul Larson is working on having automated test image boots on
qemu built from git, so we can catch this much earlier in the cycle
== merge-correctness-fixes ==
* added support to risu for testing of load and store instructions
* used this to test a patch which cleans up Thumb load/store decode
and makes us UNDEF in the right places
* wrote/submitted patch to fix GE bits for signed modulo arithmetic
* wrote/submitted patch to get SMUAD/SMLAD Q bit right in an edge case
* started on a patchset which will fix various minor qemu Neon bugs
detected by test programs from the valgrind source tree
== other ==
* meetings: toolchain, standup, pdsw-tools
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
Hi,
== libunwind ==
* the patches posted last week are now upstream
* continued to study the Exception Handling ABI for the ARM Architecture
* looked into the structure of libunwind (lib interdependencies)
* documented at: https://wiki.linaro.org/KenWerner/Sandbox/libunwind
* The work on the local unwinding appears to be quite complete. If the
generic unwind model is used the code assumes the GCC personality routine. We
should either check name of the symbol (maybe be difficult) or just call the
pers function. I'm in contact with Zach on this.
Regards
Ken
LP: #731665 is a silent bad code generation bug at least on functions
which are empty except for inline assembly:
https://bugs.launchpad.net/ubuntu/+source/gcc-4.5/+bug/731665
It was introduced in the shrink-wrap patch and is due to using an
uninitialised variable. Andrew, can you please address this urgently
either in Linaro or CSL.
-- Michael
== hard-float ==
* Updated libffi variadic patch and Sent updated libffi variadic
patch to the ffi mailing list.
== String routines ==
* Got a big endian build environment going
* Patched up memchr and strlen for big endian; turned out to be a
very small change in the end; and
tested it on qemu-armeb - note that an older version it didn't work
on, but a newer one it did; I'll assume
the newer one is correct.
* Fixed a couple of build issues in the cortex strings test harness
== Other ==
* Kicked off a SPEC2006 train run on canis using the 2011.03 compilers
I'm on holiday tomorrow (Friday) and Monday.
Dave
Hello,
* Sent the patch to support targets
that their doloop part is not decoupled from the rest of the loop's
instructions (as is the case for ARM) to @gcc-patches:
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00350.html
* Continue looking into DENbench benchmarks.
Thanks,
Revital
Hi,
* continued working on cost model tuning. I don't see much difference
running EEMBC DenBench with and without vectorization enabled (and,
therefore, also with and without cost model).
Also, I have to say, that the results are not stable and I sometimes
get 10% difference just running the same executable two times in a
row.
* the only benchmark I see consistent degradation 5% with
vectorization is DenBench aes, both with GCC trunk and gcc-linaro 4.5.
I found one of the responsible loops, if it is not vectorized I see
only 1.8% degradation. The problem there is that the loop bound is
unknown at compile time, so the vectorizer attempts to vectorize the
loop using runtime guards to verify that there are enough iterations
to vectorize. The actual number of iterations is 4, so the scalar
version of the loop is chosen at the run time, but I guess the guards
cause the degradation. I'll continue looking into this next week.
* prepared the conditional-store-sink patch (one of the patches that
helps to vectorize Telecom Viterbi) for submission to gcc-patches.
Ira
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro QEMU 2011.03-1.
Linaro QEMU 2011.03-1 is the second release of qemu-linaro. Based
off upstream (trunk) qemu, it includes a number of ARM-focused
bug fixes and enhancements.
This release includes a model of the ARM Versatile Express
platform. This is still experimental but may be of use to people
who want a model supporting up to 1GB of RAM with graphics and
networking. Instructions for getting started with it are on the
wiki: https://wiki.linaro.org/PeterMaydell/QemuVersatileExpress
Other interesting changes include:
- The OMAP emulation bug which was causing hangs if Linux tried
to enable a swapfile is fixed
- The OMAP UART model has been improved; this fixes the problem where
kernels using the new omap-hsuart serial drivers stopped serial output
halfway through boot.
- As usual, various minor correctness fixes and other upstream changes
Known issues:
- The beagle and beaglexm models do not support USB, so there is no
keyboard, mouse or networking (#708703)
The only change over the shortlived 2011.03-0 is that the last
minute bug #731093 has been fixed (versatilepb models would crash
on startup.)
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2011.03-1
Binary builds of this qemu-linaro release are being prepared and
will be available shortly for users of Ubuntu.
When ready, Natty packages of qemu-linaro 2011.03-1 will be in the
Ubuntu archive. Packages for users of Ubuntu 10.04 LTS and Ubuntu
10.10 will be in the linaro-maintainers tools ppa:
https://launchpad.net/~linaro-maintainers/+archive/tools/
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro
Hi all,
I've had comments that getting hold of binaries for the linaro
toolchain can be trick for people unfamiliar with the linaro tools.
One reason is that we don't release binaries as such -- but a visitor
browsing in through http://www.linaro.org/downloads/ won't discover
this, and may waste a lot of time trying to understand launchpad etc.
before coming to the conclusion that binaries either aren't available
or are not easily findable.
On the other hand, the cross toolchain packages are likely to be of
interest to such visitors, but aren't obviously advertised -- maybe
I'm looking in the wrong place, but if so then new visitors to the
linaro pages are likely to look in the wrong place too.
Would it make sense to explain the situation more prominently so that
visitors know what to expect?
Something along the lines of "if you use distro x revision y, these
cross-compiler packages are available" and "if you need the tools for
some other environment, you need to download the source and build it
for yourself".
Cheers
---Dave
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.2.
Linaro GDB 7.2 2011.03-0 is the fourth release in the 7.2 series.
Based off the latest GDB 7.2, it includes ARM-focused bug fixes and
enhancements.
Interesting changes include:
* Hardware watchpoint support
* Backtracing while in the Linux kernel trampoline frame
Hardware watchpoints use the support built into ARM devices to watch
for changes in values in memory with little performance impact. A
2.6.37 or later kernel is required.
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.2-2011.03-0
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
-- Michael
Committed Kazu's VFP testcases patch upstream.
Merged the latest from upstream GCC 4.6.
Merged all the outstanding launchpad merge requests against both GCC 4.5
and 4.6.
Spun the 4.5-2011.03-0 and 4.6-2011.03-0 releases. Passed the tarballs
to Michael H for final testing.
Brought the patch tracker up to date w.r.t. to new merges.
Posted one of Dan's patches upstream for review.
Decided to drop Julian's A8 alignment patch completely. I had previously
discovered it provided no measurable benefit on A8, and now I've found
the same for A9 (Pandaboard). There's no real improvement for any
combination of -falign-* options in EEMBC.
Bernd's "Discourage NEON on A8" patch also doesn't show any value in the
benchmark results, but I think I've forward ported it wrong, because it
should at least change the binary size, and it doesn't. I need to look
into this further.
I also decided I don't know enough about ARMv7, so I spent some time
reading a few chapters from the ARM A.R.M.
----
Upstream patched requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* ARM EABI half-precision functions
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
* RVCT Interoperability patch
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg00059.html
Last week:
* Launchpad #711819 / PR47719: ARM minipool ICE. Followed up on
discussion with Bernd and Ramana. Later posted discussion results on
gcc-patches, where Richard Earnshaw took it over with a final fix.
* Coremark ARMv7/v6 regressions: mostly pinpointed the exact cases where
RTL simplification fails to optimize away ZERO_EXTEND expressions. Still
working on how to enhance it.
* TW Public Holiday on Feb.28 (Mon), was off for one day.
This week:
* Try to turn Coremark regression investigation into code form.
* Other GCC issues.
I've been spending this week playing around with various representations
of the v{ld,st}{1,2,3,4}{,_lane} operations. I agree with Ira that the
best representation would be to use built-in functions.
One concern in the original discussion was that the optimisers might
move the original MEM_REFs away from the call. I don't think that's
a problem though. For loads, we can simply treat the whole of the
accessed memory as an array, and pass the array by value. If we do that,
then the call would just look like:
__builtin_load_lanes (MEM_REF[(elem[N] *)ADDR])
(where, despite the C notation, the MEM_REF accesses the whole of elem[N]).
It is of course possible in principle for the tree optimisers to replace
this MEM_REF with another, equivalent, one, but that's OK semantically.
It isn't possible for the optimisers to replace it with something like
an SSA name, because arrays can't be stored in gimple registers.
__builtin_load_lanes would then be used like this:
combined_vectors = __builtin_load_lanes (...);
vector1 = ...extract first vector from combined_vectors...
vector2 = ...extract second vector from combined_vectors...
....
So combined_vectors only exists for load and extract operations.
The question then is: what type should it have? (At this point I'm
just talking about types, not modes.) The main possibilities seemed to be:
1. an integer type
Pros
* Gimple registers can store integers.
Cons
* As Julian points out, GCC doesn't really support integer types
that are wider than 2 HOST_WIDE_INTs. It would be good to
remove that restriction, but it might be a lot of work, and it
isn't something we'd want to take on as part of this project.
* We're not really using the type as an integer.
* The combination of the integer type and the __builtin_load_lanes
array argument wouldn't be enough to determine the correct
load operation. __builtin_load_lanes would need something
like a vector count (N => vldN) argument as well.
2. a combined vector type
Pros
* Gimple registers can store vectors.
Cons
* For vld3, this would mean creating vector types with non-power-
of-two vectors. GCC doesn't support those yet, and you get
ICEs as soon as you try to use them. (Remember that this is
all about types, not modes.)
It _might_ be interesting to implement this support, but as
above, it would be a lot of work. It also raises some semantic
questions, such as: what is the alignment of the new vectors?
Which leads to...
* The alignment of the type would be strange. E.g. suppose
we're loading N*2 uint32_ts into N vectors of 2 elements each.
The types and alignments would be:
N=2 uint32x4_t, alignment 16
N=3 uint32x6_t, alignment 8 (if we follow the convention for modes)
N=4 uint32x8_t, alignment 32
We don't need alignments greater than 8 in our intended use;
16 and 32 are overkill.
* We're not really using the type as a single vector,
but as a collection of vectors.
* The combination of the vector type and the __builtin_load_lanes
array argument wouldn't be enough to determine the correct
load operation. __builtin_load_lanes would need something
like a vector count (N => vldN) argument as well.
3. an array of vectors type
Pros
* No support for new GCC features (large integers or non-power-of-two
vectors) is needed.
* The alignment of the type would be taken from the alignment of the
individual vectors, which is correct.
* It accurately reflects how the loaded value is going to be used.
* The type uniquely identifies the correct load operation,
without need for additional arguments. (This is minor.)
Cons
* Gimple registers can't store array values.
So I think the only disadvantage of using an array of vectors is that the
result can never be a gimple register. But that isn't much of a disadvantage
really; the things we care about are the individual vectors, which can
of course be treated as gimple registers. I think our tracking of memory
values is good enough for combined_vectors to be treated as such
(even though, with the back-end changes we talked about earlier,
they will actually be stored in RTL registers).
So how about the following functions? (Forgive the pascally syntax.)
__builtin_load_lanes (REF : array N*M of X)
returns array N of vector M of X
maps to vldN
in practice, the result would be used in assignments of the form:
vectorX = ARRAY_REF <result, X>
__builtin_store_lanes (VECTORS : array N of vector M of X)
returns array N*M of X
maps to vstN
in practice, the argument would be populated by assignments of the form:
vectorX = ARRAY_REF <result, X>
__builtin_load_lane (REF : array N of X,
VECTORS : array N of vector M of X,
LANE : integer)
returns array N of vector M of X
maps to vldN_lane
__builtin_store_lane (VECTORS : array N of vector M of X,
LANE : integer)
returns array N of X
maps to vstN_lane
Note that each operation can be expanded independently. The expansion
doesn't rely on preceding or following statements.
I've hacked up the prototype below as a proof of concept. It includes
changes to the C parser to allow these functions to be created in the
original source code. This is throw-away code though; it would never
be submitted.
I've also included a simple test case and the output I get from it.
The output looks pretty good; there's not even the stray VMOV that
I saw with the intrinsics earlier in the week.
(Note that if you'd like to try this yourself, you'll need the patch
I posted on Monday as well.)
What do you think? Obviously this discussion needs to move to gcc@ at
some point, but I wanted to make sure this was vaguely sane first.
Richard
Hello,
I am looking for a way to disable '-gtoggle' flag in the run of stage 2 in
bootstrap; when
configuring ARM with (*).
The flag seems to be applied in stage 2 but not in stage 3 which seems to
cause bootstrap failure when
testing SMS as in stage 2 SMS fails because of debug_insn caused
by -gtoggle disturbing do-loop; while in stage 3 SMS succeeds; resulting
in different .o files and bootsrtrap failure.
(*) This the configure I used:
../gcc/configure --prefix=/home/eres/mainline/build --enable-checking
--enable-languages=c --enable-bootstrap
Thanks,
Revital
== GDB ==
* Committed fix for the GDB part of #620611 (Unable to
backtrace out of vector page 0xffff0000) to mainline and
Linaro GDB 7.2.
* Ran into GDB crashes due to memory corruption in tests
involving multiple inferiors. Tracked down root cause
(using valgrind) to long-standing double free bug in GDB
terminal state handling code. Committed fix to mainline
and Linaro GDB 7.2.
* While using valgrind (see above), ran into problems:
* ptrace system call is unsupported on ARM
* certain variants of the "SUB from SP" Thumb-2 instruction
are not handled by the VEX compiler
Fixed both problems locally, and was then able to successfully
valgrind GDB on ARM.
* Created Linaro GDB 7.2-2011.03-0 release.
* Worked on glibc patch to add ARM unwind tables to system
call stubs; this will help unwinding in the absence of
debug info for libc, and in particular fix #684218 (Failures
in gdb.base/call-signal-resume.exp)
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi,
== PandaBoard ==
* upgraded my ARM dev environment from Ubuntu to Linaro snapshot (20110303)
* found another kernel bug on the panda (#728565)
== libunwind ==
* resolved build issues on ARM (when using the linaro snapshot)
* allows the testsuite to work with linkers that do not pull in indirect
shared libs
* fix build of the test-static-link test case on ARM
* link libunwind-setjmp.so against libunwind-elf
* posted some first patches on the libunwind ml
* learned about the Exception Handling ABI for the ARM Architecture
Regards
Ken
Starting back in Linaro land after a gap of 3-4 weeks where I've been
away on ARM internal tasks.
== GCC ==
- Setting up a new machine that I received for Linaro work.
- Spent some time reviewing upstream patches. Spent some time on the
P1 PR47719 upstream to get this fixed .
- Starting to read up on the benchmarking report and recreating the
environments.
- Looked through some of the speed tickets to have a look through and
spend some time on it.
- Put in a hardware request for a Panda board.
== Next Week ==
- Set up environment properly for some amount of benchmarking.
- Look at some of the performance regressions and work on some things
that need to be done.
- Continue looking at PR47719.
* Investigated and fixed sqlite3 testsuite failure on ARM (bug 725052)
* Discussing libffi API changes with maintainer; hopefully he's
going to send out his comments today.
* Looking at how to upstream the string routine changes
* Need to look at big endian testing
* Testing QEmu pre-release for Peter; looking very nice.
Dave
RAG:
Red:
Amber:
Green:
Current Milestones:
| Planned | Estimate | Actual |
qemu-linaro 2011-03 | 2011-03-08 | 2011-03-08 | |
Historical Milestones:
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
== maintain-beagle-models ==
* preparation and test for next week's qemu-linaro 2011-03 release
* put in a temporary fix for bug 723630 (apt/glibc now try prlimit64
syscall, so silence qemu warnings about not implementing it)
* investigated qemu warnings about bad 16 bit writes: this is a
kernel bug: https://bugs.launchpad.net/linux-linaro/+bug/727781
== vexpress model ==
* sent vexpress patches upstream, put into qemu-linaro
== merge-correctness-fixes ==
* more work on performance counter registers: proper cycle counter
implementation; now just needs a bit of tidying before upstreaming
* ran valgrind's test cases on qemu; added revealed issues to
https://blueprints.launchpad.net/qemu-linaro/+spec/merge-correctness-fixes
* sent patch fixing broken VMOV s0,s1,r0,r1 implementation
* sent patch fixing inverted carry bit on ORNS
== other ==
* meetings: toolchain, standup, architecture q&a, pdsw-tools, team brief
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
== This week ==
* Submitted the fix for the Qt miscompilation upstream. Applied after
approval.
* Submitted a patch for the Thumb LDR problem that Dave Martin hit.
This was rejected.
* Ended up spending a few days on the "unreasonable amount of memory
while compiling qemu" bug due to unfamiilarity with the DWARF 2 code.
I realise the original idea was that I'd just file this upstream,
but it was one of those cases where I kept finding out more info
for the bug report until the problem became obvious.
I've now submitted two patches for this upstream. The first was trivial
and is now in. I was asked to add a bit of extra code to the second,
which I hope to do next week.
* Looked at the MIPS bug that was reported against the Linaro toolchain.
This turned out to be a problem in our extension elimination pass.
Submitted a merge request for that.
* Got confirmation from ARM that we should use relocation number 160
for R_ARM_IRELATIVE, and that it was OK to make the changes public
(thanks!). I've now submitted the binutils patches upstream.
I'll do the eglibc ones when I get back.
== Next week ==
Holiday!
Richard
Hello,
Testing the patch for SMS to support targets
that their doloop part is not decoupled from the rest of the loop's
instructions, as SMS currently requires.
The testing includes bootstrapping on ARM machines for c language
configured w and w\o --with-arch=armv7-a options and using"-O2
-fmodulo-sched -fmodulo-sched-allow-regmoves -fno-auto-inc-dec
-funsafe-math-optimizations -mthumb" flags.
Thanks,
Revital
On Monday, I was asked to find out whether the fix for GCC Bugzilla
PR43137 was present in our source base.
I can confirm that it is *not* present.
Apologies for the delay.
Andrew
Hi All,
Up until now, I have had no choice but to test toolchain correctness on
A8 hardware. It made sense to use the same -mfpu settings as the
Linaro/Ubuntu package builds use. This did not match the policy that the
interesting platform was A9-NEON, but I didn't have that option.
That's changed now - our Panda boards have arrived! Yay! :)
But, it seems to me that if I change to using the Pandas for correctness
testing (not performance testing) then I won't be testing what Ubuntu
will actually use.
So what should I test on?
I'd rather not double my test load by testing on both, but that is an
option ....
Any suggestions?
Andrew
Hi,
On Mon, Feb 28, 2011 at 9:19 PM, Nicolas Pitre <nicolas.pitre(a)linaro.org> wrote:
> On Thu, 24 Feb 2011, John Rigby wrote:
>
>> The resulting kernel builds and boots but some modules have problems:
>>
>> $ modprobe fat
>> fat: unknown relocation: 102
>> FATAL: Error inserting vfat
>
> A workaround for what appears to be a binutils bug has been merged in
> linaro-2.6.38. So the Thumb2 kernel testing may resume on trusted
> targets.
Thanks for merging it.
It's a bit ugly to include turn off compiler optimisations to work
around this though, so we might encounter upstream to that patch.
In any case, we still need someone to take a look at the possible
tools issue -- CC'ing linaro-toolchain in case people aren't aware:
https://bugs.launchpad.net/binutils-linaro/+bug/725126
Cheers
---Dave
On 25 February 2011 22:28, Alexander Sack <asac(a)linaro.org> wrote:
> On Wed, Feb 23, 2011 at 8:28 AM, Jim Huang <jserv(a)0xlab.org> wrote:
>> I would like to make a proposal about utilizing Linaro toolchain for
>> Android and NDK (Native Development Kit)[1].
Added linaro-toolchai list in Cc.
>> ** Motivation
>>
>> There are some different perspectives between Linaro toolchain and
>> Google Android toolchain including technical and
>> non-technical considerations. It doesn't really work if we only
>> replace prebuilt toolchain with Linaro toolchain because
>> of the compatibility of Android system utilities such as ELF
>> prelinker. Also, since Android is developed in relatively closed
>
> I don't have enough background to understand this "ELF prelinker"
> stuff. Are you saying that because of the way how android links stuff
> we cannot have one code base for gcc that works for both, android and
> "normal libc linux"?
>
Take Bug #707487 for example:
https://bugs.launchpad.net/binutils-linaro/+bug/707487
It is evident that Android's system utilities like soslim ("strip"
implementation)
and apriori ("prelink" implementation) expect the specific output of
GNU Toolchain,
but it sometimes varies since we would take Linaro's toolchain.
>> environment (Google style open source model), a great amount of
>> software components are not always verified by different
>> toolchain or build configurations. This proposal attempts to
>
> ack. thats what we want to do. Of course, we cannot really verify what
> is going on behind closed walls, but we can continuously build android
> with our toolchain and fix issues due that in android public master
> and if even that doesn't work we can ensure that our android trees
> always work nicely with both, our gcc and android gcc.
>
Android team is known to work on this field already.
> Another thing is to make our toolchain easily consumable (like the NDK
> you mention at the bottom); this will increase chances that someone
> from google can eventually take a look at what we are doing etc. and
> also helps the community to use linaro toolchain to built their
> android distributions.
Agree.
>> establish the compact development flow to enable Linaro
>> optimized ARM toolchain to build Android from scratch and verify it
>> transparently. Eventually, Android can be the reference
>> indicator as Linaro toolchain performance and reliability.
>>
>>
>> ** Brief introduction to Google Android toolchain
>>
>> Inside Google, there is a dedicated compiler team working on GNU
>> Toolchain for various purposes including server-side
>> computing, Android, Chrome OS, etc. Google engineers submit patches to
>> upstream for public review and maintain the
>> toolchain for Android. Along with each Android Open Source Prokect
>> (AOSP) release, there is a special branch in korg
>> GIT [2] for hosting the GPL'd toolchain source code modified by
>> Google. Usually, file "README.google" mentioned the
>> summary, but it is not developer friendly because several changes were
>> done within one GIT commit.
>>
>> Please refer to wiki for details:
>> https://wiki.linaro.org/Platform/Android/UpstreamToolchain
>>
>
> thats a good wiki page. thanks for the content. If I read the skia
> example correctly, we could add a test to our "normal" abrek testsuite
> that uses our daily android toolchain and run the skia benchmark? e.g.
> we could start doing this benchmarking even without having a
> validation solution ready for android targets?
>
If adb is supposed to work well on target, then you can easily use "bench.py"
script mentioned in the above wiki to do several benchmarking.
> Please let's talk to Paul how we can get the android toolchain to
> /opt/android as part of abrek and lets try to add this to our abrek
> testsuites. Until we have daily toolchain builds it would be OK to
> download the android toolchain tarball from a fixed place from
> people.linaro.org I guess.
>
Ok!
>> ** What's wrong with Android upstream Toolchain?
>>
>> In my opinion, list as following:
>>
>> (1) Few information about Google improvement: Sometimes, we have to
>> guess something from implicit GIT commitlog
>> such as "commit gcc-4.4.3 which is used to build gcc-4.4.3 Android
>> toolchain in master"[3]. It is hard to track and get
>> verified carefully.
>
> yes, that feels like a messy situation. Do we know why they don't
> commit the changes as individual commits but then in next step
> document what they changed?
I have no exact idea since I am just an observer regarding Android's GIT tree.
Google engineers do send patches to FSF/GNU, but it is not always related to
the GIT activities we have seen in korg.
You can search the keyword, "submit", in file gcc/gcc-4.4.3/README.google , and
you will see some descriptions as following:
gcc/cp/cp-lang.c
gcc/gimple.c
gcc/langhooks-def.h
gcc/langhooks.h
gcc/langhooks.c
gcc/tree-flow.h
gcc/tree-ssa-dce.c
gcc/testsuite/g++.dg/tree-ssa/vptr_init_dse.C
gcc/testsuite/g++.dg/tree-ssa/vptr_init_dse2.C
Enhancing dead code elimination to eliminate
useless vptr field initialization.
Owner: davidxl
Status: not submitted
gcc/fold-const.c
gcc/Makefile.in
Fix 2045297
Owner: davidxl
Status: Not submitted
The information is too few to track for us since the above "Fix
2045297" tends to
indicate Google bug database number instead of FSF's.
>> (2) Google specific improvements are absent in recent release, only
>> enabled months later. For example, Google Compiler
>> Team Lead, Dr. Shih-wei Liao, presented the improvements against GNU
>> Toolchain in the middle of 2009.[4]. The report
>> came with several impressive improvements like FDO (Feedback Directed
>> Optimizations) and IPO (Inter-Procedure
>> Optimizations). However, only some of them are public to AOSP and be
>> integrated late in the middle of 2010 (Android
>> Froyo; 2.2). Even FDO was merged in Android Froyo already, but there
>> is few documentation and no robust method to verify
>> by community members such as Linaro engineers.
>
> you say that they don't publish the code for lets say the
> "gingerbread" toolchain in a timely fashion when they release
> gingerbread? Or do they ship a separate "fast" NDK/prebuilt for
> partners through secret channels?
I have no idea.
>> (3) For some reasons, Google tends to deliver stable (old) toolchain
>> plus mainline backport. It is a safe and workable approach,
>> but sometimes developers would expect to use the latest technologies
>> as Linaro aims to bring to the world.
>>
>> (4) Few readable documentation. For example, Google already open its
>> toolchain benchmark suite in early 2010, but there was
>> no document specific to such important components. Furthermore, there
>> was one file gone in public kog GIT, required by
>> automated benchmark process. One year later, Google engineer finally
>> put back the one to public. This implies the unusual way
>> Google developed and delivered software.
>>
>
> Assuming good faith I would think this might just have been an oversight.
>
> Do you know if anyone from community pointed this out to google using
> official android mailing lists/groups or a bug?
>
Google engineers sometimes pick up the issues from Google Code:
http://code.google.com/p/android/issues/list
And, they do discuss on mailing-list:
http://developer.android.com/community/
>> ** Linaro's Approach to enable latest technologies
>>
>> Linaro android team tries to do:
>> (1) Document Android toolchain and related utilities in korg GIT as
>> possible as we can.
>
> That's good stuff and I think your wiki page is already a great
> contribution in that direction. What we should do though is run this
> through google eyes early by using official android mailing lists.
>
Got it.
>> (2) Early adaptation of Linaro toolchain to Android build system and
>> verify these output systematically.
>
> ack. Do you know if those changes would be conflicting with what we do
> on "normal" linux side? e.g. do we need to maintain special android
> patches or can we merge those into our main trees?
>
In fact, GCC 4.6 already merges Android specific patches with the help of
CodeSourcery. We would initially backport these patches to linaro-gcc-4.5
branch for review. Luse Cheng already did it.
However, other parts are not related to Android directly, and they might be
too aggressive to generic GCC optimization, that can be the reason why Google
didn't submit first.
>> (3) Backport Google changes to Linaro GCC and review in public.
>
> This is really tricky as you said. Here again, we should propose this
> on android mailing lists to maybe get feedback from google team and
> maybe improve the way we work on that. Untangling a big patch based
> just on changelog feels really unefficient.
Ok, I got your point. However, what we need is to create workable combination
of Linaro kernel + Linaro toolchain for Android integration engineers.
Alexander, I need your help to catch the attention of someone at Google.
> Also, we have to remember that if we pick changes out of _their_ tree,
> we cannot upstream those to fsf because we don't own copyright to
> those. Of course, for stuff they already pushed to 4.6 its not a
> problem to backport them from fsf trunk.
Thanks for notice.
>> (4) Improve the deployment and validation flow by means of Linaro
>> infrastructure.
>
> my understanding is this:
>
> 1. we add support to build android toolchain from linaro branches to
> our cloud build service
> 2. we do this so that we either produce a full toolchain tarball that
> can be installed under /opt/android or a NDK tarball (or both)
NDK doesn't need admin permission to install.
> 3. we improve our android platform build infrastructure to allow
> using latest daily toolchain tarball and then we build android with a)
> google toolchain and b) linaro toolchain; in this way we get daily
> android builds for both toolchains that can go into the linaro
> validation farm and get the typical validation/testing and
> benchmarking done.
Yes, it would be great.
>> (5) Build and test Android system with Linaro tools. Then, figure out
>> the regressions caused by Linaro Toolchain and/or
>> aggressive optimizations
>
> right. I think that's covered with the point above, no? The android
> builds done with our toolchain would also be available in public, so
> you can do whatever you want on top of what we already
> test/validate/measure automatically in the validation farm with them.
Agree.
>> (6) measure performance gain by Linaro tools
>
> right. for this we need to define a set of open-source benchmarks to
> run and ensure that those are supported in our validation framework.
>
>> The detailed specification in wiki:
>> https://wiki.linaro.org/Platform/Android/Specs/LinaroAndroidToolchain
>>
>> ** Implementation of Linaro toolchain for Android
>>
>> We started from Android style toolchain build and move to Linaro GCC +
>> ARM specific optimizations in mind. The initial work
>> can be obtained by wiki:
>> https://wiki.linaro.org/Platform/Android/Toolchain
>>
>> We plan to maintain the following GIT repositories at least:
>> * android/toolchain/build.git : Linaro-aware build system. Derived
>> from Android toolchain build system, it can handle Linaro-GCC
>> and Linaro snapshot/bzr.
>> * android/toolchain/gcc-patches.git : Patchset to be applied on top
>> of Linaro-GCC release/snapshots
>
> I think thats fine. however, how do we ensure that we have patches
> that always apply to both release/snapshots? do we maintain branches
> for gcc-patches.git in case you need two versions of patch X if the
> linaro gcc codebase diverged?
I might need help from toolchain WG.
>> The reference builder script output:
>> $ ./linaro-build.sh --help
>> --prefix-dir= Specify where to install (default:
>> /tmp/android-toolchain-eabi)
>> --gcc-src-dir= Specify where linaro gcc source is (in <toolchain>/gcc)
>> --apply-gcc-patch=(yes|no) Apply-patch which in
>> <toolchain>/gcc-patches directory (default: no)
>>
>> Current verified combinations:
>> * gcc-linaro: 4.5-2011.02-0
>> * binutils: 2.20.1
>> * gmp: 4.2.4
>> * mpfr: 2.4.1
>>
>> Only gcc is replaced by gcc-linaro: 4.5-2011.02-0 and others are
>> checked out from korg GIT.
>
> do we need to do something like --gcc-src-dir and -patches for
> binutils, gmp and mpfr as well? or would we be only interested in
> improving/fixing gcc for now?
>
I think focusing on linaro-gcc is pretty good. We can follow the
original combination
of Google.
> Waybe we also want to support protocol schemes like git: http: and
> bzr+ssh:/lp: for the --gcc-src= argument. this would then
> automatically download/branch the source tree from the given location.
> What do you think?
Agree.
>> ** Summary of gcc-patches
>>
>> "gcc-patches" are used as "backport" from Google changes into Linaro
>> gcc base. Here is the summary at present:
>>
>> 0001-Add-linux-android.patch
>> Add linux-android
>>
>> 0002-Add-support-for-Bionic-C-library.patch
>> Add support for Bionic C library
>>
>> 0003-Support-compilation-for-Android-platform.patch
>> Support compilation for Android platform
>>
>> 0004-Add-multilib-configuration-for-arm-linux-androideabi.patch
>> Add multilib configuration for arm-linux-androideabi
>>
>> 0005-Fix-gthr-posix.h-to-support-Bionic.patch
>> Fix gthr-posix.h to support Bionic
>>
>> 0006-Add-untested-support-for-Bionic-to-libstdc.patch
>> Add [untested] support for Bionic to libstdc++
>>
>> These patches are taken from Maxim Kuvyrkov of CodeSourcery in gcc-4.6
>> branch. Of course, we can always add changes by
>> Google or other Android specific adaptation by this model.
>
> Can we get a toolchain example tarball done and uploaded to
> people.linaro.org? I would like to verify that those work out of the
> box with gingerbread and if so, i would like to see those land in the
> main toolchain WG branch rather than adding them to our gcc-patches
> tree.
Yes, I would like to do that later.
>> ** Planned improvements over Linaro toolchain for Android
>>
>> (1) GCC multilib setting
>> Default: arm, fpu and thumb. The prebuilt google toolchain use:
>> armv5te and mandroid. We should focus on ARMv7.
>> (2) HardFP-ABI Support for Android.
>> (3) Patch management: Better to get the Android patches into
>> Linaro-GCC tree eventually.
>> (4) Build system improvement. Don't have to build gmp, mpfr everytime,
>> and provide option to build without gdb.
>> (5) Enable LTO (Link Time Optimization, introduced since gcc-4.5) in
>> Android TARGET_GLOBAL_CFLAGS
>> (6) Verify the functionality of FDO (Feedback Directed Optimization)
>> and introduce the approaches to integrate.
>
> I really think those topics should be executed by the toolchain WG
> rather than in platform. I am happy that we give them guidance and
> support them by providing them with easy to use tools to get their job
> done. Also feeding them with topics is great. Please talk to Michael
> Hope and ask him how he wants to collect those android toolchain
> optimization topic ideas. Could be good input for our 11.11
> requirements gathering process.
Agree.
>> ** Toward Android NDK
>>
>> Once Linaro toolchain for Android is ready to use, it is time to
>> re-package Android NDK by Linaro toolchain. To do that, extra
>> build configuration, sysroot, is required. According to Android
>> Release Cycle & Phases[5], the repacked NDK should be verified
>> one moth after Android public release.
>
> That sounds like a great idea. What's the a benefit/difference of
> shipping an NDK compared to just shipping a "normal" toolchain binary
> tarball for this purpose?
NDK consists of some architecture specific helper scripts/headers to indicate
the optimization flags and some combinations such as ARMv7 with/without
NEON, etc.
If we provide NDK directly, users don't have to consider the above integration
issues as far as I know.
Sincerely,
Jim Huang (jserv)
Android Team
Temporarily took over Tech Lead of the Toolchain Working Group while
Michael Hope recovers from the Christchurch earthquake. (He's fine, but
unable to work.) This didn't actually require any action, in the end.
Michael returned to work towards the end of the week.
Forward ported, benchmarked, and posted one of Mark Shinwell's NEON
patches upstream.
Further benchmarking was not possible as the Panda board I was using is
located in Christchurch, NZ.
Merged and tested the FSF GCC 4.5 branch into Linaro GCC. There were a
couple of test regressions in the fortran testsuite, so I've filed bug
lp:723086. The other test results were either the same or better.
Benchmarked the ARM A8 function/jump alignment patch to see what effect
it has in GCC 4.6. Found no measurable improvement in EEMBC. I suggest
dropping this patch.
Brought the patch tracker up-to-date, and entered tracking tickets for
all outstanding patches.
Merged FSF trunk to Linaro GCC 4.6.
Committed Jie's Thumb2 testcase fix to FSF GCC trunk. Thanks to Ramana
for using his new found authority to approve it.
Investigated the suitability of several of the patches for
forward-porting. Corresponded with Benrd and Julian.
----
Upstream patched requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* Kazu's VFP testcases:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00128.html
* ARM EABI half-precision functions
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
== Last week ==
* Launchpad #721021 GCC ICE on ARM/XScale: identified as case of
upstream PR45177; backported and pushed to Linaro.
* Launchpad #709453/CS Issue #7122: Neon vmov 0.0 issues; some progress
on my current WIP patch, but tests showed another 3 regressions, still
on-going.
* Launchpad #711819/GCC PR47719: ICE in push_minipool_fix. Ramana
reminded that my patch, which added some pool range attributes, were
actually removed earlier by Bernd in the fix for PR43137. Discussed and
mostly concluded that we should add them back for now. Will re-submit
patch with testcase to gcc-patches this week.
* Coremark ARMv7-A regressions: still work in progress.
== This week ==
* TW Public Holiday Feb.28 (Mon).
* Ping some of my upstream patch submissions.
* Get incompleted issues done.
* Coremark regression investigation.
Hello Linaro toolchain guys,
I have a few questions regarding GCC fully supporting the ARM Cortex M4,
I'm especially thinking of the additional DSP instructions and if these are supported and how optimal the code being produced is?
Thanks for your support,
Best Regards
Christian (ST-Ericsson)
Hi,
== Investigate developer tools ==
* Finished latrace investigation.
== PandaBoard ==
* The defective PandaBoard that was sent back in December is now repaired and
on my desk again. It doesn't show the behaviour of #708883 and works
flawlessly so far. :)
== libunwind ==
* Did some debugging of the test-async-sig testcase to get started with
libunwind. It will dead-lock if you add "--enable-debug" since libunwind does
printfs in this case which are not signal safe.
* Sorted out which of Zachs patches are upstream and which are not.
* Started to learn about the different unwind methods that libunwind provides
on ARM.
Regards
Ken
== ffi ==
* Sent variadic patch for libffi to libffi-discuss
* Worked through some suggestions from Chung-Lin, need to do some rework
== string routines ==
* memchr & strchr patch sent for inclusion in ubuntu packages
* tried sqlite's benchmarks - they don't spend too much time in the
C library; although
a few % in memcpy, and ~1% in memset (also seem to have found an
sqlite test case failure on
ARM and filed as bug 725052)
== porting jam ==
* There wasn't much traffic on #linaro during this related to the jam
* I closed bug 635850 (fastdep FTBFS) which was already fixed with
an explicit fix for ARM in the changelog
and bug 492336 (eglibc's tst-eintr1 failing) which seems to work now
but it's not clear when it was fixed.
* Looking at eglibc's test log there seem to be a bunch of others
that are failing and may well be worth investigating.
* bug 372121 (qemu/xargs stack/number of arguments limit) seems to
work ok, however the reporter did say it was quite a fragile test;
that needs more investigation to see
whether the original reason has actually been fixed.
== misc ==
* swapping notes with Peter on the PBX SD card investigation
Dave
RAG:
Red:
Amber:
Green:
Current Milestones:
| Planned | Estimate | Actual |
qemu-linaro 2011-03 | 2011-03-08 | 2011-03-08 | |
Historical Milestones:
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
== maintain-beagle-models ==
* rebased qemu-linaro on upstream
* checked omap_uart model for any issues with enabling the extended
(non-16550A) features which the new Linux drivers need. Sent meego
merge request for patchset which turns on the features, and does
a little cleanup. Now in meego, qemu-linaro.
== merge-correctness-fixes ==
* reviewed versions 5 and 6 of Christophe's vrecpe/vsqrte patchset;
v6 was good and has now been committed
* sent a version of "dummy cp14 debug registers" patch upstream;
however I've realised it triggers a false positive in the
temp-leak debugging code in target-arm/translate.c
* wrote/sent a patch which moves this temp-leak debugging code
into TCG proper (which I think makes it much simpler and cleaner
and avoids the false positives mentioned above)
* some work on the cp15 performance counter registers. I now
have some code which I think is a fully architecturally valid
implementation of an "implements no events" core, except that
we don't implement the cycle count register.
* started testing/review of Adam's VA-to-PA translation regs patch.
In the course of this discovered that qemu unconditionally
implements an ARM940 cp15 WFI register which clashes with these;
submitted patch to add correct not-for-v6/v7 feature gating.
* sent out patch fixing usermode seeks by 32 bit guest on 64 bit
host (based on a diagnosis and suggested fix by Eoghan Sherry)
* sent patch fixing compile error in vnc code
== vexpress model ==
* sent a patchset for fixing the MMC card detect wiring on
PBX upstream; this is needed for vexpress too
* finished vexpress cleanup and cross-checking against the docs; I
now have a patchset I'm happy to upstream and will post next week
== other ==
* took part in pgp keysigning event with emdebian folks
* meetings: toolchain, PDSW-tools
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
== GDB ==
* Worked with Will Deacon and the Linaro kernel team to
make sure HW watchpoint and Versatile Express errata
fixes are included in the upcoming Linaro kernel release.
* Committed GDB HW watchpoint patches to mainline, and
backport to Linaro GDB. This completes work on the
HW watchpoint blueprint.
* Worked on fixing the GDB part of #620611 (Unable to
backtrace out of vector page 0xffff0000). Posted
(two versions of) mainline patch for discussion.
* Worked on kernel patch for #615974 (Interrupted system
call handling).
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== This week ==
* Looked at the poor code generated for Neon load/store intrinsics.
Looked into the history behind the treatment of VFP registers by
CANNOT_CHANGE_MODE_CLASS. Peter confirmed that the restrictions
apply only to VFPv1. Wrote a patch to improve the code, which
partly overlapped with Julian's.
* Looked at how the operations should be represented at the tree level.
Experimented with various combinations of tree codes and types
to see which felt right. Wrote this up in the message I sent today.
== Next week ==
* More vectorisation.
* Submit some queued patches.
* Maybe some bug fixing. (I see there's a reload bug just waiting
to be claimed by a lucky developer.)
On holiday the following week.
Richard
Services at ex.seabright.co.nz are back up.
On Tue, Feb 22, 2011 at 10:06 PM, Michael Hope <michael.hope(a)linaro.org> wrote:
> Hi there. We've had an earthquake. Family and friends are fine but i'll be
> unavailable for a few days. Services on ex.seabright.co.nz are down. I'll
> cancel Wednesdays standup call.
>
> See you soon,
>
> -- Michael
Hello,
Implemented a patch for SMS to support targets that their doloop part is
not decoupled from the rest of the loop's instructions (which is the
current assumption of SMS). ARM is an example of such target, where the
loop's instructions might use CC reg which is used in the doloop part.
Now testing the patch on ARM and other targets that have do-loop.
Thanks,
Revital
Hi,
* vectorizer cost model
- implemented builtin_vectorization_cost for NEON
- added register spilling considerations to the cost model
- started testing/tuning on EEMBC Telecom and DenBench (for now I
have only two examples for spilling: fdct_int32 mp4encode that
shouldn't get vectorized and viterbi that should)
* measured vectorization impact on Telecom autcor - it's about 5x
(initially I got run time segfault, but the bug is already fixed on
GCC trunk, I'll have to check gcc-linaro-4.5 as well)
* NEON-vs.non-NEON degradation
- started to look at aes. There are 6 loops that get vectorized with
4.6 (due to this patch
http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01927.html that allows
cond_expr in number of loop iterations expressions) and vzip/vuzp
patch, but not with gcc-linaro-4.5. But it doesn't explain the
degradation of course.
- I don't understand mp4decodepsnr improvement, since I don't see
any loops or basic blocks vectorized.
Ira
One of the vectorisation discussions from last year was about the poor
code GCC generates for vld{2,3,4}_*() and vst{2,3,4}_*(). It forces the
result of the loads onto the stack, then loads the individual pieces from
there. It does the same thing in reverse for stores.
I think there are two major problems here:
1. The result of the vld*() is a record type such as:
typedef struct int16x4x3_t
{
int16x4_t val[3];
} int16x4x3_t;
Ideally, we'd like one of these structures to be stored in a pseudo
register. However, the ARM port currently limits in-register
record types to 64 bits, so something this big is always given
BLKmode and stored on the stack.
A simple "fix" for this is to increase MAX_FIXED_MODE_SIZE.
That would do the right thing for the structures in arm_neon.h,
but wouldn't be safe in general.
2. The vld*() returns values as a single integer (such as EI mode),
while uses of the value will typically be in a vector mode such
as V4SI. CANNOT_CHANGE_MODE_CLASS doesn't allow direct
"mode-punning" between the two in VFP_REGS, so this again
forces the punning to be done on the stack.
The code in question is:
/* FPA registers can't do subreg as all values are reformatted to internal
precision. VFP registers may only be accessed in the mode they
were set. */
#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
(GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \
? reg_classes_intersect_p (FPA_REGS, (CLASS)) \
|| reg_classes_intersect_p (VFP_REGS, (CLASS)) \
However, the VFP restriction appears to be specific to VFPv1 --
thanks to Peter for the archaeology -- and isn't a problem for v6+.
In that case, removing this restriction is an important optimisation.
I tried the patch below on the following simple testcase:
#include "arm_neon.h"
void
foo (uint16_t *a)
{
uint16x4x3_t x, y;
x = vld3_u16 (a);
y = vld3_u16 (a + 12);
x.val[0] = vadd_u16 (x.val[0], y.val[0]);
x.val[1] = vadd_u16 (x.val[1], y.val[1]);
x.val[2] = vadd_u16 (x.val[2], y.val[2]);
vst3_u16 (a, x);
}
(not necessarily sensible!). Before the patch, -O2 produced:
sub sp, sp, #48
add r3, r0, #24
vld3.16 {d16-d18}, [r3]
vld3.16 {d20-d22}, [r0]
add r3, sp, #24
vstmia sp, {d20-d22}
vstmia r3, {d16-d18}
fldd d19, [sp, #8]
fldd d16, [sp, #0]
fldd d17, [sp, #24]
fldd d20, [sp, #32]
vadd.i16 d18, d16, d17
vadd.i16 d17, d19, d20
fldd d19, [sp, #16]
fldd d20, [sp, #40]
vadd.i16 d16, d19, d20
fstd d18, [sp, #0]
fstd d17, [sp, #8]
fstd d16, [sp, #16]
vldmia sp, {d16-d18}
vst3.16 {d16-d18}, [r0]
add sp, sp, #48
bx lr
After the patch we get:
vld3.16 {d24-d26}, [r0]
add r3, r0, #24
vld3.16 {d20-d22}, [r3]
vmov q8, q12 @ ti
vadd.i16 d17, d17, d21
vadd.i16 d16, d24, d20
vadd.i16 d18, d26, d22
vst3.16 {d16-d18}, [r0]
bx lr
The VMOV is a bit disappointing, and needs further investigation.
The first hunk fixes (2), and I think is correct. The second hunk
hacks (1), and isn't suitable in itself. I'll next try to make
arm_neon.h use built-in record types that are explicitly EImode,
which should remove the need to change MAX_FIXED_MODE_SIZE.
Richard
Index: gcc/gcc/config/arm/arm.h
===================================================================
--- gcc.orig/gcc/config/arm/arm.h
+++ gcc/gcc/config/arm/arm.h
@@ -1171,10 +1171,12 @@ enum reg_class
/* FPA registers can't do subreg as all values are reformatted to internal
precision. VFP registers may only be accessed in the mode they
were set. */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
- (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \
- ? reg_classes_intersect_p (FPA_REGS, (CLASS)) \
- || reg_classes_intersect_p (VFP_REGS, (CLASS)) \
2+#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+ (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \
+ ? (reg_classes_intersect_p (FPA_REGS, (CLASS)) \
+ || (TARGET_VFP \
+ && reg_classes_intersect_p (VFP_REGS, (CLASS)) \
+ && arm_fpu_desc->rev == 1)) \
: 0)
/* The class value for index registers, and the one for base regs. */
@@ -2458,4 +2460,6 @@ enum arm_builtins
instruction. */
#define MAX_LDM_STM_OPS 4
+#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (XImode)
+
#endif /* ! GCC_ARM_H */
Hi there. We've had an earthquake. Family and friends are fine but i'll be
unavailable for a few days. Services on ex.seabright.co.nz are down. I'll
cancel Wednesdays standup call.
See you soon,
-- Michael
== GDB ==
* Working with Will Deacon, identified root cause of GDB
problems running on Versatile Express in SMP mode, and
verified that Errata workaround fixes the problem
* Finished testing GDB HW watchpoints patch on vexpress,
submitted complete patch set for mainline inclusion
* Reviewed Yao's mainline patch to enable displaced
stepping in Thumb mode
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== Last week ==
* PR46178, PR46002: both upstream issues related to the priority
coloring mode of IRA. Both patches submitted, the first already approved
and committed. Vladimir M. did mention that the priority algorithm
would be removed once his newer "cover class-less" patches goes in
during stage1. Anyways, I got more familiar with IRA during the process,
and the patches will still be applicable to 4.5/4.6.
* PR43872: incorrectly aligned VLAs under ARM. This turned out to be a
one-liner fix. Submitted upstream awaiting approval.
* Discussed on email/IRC with Revital Eres on SMS and ARM doloop pattern
issues.
* Launchpad #721021: Linaro GCC ICE under -mtune=xscale. Investigated a
bit; did not see ICE immediately, but GCC went into infinite loop (Khem
Raj, the reporter, says it runs for a while then ICEs).
* Coremark ARMv5TE vs ARMv7-A performance regression: reproduced
consistently using our own Tegra boards. Investigated and seem to have
found something, will post more detailed findings later.
== This week ==
* Coremark investigation.
* More GCC issues.
== GCC ==
Posted 2 of our 4.5 patches upstream.
My latest 4.6 build and test completed, so I've pushed an update to the
bzr branch. The branch is now up to mainline state as of the 12th.
Merged 3 4.5 patches into Linaro GCC 4.6. Upstream review isn't
happening, so I've decided to commit them anyway. The last upload (FSF
mainline as of 12th Feb) will therefore become the baseline I'm going to
use for Linaro GCC 4.6.
Begun benchmarking the questionable patches before forward porting them,
using EEMBC. Michael Hope has given me access to one of his A9 Panda
boards in New Zealand. This ought to have been straight-forward, but of
course it wasn't. It took me a while to convince myself I was getting
meaningful results and testing the right thing. Also the A9 seemed to be
able to complete the configured iterations in 'zero' time, which fooled
me for a while. I think I now have a set up that works. It seems to run
very slowly sometimes though - something to do with SSH?
----
Upstream patched requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* Kazu's VFP testcases:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00128.html
* Jie's thumb2 testcase fix:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00670.html
* ARM EABI half-precision functions
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
RAG:
Red:
Amber:
Green: DATE/QEMU conference place confirmed, travel booked
Current Milestones:
| Planned | Estimate | Actual |
qemu-linaro 2011-03 | 2011-03-08 | 2011-03-08 | |
Historical Milestones:
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
* maintain-beagle-models:
+ implemented missing epoll syscalls for qemu usermode,
submitted upstream
https://bugs.launchpad.net/qemu-linaro/+bug/644961
+ tracked down the problem causing serial console to break:
the new Linux driver uses some extra features of the UART
which we weren't modelling
https://bugs.launchpad.net/qemu-linaro/+bug/714600
* merge-correctness-fixes:
+ reworked VZIP/VUZP patch as per review comments, resubmitted
+ reviewed CL's latest shift patches, added fixes of my own for
large shift counts and overlapping src/dest regs, submitted
a 10 patch rolled up series
+ reviewed a patch for adding cp15 VA-PA translation ops
+ reviewed various versions of vrecpe/vsqrte patches from CL
* versatile-express model:
B Labs kindly made available their Versatile Express board model:
https://github.com/bbalban/qemu/commits/universal-branch
and I've spent a few days getting it to boot a Linaro kernel,
fixing a few bugs and cleaning up the patchset in preparation
for upstreaming it.
This included discovering a bug in qemu's SD card model which
was causing Linux not to be able to detect cards on PL181,
and resulting in spurious qemu warnings on omap3:
https://bugs.launchpad.net/qemu-linaro/+bug/714606
* other:
+ ARM architecture Q&A for modelling engineers
+ booked travel/hotel for QEMU conference
* meetings: toolchain, PDSW-tools, PD comms, Linaro-in-ARM network
infrastructure, pdsw-doughnuts and 1st birthday celebration,
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
Hi,
* continued to look into latrace and found an issue in case a dynamic
library gets unloaded. Otherwise latrace looks quite good on ARM.
https://wiki.linaro.org/KenWerner/Sandbox/latrace
* chasing bugs:
- After a lot of testing Andy Green has made a big step forward in
finding the root cause for the shut-down issue of my PandaBoard.
The PMIC is seeing an overcurrent and issues an interrupt that gets
ignored by current kernels. Then the PMIC shuts the board down for
safety reasons. As a workaround Andy has made a kernel patch for the
twl6030 driver that enables all interrupt sources. The kernel will
acknowledge the overcurrent reported by the PMIC and the board survives.
A patched kernel binary can be found at:
https://wiki.linaro.org/KenWerner/Sandbox/708883
- While testing Andys patches on the linaro natty kernels I ran into
https://bugs.launchpad.net/bugs/720055
- The flash-kernel utility doesn't work on the PandaBoard because the
subarch check expects omap4 instead of omap:
https://bugs.launchpad.net/bugs/721147
- Looked into the apr fail (process shared mutex's fail on armel v7).
Their mutex functionality can be mappped to various methods, but only
pthread is of interest here. The code relies on pthread_mutex_lock and
pthread_mutex_trylock which is implemented by the (e)glibc. The c library
uses GCCs __sync primitives if eglibc >= 2.12.1-0ubuntu11 and GCC >=4.5.
The testprocmutex testcase passes now.
https://bugs.launchpad.net/bugs/604753
Regards
Ken
"Will Deacon" <will.deacon(a)arm.com> wrote on 02/16/2011 01:07:09 PM:
> > I've now built a kernel with CONFIG_ARM_ERRATA_720789 enabled, and the
> > symptoms indeed seem to have disappeared completely ...
>
> Yup - that's because without it, invalidating a TLB entry for a
particular
> process isn't broadcast correctly, so you can end up using the old
(pre-COW)
> mappings if you're running on a different core.
OK. So I guess the only remaining questions is: if this hardware needs the
errata fix to work properly, shouldn't it be automatically selected by the
kernel configure logic? Note that this appears to happen for certain OMAP
boards, see arch/arm/mach-omap2/Kconfig:
config ARCH_OMAP4
bool "TI OMAP4"
default y
depends on ARCH_OMAP2PLUS
select CPU_V7
select ARM_GIC
select PL310_ERRATA_588369
select ARM_ERRATA_720789 <<=====
select USB_ARCH_HAS_EHCI
But this does not happen for the vexpress; arch/arm/mach-vexpress/Kconfig
has only:
config ARCH_VEXPRESS_CA9X4
bool "Versatile Express Cortex-A9x4 tile"
select CPU_V7
select ARM_GIC
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hello,
* Continue looking into DENbench benchmarks.
* While testing SMS I realized that my current implementation of doloop
pattern for ARM does not follow SMS's requirement to have the doloop
instructions be decoupled from the other loop's instructions. This happens
because doloop uses CC register which might be used elsewhere in the loop.
I am looking into a solution for that.
Thanks,
Revital
Hi,
This week I looked into DENBench:
* sad8_c (hot function from mp4encode) needs SLP reduction, but it
also contains cond_expr which cannot be vectorized as reduction, so I
don't think there is anything I can do here
* fdct_int32 (another hot function from mp4encode) now gets vectorized
with vzip/vuzp patch, but the vectorization causes performance
degradation here because of multiple register spills. I also noticed
that vectorizer costs are not set for NEON, i.e., it uses default
costs. So, I am now working on costs for NEON and adding registers
consideration into vectorizer's cost model.
I also did some general vectorization research, checking opportunities
of collaboration with GRAPHITE pass and auto-parallelization.
Ira
I mentioned in the toolchain standup call that I'd done a quick
estimate of the work required to support vexpress, so I thought I
might as well clean it up a little and post it.
This is a quick summary and time estimate for adding Versatile
Express support to qemu. The general idea is that most of the
components on this board already have QEMU implementations
(since they're standard ARM primecells used in versatile/realview),
and we can live without the few major components that aren't
implemented (maybe we'd need dummy implementations if the
kernel prods them on startup.)
Components already supported by QEMU:
-------------------------------------
A9MPx4
PL050 keyboard, mouse
SMCS LAN9118 ethernet
PL011 UARTs
SP804 timers
Components with a near match in QEMU:
-------------------------------------
PL111 CLCD -- qemu has a PL110
PL180 MMC card -- qemu has a PL181
-- both cases should either just work or be fairly trivial tweaks
Components not supported by QEMU:
---------------------------------
PL041 audio
compact flash
two-wire serial bus (for PCI-express switch config and DVI-I displays)
ISP1761 Philips USB controller
User switches and LEDs -- vexpress specific, but trivial to do
Components where a dummy implementation should be sufficient:
-------------------------------------------------------------
PL310 L2 cache controller
PL341 dynamic memory controller
PL354 static memory bus controller
trustzone controllers
Other required work:
--------------------
The usual knitting for interrupts, clocks, reset etc etc.
Summary
-------
Assuming we're happy not to worry about support for
audio, USB, two-wire serial bus or compact flash, this
is about two weeks work to put together, test and get
a more-or-less upstreamable patchset from. This would
produce a platform hopefully at least as usable as
versatile, but with an A9 and 1GB RAM.
-- PMM
"Will Deacon" <will.deacon(a)arm.com> wrote on 02/14/2011 11:30:45 AM:
> > - In testing on Versatile Express, I noticed what appears to be SMP
> > related bugs in handling regular software breakpoints: occasionally,
> > software breakpoints simply are not hit and execution continues as if
> > the underlying code had not been changed at all. This symptom
> > completely goes away if GDB and the debugged process are forced to
> > the same CPU using the affinity feature (e.g. with schedtool).
>
> I've seen this issue in the past but I thought I'd fixed it. What kernel
are
> you using and do you have CONFIG_ARM_ERRATA_720789 enabled?
I'm using the 2.6.37-1002-linaro-vexpress kernel from the Linaro package
of the same name. This does *not* have CONFIG_ARM_ERRATA_720789 enabled
(presumably because the mach-vexpress/Kconfig file does not add it?) ...
> > My guess, just from seeing those symptoms, would be that when
inserting
> > a software breakpoint via ptrace, not all i-caches on all CPUs are
> > reliably flushed ... Any thoughts on this?
>
> There was an I-cache aliasing problem in the kernel coupled with a TLB
> invalidation hardware bug on the versatile express. I fixed these though
> and haven't seen any problems since.
Hmm, a TLB flush problem could also explain the symptom (because the write
of the breakpoint to the text section causes a copy-on-write operation
which
installs a new page ...)
I'll try rebuilding the kernel with the above config option enabled.
> Hmmm, I'll need to have a think about this. What does GDB do if it
receives
> a SIGTRAP with si_addr set to (potentially) complete nonsense? As an
aside,
> Cortex-A15 reports the faulting address for a watchpoint correctly, so we
> will be able to use multiple watchpoints there.
The GDB common core can handle either of the following two indications:
A) The (read/write/access) watchpoint at address XXX triggered.
B) A write watchpoint may have triggered at some address.
In the case of B, GDB will scan all the write breakpoints it is currently
tracking and compare the current value at that address with the last value
it remembers being present there. Any changes GDB sees will cause it to
report the corresponding watchpoint as triggered.
As far as the kernel interface is concerned, the important issue that the
ARM native target in GDB is able to understand what the kernel reports, so
it can in turn report either case A or B to the common core.
This means as long as there is some way for GDB to understand the kernel
is reporting a write watchpoint hit at an unknown address, everything is
fine. This could be done e.g. be reporting a "slot" zero in si_errno to
indicate the slot (and then also the address) triggering the watchpoint
is unknown ...
> > - Finally, I noticed when reading kernel code that under some
> > circumstances, the kernel will automatically do a single step to
> > get off a watchpoint that was just hit. However, this does not
> > happen for user-space watchpoints installed via ptrace, right?
> > (Just wanting to confirm; since GDB currently does that single
> > step itself -- we don't want *both* kernel and GDB to issue a
> > single step each ...)
>
> If the {break,watch}point has been inserted via ptrace, the kernel will
> send a SIGTRAP instead of stepping the instruction.
OK, thanks for the confirmation!
> > I haven't gotten to looking further into other hardware (IGEP,
> > Panda) -- that's next on the list.
>
> Good stuff, keep me posted if you see any further problems!
Sure, will do!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hello, my fellow ARM aficionados!
The Linaro Developer Platform Team is pleased to announce a new initiative
to help improve the state of software on ARM: the ARM porting jam. Starting
today, February 16th, we will be running a weekly IRC jam on Wednesdays from
1400-1800 UTC to bring developers together to work on all manner of
userspace porting bugs, with the aim of fixing portability issues and
getting the fixes delivered to our upstreams.
An initial porting queue of known issues can be found here:
https://bugs.launchpad.net/ubuntu/+bugs?field.tag=arm-porting-queue
Interested in making the software in Ubuntu run better on ARM? Stop on by
the #linaro channel on irc.linaro.org today!
--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
slangasek(a)ubuntu.com vorlon(a)debian.org
"Will Deacon" <will.deacon(a)arm.com> wrote on 02/11/2011 10:13:01 AM:
> I don't have a pandaboard, so I'd be interested to see if the code
> works there. I developed it using ARM boards, so the versatile express
> is a known good target.
I've now got it working reliably on on Versatile Express, after fixing
a couple of bugs on the GDB side (both in the HW-watchpoint patch, and
in common GDB code). The testsuite now passes with no regressions when
enabling HW watchpoints, except for two tests that require more than one
single watchpoint to be supported.
This raises another couple of issues/questions, however:
- In testing on Versatile Express, I noticed what appears to be SMP
related bugs in handling regular software breakpoints: occasionally,
software breakpoints simply are not hit and execution continues as if
the underlying code had not been changed at all. This symptom
completely goes away if GDB and the debugged process are forced to
the same CPU using the affinity feature (e.g. with schedtool).
My guess, just from seeing those symptoms, would be that when inserting
a software breakpoint via ptrace, not all i-caches on all CPUs are
reliably flushed ... Any thoughts on this?
- As mentioned above, the kernel currently only supports one single
watchpoint to be active at a time, even though hardware might support
multiple ones. The reason seems to be that when a watchpoint triggers,
the kernel cannot figure out which one it was (if there's more than one
choice).
This is a bit unfortunate, given that GDB will attempt to insert two
or more watchpoints in many interesting cases (e.g. a "watch *p"
command will insert *two* low-level watchpoints, one at the address
of p, and one at the address where p (currently) points to).
In addition, for regular (write) watchpoints, GDB does not actually
*require* the underlying hardware/kernel to specify which watchpoint
was hit; GDB is able to find out by itself by checking whether the
values at any of the currently active locations actually changed.
(For read/access type watchpoints, GDB does require that underlying
support -- but those are much more rarely used anyway.)
Do you see any chance of improving upon the current behaviour?
- Finally, I noticed when reading kernel code that under some
circumstances, the kernel will automatically do a single step to
get off a watchpoint that was just hit. However, this does not
happen for user-space watchpoints installed via ptrace, right?
(Just wanting to confirm; since GDB currently does that single
step itself -- we don't want *both* kernel and GDB to issue a
single step each ...)
I haven't gotten to looking further into other hardware (IGEP,
Panda) -- that's next on the list.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== Linaro GCC 4.5 ==
Re merged all the patches I've had to back out of Linaro GCC due to
various test failures. I've now found all the extra fixes/patches
necessary to make them go ... I think. Tested the build and test on ARM
and x86_64.
== Linaro GCC 4.6 ==
Continued getting the 4.5 patches forward ported to 4.6. I now have
about 4 patches waiting for review upatream, or ready to be posted.
Upstream review isn't happening though. This partly due to GCC being in
stage 4, but mostly due to Richard Earshaw being on sabatical, and the
other maintainers being inactive. I can see that I'm going to have to
abandon my hopes of only merging to Linaro GCC once it's been approved
upstream, and be content with merging to Linaro once it's posted upstream.
Started another test to rebase the Linaro 4.6 branch with the latest
from upstream. Once that's done, I think I'll start merging my changes
in, and call that our baseline. (There'll still be merges from upstream,
but the history will diverge.)
----
Upstream patched requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* Kazu's VFP testcases:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00128.html
* Jie's thumb2 testcase fix:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00670.html
== Week of Jan.31st--Feb.6th ==
* Vacation, Chinese New Year Holiday.
== Last week ==
* Monday (Feb.7th), last day of vacation.
* LP #711819, ICE in push_minipool_fix: this turned out to be a simple
case where a memory load alternative was not tagged with the minipool
range attributes. Patch sent upstream, awaiting approval.
* LP #709453, wrong code generated for NEON. Tracked this down and
mostly know how to fix this, but discussion with Ramana brought the
issue up that the entire idea of using NEON vmov.i32 for loading VFP
constants may not be good for A9, and unclear for A8. We probably should
just revert the patch from the Linaro tree for now.
* PR46002, IRA internal compiler error with -fira-algorithm=priority.
Been looking at this as a part of my background IRA studies. Have a
possible patch for this, plus found another assert fail ICE under ARM.
Will see if can post upstream this week.
== This week ==
* Continue to look at above unfinished issues, as well as other new ones.
== GDB ==
* Installed 2.6.37 Linaro kernel on IGEP and Versatile Express
in order to verify support for HW breakpoints/watchpoints
* Tested GDB HW watchpoints patch, fixed several bugs in the
patch and core GDB, and got it working reliably on vexpress
* Started discussion with Will Deacon (ARM) regarding possible
further enhancements to related kernel support
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== String routines ==
* Copied an improvement I'd previously made to memchr (removing a
branch using a big IT block) to strlen
* Modified benchmark setup to build everything as a library to
fairly give everything a PLT overhead.
* Pushed optimised memchr and strlen and simple strchr into
cortex-strings bzr repo
* Patched eglibc to use memchr and strchr code - although currently
fighting to get appropriate .changes file
== ffi ==
* Kicked off TSC request for license permissions
== bugs ==
* Built and recreated the qt4-x11 bug, produced all the dumps and
boiled it down to a few lines of suspicious RTL for Richard.
** Away next week.
== GCC ==
* Finished testing fix for lp:709329 and got that merged.
* Wrote up a plan for GCC performance improvements based on what we
discussed at the sprint.
* Internal ARM tasks that kept me busy for most of last week and this week.
Plans:
* still stuck on some ARM internal tasks for next week.
== This week ==
* Got the STT_GNU_IFUNC work ready to submit. Split out some preparatory
patches, including fixes for some general ARM inefficiencies that I
noticed this week. Ran the EGLIBC testsuite (including ifunc tests)
and they passed.
* Discussed ideas for representing permuted vector loads with Ira.
I'm still um-ing and ah-ing about the various possible approaches,
but I think I understand the constraints a bit more now.
* Fixed Qt miscompilation (lp #705689).
* Fixed PC-relative load bug in the assembler (lp #716967).
== Next week ==
Holiday!
Richard
RAG:
Red:
Amber: DATE/QEMU conference still hasn't confirmed I have a place...
Green: qemu-linaro first release made!
Current Milestones:
| Planned | Estimate | Actual |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
Historical Milestones:
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
* maintain-beagle-models:
+ first qemu-linaro release (2011.02-0) made on time
+ fixed OMAP3 MMC controller model bug that was causing the kernel
to hang when enabling a swapfile; pushed fix to qemu and meego trees
+ rebased qemu-linaro on new upstream
* merge-correctness-fixes
+ reviewed some softfloat patches from Christophe; testing of
the half-precision floating point conversion instructions
showed up a number of other bugs which I submitted patches for:
http://patchwork.ozlabs.org/patch/82594/ (n/6)
+ reviewed and tested Christophe's patches for VQMOVUN and
VSLI.64/VSRI.64; these have been committed upstream
+ fix compile failure if !CONFIG_USE_GUEST_BASE
http://patchwork.ozlabs.org/patch/82630/
+ remove stray #include halfway through source file
http://patchwork.ozlabs.org/patch/82661/
+ improved vmull.p8 implementation over the meego version, sent
upstream: http://patchwork.ozlabs.org/patch/82657/
+ upstreamed patch to fix VQDMLSL:
http://patchwork.ozlabs.org/patch/82752/
+ upstreamed patch fixing thumb-to-arm neon dp insn conversion:
http://patchwork.ozlabs.org/patch/82757/
+ upstreamed patches fixing Neon VZIP and VUZP
* other
+ did a quick estimate of required effort to do vexpress model
(answer: 2 weeks if we don't want audio/USB/compact flash)
+ usual crop of standing meetings
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
Hi,
* moved from Ubuntu Maverick to Natty on the PandaBoard
* investigation on the LTTng User Space Tracer:
https://wiki.linaro.org/KenWerner/Sandbox/LTTng
* started to look into latrace:
https://wiki.linaro.org/KenWerner/Sandbox/latrace
The idea is neat but there are issues in case the users code does dlclose
on a shared object. I'll investigate further when time permits.
* spent some time on IBM internal process work
Regards
Ken
Hi Will,
> > - It seems odd that the kernel says it doesn't support the debug
> > architecture, but then reports to user space that 1 watchpoint and 6
> > breakpoints are supported ... GDB will never use the watchpoint,
because
> > the maximum watchpoint size is reported as zero, but GDB will attempt
to
> > use the breakpoints. Setting a breakpoint will appear to succeed, but
then
> > the breakpoint just never triggers. The kernel should IMO be more
> > consistent in how unsupported configurations are handled ...
>
> Agreed. This is an artifact of how the ptrace info register is populated.
> I'll work on a fix tomorrow so that we don't report any resources when
> the architecture is unsupported.
Great, thanks!
> > - Why is architecture 0x4 not supported? This seems to be the variant
of
> > the v7 debug architecture with memory-mapped registers. Apparently the
> > IGEP only supports this version ... Do you know what the
> > Beagle-/Pandaboard and other clones do? What would it take to support
this
> > architecture variant? Given the widespread use of those boards, it
would
> > be really nice if we could support hardware debugging on them ...
>
> The memory-mapped interface is hugely unreliable in real hardware because
> you have to calculate the address of the memory-mapped debug registers by
> using a base and offset, which are hardcoded in some information
registers.
> Unfortunately, I've never found a board where these registers have been
> programmed correctly so (a) I had nothing to test my code with (b) few
people
> would be able to use it and (c) there's not really a safe way to go
around
> poking random areas of memory.
Huh, I see. I have no idea whether those information registers contain
correct values on IGEP ..
> > - Which hardware *is* supported? Can you recommend a board I should be
> > using to verify GDB support is working?
>
> The simple rule is Cortex-A8 is unsupported and Cortex-A9 is supported.
> The A5 should work (untested) and the A15 will need a bit of hacking to
> get it supported.
OK. I guess I can try on our Versatile Express.
> > Thanks for your help in getting this working!
>
> No problem. If you find anybody with working memory-mapped debug and some
> spare time, I'd be happy to review patches :)
Thanks! I'll try and see if I can figure out where the MM area is
on the IGEP ...
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hello,
* Analyzing DENBench benchmarks.
* Running mp3 player on Crotex A9 with gcc-linaro -r99463 using SMS flags
(*) gives 21% improvement in execution time compared to using only base
flags(**).
(*) -fmodulo-sched -fmodulo-sched-allow-regmoves
(**) -mcpu=cortex-a9 -mtune=cortex-a9 -mthumb -static --fast-math
Thanks,
Revital
Hi,
* regtested vzip/vuzp patch
* looked into big-endian build
* applied all the required patches and checked that Viterbi gets
vectorized giving ~2x performance improvement (compiled with
cross-compiler)
* looked into vld/vst implementation - mostly discussions with Richard
* DenBench analysis:
- there are loops that should get vectorized with vzip/vuzp patch,
I'll check them next week
- sad8_c (hot function from mp4encode) needs reduction SLP (which I
implemented several weeks ago), and an ability to jump unknown stride
in loop SLP - I am looking into this
Ira
On Wednesday 09 February 2011 20:25:32 Will Deacon wrote:
> > - Why is architecture 0x4 not supported? This seems to be the variant of
> > the v7 debug architecture with memory-mapped registers. Apparently the
> > IGEP only supports this version ... Do you know what the
> > Beagle-/Pandaboard and other clones do? What would it take to support this
> > architecture variant? Given the widespread use of those boards, it would
> > be really nice if we could support hardware debugging on them ...
>
> The memory-mapped interface is hugely unreliable in real hardware because
> you have to calculate the address of the memory-mapped debug registers by
> using a base and offset, which are hardcoded in some information registers.
> Unfortunately, I've never found a board where these registers have been
> programmed correctly so (a) I had nothing to test my code with (b) few people
> would be able to use it and (c) there's not really a safe way to go around
> poking random areas of memory.
So the only problem is that it's board specific? That's something we
know how to deal with -- all I/O components have some random board
specific address, and we put them in a platform device that is
listed in the board file. This should be easy enough to do for another
register area, though it means we have to do it separately for each board.
> > - Which hardware is supported? Can you recommend a board I should be
> > using to verify GDB support is working?
>
> The simple rule is Cortex-A8 is unsupported and Cortex-A9 is supported.
> The A5 should work (untested) and the A15 will need a bit of hacking to
> get it supported.
Is that because A8 is memory mapped and A9 uses CP14, or is there another
problem with A8?
Arnd
Hello Will,
I've been trying to get GDB support for hardware watchpoints/breakpoints
going. I've ported Matthew's GDB patch to current mainline, and am running
this under a 2.6.37-1002-linaro-omap kernel on an IGEPv2 board.
However, something seems to be not quite working: I'm seeing this kernel
message on boot:
hw-breakpoint: debug architecture 0x4 unsupported.
and then at runtime, the result of a PTRACE_GETHBPREGS call for register 0
is 0x04000106:
debug architecture: 4
watchpoint size: 0
nr. watchpoints: 1
nr. breakpoints: 6
This leads me to a couple of questions:
- It seems odd that the kernel says it doesn't support the debug
architecture, but then reports to user space that 1 watchpoint and 6
breakpoints are supported ... GDB will never use the watchpoint, because
the maximum watchpoint size is reported as zero, but GDB will attempt to
use the breakpoints. Setting a breakpoint will appear to succeed, but then
the breakpoint just never triggers. The kernel should IMO be more
consistent in how unsupported configurations are handled ...
- Why is architecture 0x4 not supported? This seems to be the variant of
the v7 debug architecture with memory-mapped registers. Apparently the
IGEP only supports this version ... Do you know what the
Beagle-/Pandaboard and other clones do? What would it take to support this
architecture variant? Given the widespread use of those boards, it would
be really nice if we could support hardware debugging on them ...
- Which hardware *is* supported? Can you recommend a board I should be
using to verify GDB support is working?
Thanks for your help in getting this working!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi,
I'm working in the Linaro toolchain team on adding ARM support for GNU
indirect functions (STT_GNU_IFUNCs). The indirect function feature
requires a new relocation type, which is typically called R_FOO_IRELATIVE.
I'd therefore like to propose a new R_ARM_IRELATIVE relocation type for
the ARM EABI.
This relocation is only used in ET_EXEC and ET_DYN objects. If the
object has a PT_DYNAMIC tag, then the relocation may only appear in
the DT_REL(A) table; it cannot appear in the DT_JMPREL table.
(Note that this is a deliberate divergence from the x86 and x86_64
behaviour, which does allow the IRELATIVE relocation to be used in
DT_JMPREL table, but which requires it to be applied at load time,
regardless of bind-now vs. lazy semantics. However, the proposed
ARM behaviour matches that of other targets like PowerPC.)
Static ET_EXEC objects may have R_ARM_IRELATIVE relocations. In this
case, the relocations are stored in a relocation table that contains no
other type of relocation (not even R_ARM_NONE). The static linker
defines two symbols:
__rel_iplt_start, which the linker points to the start of this table
__rel_iplt_end, which the linker points to the last byte of this table
plus one.
The two symbols are equal if the executable has no R_ARM_IRELATIVE
relocations. It is the executable's responsibility to apply these
relocations as appropriate. If the static linker emits a symbol table,
then it is not defined whether the linker includes __rel_iplt_start and
__rel_iplt_end in that symbol table.
The static linker may (or may not) define __rel_iplt_start and
__rel_iplt_end in dynamic objects. However, if it does define them,
the symbols must refer to part of the DT_REL(A) table, and it is still
the dynamic linker's responsibility to apply the relocations.
An R_ARM_IRELATIVE relocation applies to all bits of a 4-byte field.
There are no alignment restrictions on the field. The relocation
value is:
call(B(S) + A)
where call(X) represents the value of r0 after performing an indirect
branch-with-link-and-exchange (BLX) to address X.
The dynamic linker must have applied all earlier DT_REL(A) relocations
before calling X. It is undefined whether later DT_REL(A) relocations
have been applied or not, and X must not make any assumptions about the
status of those relocations.
If there is an R_ARM_IRELATIVE relocation with symbol S and addend A,
then the relocation value:
call(B(S) + A)
is considered to be a load-time constant. It is possible for an object
to have more than one R_ARM_IRELATIVE relocation with the same value
of B(S) + A, and in such a case, it is not defined whether the dynamic
linker invokes the target function each time, or whether it caches the
results of earlier calls.
I realise this isn't the cleanest extension in the world. As Alan Modra
noted on the binutils list, the choice of __rel_iplt_start and __rel_iplt_end
is particularly unfortunate, since the relocations are not specific to
"PLTs". However, the GNU extension has been defined this way,
so unfortunately there isn't much room for target-specific variation.
Thanks,
Richard
Hi,
I'd like to check vzip/vuzp patch in big endian mode. But when I try
to compile with -mbig-endian flag, I get
> ~/mainline/bin/bin/gcc -O3 -mfloat-abi=softfp -mfpu=neon neon-vtrnu8.c -mbig-endian
/home/irar/mainline/bin/lib/gcc/armv7l-unknown-linux-gnueabi/4.6.0/../../../libgcc_s.so.1:
could not read symbols: File in wrong format
collect2: ld returned 1 exit status
What am I missing?
Thanks,
Ira
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.2.
Linaro GDB 7.2 2011.02-0 is the third release in the 7.2 series. Based
off the latest GDB 7.2, it includes a number of ARM-focused bug fixes
and enhancements.
Interesting changes include:
* Backtracing is more reliable through using the ARM specific
exception tables for unwinding
* Better supports debugging functions compiled with GCC's -fstack-protector
* Multiple testsuite related fixes
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.2-2011.02-0
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
-- Michael