== GDB ==
* Committed hardware watchpoint support for gdbserver to mainline,
including two minor changes resulting from review comments;
backported those fixes to Linaro GDB as well.
* Implemented and tested support for disabling address space
randomization in gdbserver; patch posted for review.
* Investigated support for cross-platform core file generation.
== GCC ==
* Patch review week.
* Posted updated patch for PR 50305.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Arnd Bergmann <arnd(a)arndb.de> wrote on 08/26/2011 04:44:26 PM:
> On Thursday 25 August 2011, Russell King - ARM Linux wrote:
> >
> > Arnd, can you test this to make sure your gdb test case still works,
and
> > Mark, can you test this to make sure it fixes your problem please?
>
> Hi Russell,
>
> The patch in question was not actually from me but from Ulrich Weigand,
> so he's probably the right person to test your patch.
> I'm forwarding it in full to Uli for reference.
Hi Arnd, hi Russell,
sorry for the late reply, I've just returned from vacation today ...
I've not yet run the test, but just from reading through the patch
it seems that this will at least partially re-introduce the problem
my original patch was trying to fix.
The situation here is about GDB performing an "inferior function
call", e.g. via the GDB "call" command. To do so, GDB will:
0. [ Have gotten control of the target process via some ptrace
intercept previously, and then ... ]
1. Save the register state
2. Create a dummy frame on the stack and set up registers (PC, SP,
argument registers, ...) as appropriate for a function call
3. Restart via PTRACE_CONTINUE
[ ... at this point, the target process runs the function until
it returns to a breakpoint instruction and GDB gets control
again via another ptrace intercept ... ]
4. Restore the register state saved in [1.]
5. At some later point, continue the target process [at its
original location] with PTRACE_CONTINUE
The problem now occurs if at point [0.] the target process just
happened to be blocked in a restartable system call. For this
sequence to then work as expected, two things have to happen:
- at point [3.], the kernel must *not* attempt to restart a
system call, even though it thinks we're stopped in a
restartable system call
- at point [5.], the kernel now *must* restart the originally
interrupted system call, even though it thinks we're stopped
at some breakpoint, and not within a system call
My patch achieved both these goals, while it would seem your
patch only solves the first issue, not the second one. In
fact, since any interaction with ptrace will always cause the
TIF_SYS_RESTART flag to be *reset*, and there is no way at all
to *set* it, there doesn't appear to be any way for GDB to
achive that second goal.
[ With my patch, that second goal was implicitly achieved by
the fact that at [1.] GDB would save a register state that
already corresponds to the way things should be for restarting
the system call. When that register set is then restored in [4.],
restart just happens automatically without any further kernel
intervention. ]
One way to fix this might be to make the TIF_SYS_RESTART flag
itself visible to ptrace, so the GDB could save/restore it
along with the rest of the register set; this would be similar
to how that problem is handled on other platforms. However,
there doesn't appear to be an obvious place for the flag in
the ptrace register set ...
Bye,
Ulrich
== String routines ==
* Having got agreement on ignoring the triplet for picking the
routine, I'm just testing a patch,
but fighting a qemu setup.
* Found the binfmt binding for armeb was wrong (runs the le
version); filed bug with fix in
Dave
==GCC==
Combined report for last 2 weeks -
===Progress===
* Committed conditional compares patch to Linaro GCC 4.6
* Looking at modelling auto-inc-decs better .
* Tried patch for PR19599 and that broke bootstrap with a segfault.
Needs some re-engineering.
* Looked at the latest bootstrap failure on trunk. Still narrowing down.
* Some work on some administrative stuff
* Bit of patch review.
* Went for LLVM dev meeting.
* Release week had a few issues and helped dry-run cbuild spawns of
jobs and think I now know how to do that.
=== Plans ===
* finish looking at bootstrap failure.
* Finish auto-inc-dec patch.
* some more patch review.
* Send out LLVM dev meeting report.
Absences.
* 5th October - Out of office.
* 13th -14th October - Internal ARM training.
* 31st Oct - 4th Nov - Linaro Summit Orlando
* 08 Nov - 11 Nov - Tentatively booked
* Dec 19 - 31st Dec - Tentatively booked
(short week: 4 days)
RAG:
Red:
Amber:
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||add-omap3-networking || 2011-10-13 || 2011-10-13 || ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || 2011-09-15 ||
== a15-system-mode-planning ==
* now complete: we have generated blueprints/roadmap cards for the TSC
for the various options
== a15-usermode-support ==
* tested udiv/sdiv implementation
* fused mac: rough idea of what needs to be done, need to get all
the fiddly details right
== omap3 upstreaming ==
* rebased and sent pullreq for various outstanding patches
== other ==
* code/design walkthrough for upstream's new memoryregion API
* working on lightning talk for pdsw doughnut session next week
* investigated compile failure building QEMU in thumb mode with debug
enabled (we're trying to use the Thumb framepointer register as a
temporary...)
* meetings: toolchain, standup, 1-2-1
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences (to end of year):
Sep 29-Oct 07, Oct 17, Nov 21, Dec 15-Jan 03: leave
Oct 30-Nov 04: Linaro Connect Q4.11
== This week ==
* Submitted a fix for the performance regression caused by my
arm_comparison_operator patch. Applied upstream after approval
from Ramana (thanks). Will backport to Linaro towards the end
of next week if there are no reported problems.
* Went back to looking at -fsched-pressure. To recap, a colleague
ran SPEC for s390 comparing:
(a) normal -O3 based flags
(b) (a) + -fsched-pressure without my patch
(c) (a) + -fsched-pressure with my patch
(c) got the best geomean result, but there were some individual
tests for which (b) was significantly worse than (a), and for
which (c) only partly closed the gap.
Found one problem. It looks like -fsched-pressure only really
operates on the issue rate and instruction latencies; it doesn't
seem to use the DFA. This seems to be unintentional, and fixing
it showed some nice results.
Also, the -fsched-pressure patch that I wrote at Connect set the
starting pressure based on the set of registers that are both live
on entry to the block _and_ used within the enclosing loop,
This still seems to be a bit too conservative, in that it makes
the scheduler go out of its way to preserve loop invariants,
even if there are too many of them. Experimented with changing
"used" to "defined". This too seemed to be a win.
* Got access to some PowerPC GNU/Linux machines that are suitable
for running SPEC. Set up my account there and got SPEC building.
The idea is to use this to get more cross-target evidence for the
-fsched-pressure submission(s).
* Discussion about the SMS register-scheduling patches after great
feedback from Ayal. While drafting a still-unsent reply justifying
the main part of the patch, I found I was also explaining why another
part of the patch (specifically the prologue/epilogue part) was wrong.
Thought about that a bit today.
* Submitted fix for LP 641126.
== Next week ==
* More SMS register scheduling.
* More -fsched-pressure.
* Hopefully remerge the arm_comparison_operator patch with this week's fix.
Richard
* Working on getting everything in place for cross-compiling Firefox for
ARM. Trying to understand how the configuration script and make file works.
* Working on a test that will run Sunspider and extract the results. The
challenging part is that results are only presented on the page, not e.g.
written to stdout or to file. My approach to create an html file, embed the
page with the test in an iframe, and read out the results when the test is
done.
* Running SPEC2K on the Snowball board. An updated kernel solved the issue
with great variations in the test results. Some tests results look a bit
strange, so I will look at what those tests do to see what part of the
system is stressed.
Best Regards
Åsa
Hi,
* widening shifts patch - submitted upstream
* change default vector size patch - submitted to linaro-gcc
* automatic choice of vector size for basic block vectorization - testing
* vectorizer bug fixes
Next week we have New Year holiday on Wednesday (half day) and Thursday.
Ira
The Linaro Toolchain Working Group is pleased to announce the 2011.09
release of both Linaro GCC 4.6 and Linaro GCC 4.5.
Linaro GCC 4.6 2011.09-1 is the seventh release in the 4.6 series. Based
off the latest GCC 4.6.1+svn178681, it contains a range of vectoriser
and core performance improvements as well as fixing a number of
bugs.
Interesting changes include:
* Updates to 4.6.1+svn178681
* Improves performance by making better use of conditional compares
* Improves performance by properly scheduling widening multiplies
* Improves size and speed by improving constant generation in Thumb-2
* Implements support for widening multiples in toe core
* Improves vectorised code by reducing the over-promotion of intermediates
* Improves performance by reducing redundant moves between VFP and ARM
* Finishes off supporting the Android team in integrating Linaro GCC
Fixes:
* LP: #823548 Can't use -flto with skia
* LP: #823711 libvirt version 0.9.2-4ubuntu8 failed to build on armel
* LP: #827990 internal compiler error: in decode_addr_const, at varasm.c:2632
* LP: #836401 ICE on a | (b << negative-constant)
* LP: #838994 ICE building perl w/ -marm
* LP: #843775 ICE optimizing widening multiply-and-accumulate
Linaro GCC 4.5 2011.09 is the fourteenth release in the 4.5
series. Based off the latest GCC 4.5.3+svn178560, this is a
maintenance focused release.
Interesting changes in 4.5 include:
* Updates to 4.5.3+svn178560
Fixes:
* LP: #823711 libvirt version 0.9.2-4ubuntu8 failed to build on armel
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.6-2011.09https://launchpad.net/gcc-linaro/+milestone/4.5-2011.09
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
More information on the features and issues are available from the
release page:
https://launchpad.net/gcc-linaro/4.6/4.6-2011.09https://launchpad.net/gcc-linaro/4.5/4.5-2011.09
Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Bugs: https://bugs.launchpad.net/gcc-linaro/
Questions? https://ask.linaro.org/
Interested in commercial support? inquire at support(a)linaro.org
-- Michael
I tried to bootstrap current GCC trunk and our latest gcc-linaro-4.6
in profile guided, link time optimisation, and SMS modes. The results
are here:
https://wiki.linaro.org/MichaelHope/Sandbox/PGOLTOSMSStatus1
Short story: you can't bootstrap in LTO or PGO on ARM as they run out
of memory. i686 LTO is broken on trunk and gcc-linaro-4.6. SMS is
fine in general.
I'll run these once a week and keep an eye on them. A -fwhopr instead
of -flto may help on ARM. I don't know why the PGO build runs out of
memory.
-- Michael
Måns pointed me at the IDCT throughput test that's included with
libav. I've written up a page on how to build and run it at:
https://wiki.linaro.org/MichaelHope/Sandbox/LibAvDCT
Included are results with and without the vectoriser. In all cases
the vectoriser improves things, including increasing the SIMPLE-C
version by 11 % and the peak by 17 %.
The coefficient of variance is low so the results are consistent. I
haven't investigated the benchmark itself to see if its valid - we
could be vectorising the loop overhead instead of the IDCT itself.
-- Michael
Please coordinate with Jon Masters at RedHat/Fedora and Adam Conrad at
Ubuntu/Debian on this. (Cc'ing the cross-distro list, through which the
recent ARM summit at Linux Plumbers was organized.)
Cheers,
- Michael
On Sep 16, 2011 8:41 AM, "David Gilbert" <david.gilbert(a)linaro.org> wrote:
> OK, so we seem to have agreement here that what we want is autodetect
> for eglibc and
> forget about the triplet; well technically that probably makes my life
> easier, and I don't
> think it's too hard a sell.
>
> Dave
>
> _______________________________________________
> linaro-toolchain mailing list
> linaro-toolchain(a)lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-toolchain
* Linaro GCC
Spun 4.5 and 4.6 2011.09 GCC release tarballs. Uploaded them to
Michael's server, and kicked off the tests.
Continued work on my new constant optimization experiments. I now have
it tracking all the constants and am looking at how to detect the
optimization opportunities. So far it only calculates how exprensive it
would be to generate a value by adding to an existing constant, which is
a start at least. I'm having difficulties detecting whether changing an
insn will make it's parent (dependency-wise) obsolete, or not (and
therefore whether to count its costs - there's no problem for
instructions that overwrite an entire register, but ones that write to
portions of registers (such as MOVT) make more complex dependency
chains, and the def-use chains don't seem to be sorted into the order of
use.
* Other
Half day vacation on Thursday.
* Added testcases to Richard's micro benchmarks taken from libav.
* Discussed with Ayal the new version of the patch to support
instructions with
REG_INC_NOTE in SMS which causes bootstrap failure. I intend to debug
the bootstrap failure in order to find the cause for it.
(http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01216.html)
== String routines ==
* Tidying up bits of cortex strings for the release process
* Nailing down the behaviour of config.sub and the config systems in
gcc, binutils and eglibc
== Other ==
* A discussion on synchronisation primitives on various CPUs that
started on the gcc list
- looking at http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
- pointing out the 64bit instructions
- asking why they used isb's when neither the kernel or gcc use
them (answer the DMBs should
be fine as well, but there is some debate over which is
quicker, oh and DMBs are
converted to slower dsb's on most A9s due to an errata).
* Looking for docs on the non-core bits of current SoCs
* Extracting some denbench stats from a few months back for Ramana
About a day of non-Linaro IBM stuff.
Dave
RAG:
Red:
Amber:
Green:
NB: since qemu-linaro releases demonstrably go out on schedule
every month I'm dropping them from the milestone tables in these
reports, in favour of blueprint completion dates (usually they'll
be planned for dates coinciding with a qemu-linaro release).
Current Milestones:
|| || Planned || Estimate || Actual ||
||add-omap3-networking || 2011-10-13 || 2011-10-13 || ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || 2011-09-15 ||
== linaro-qemu-11.11 ==
* completed this month's release
== add-omap3-networking ==
* investigated why qemu's usb-net model didn't work on the beagle;
this was due to a bug in the usb-ohci controller model; patches
sent upstream. Fix will go into qemu-linaro 2011-10.
== a15-system-mode-planning ==
* wrote up options and my suggestion on the wiki:
https://wiki.linaro.org/PeterMaydell/QemuA15
* just need to discuss with Michael and turn this into a roadmap entry
== a15-usermode-support ==
* complete but untested implementation of UDIV and SDIV
* fused multiply-accumulate: implemented decode, and the special-cases
parts of the softfloat implementation (NaN, inf, etc); the difficult
bit of actually implementing the operation remains
== other ==
* meetings: toolchain, pdsw doughnuts, AFDS
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences (to end of year):
Sep 19, Sep 29-Oct 07, Oct 17, Nov 21, Dec 15-Jan 03: leave
Oct 30-Nov 04: Linaro Connect Q4.11
As mentioned on the standup call this morning, I've been trying to get my head
around the way different parts of the toolchain using the config scripts and the
triplets. I'd appreciate some thoughts on what the right thing to do
is, especially
since there was some unease at some of the ideas.
My aim here is to add an armv7 specific set of routines to the eglibc
ports and get this picked up
only when eglibc is built for armv7; but it's getting a bit tricky.
eglibc shares with gcc and binutils a script called config.sub (that
lives in a separate repository)
which munges the triplet into a $basic_machine and validates it for a
set of known
triplets.
So for example it has the (shell) pattern:
arm | arm[bl]e | arme[lb] | armv[2345] | armv[345][lb]
to recognise triplets of the form arm- armbe- armle-
armel- armbe- armv5- armv5l- or armv5b-
It also knows more obscure things such as if you're configuring for
a netwinder it's an armv4l- system running linux - but frankly most
of that type of thing are a decade or two out of date. Note it doesn't
yet know about armv6 or armv7.
eglibc builds a search path that at the moment includes a path under
the 'ports' directory of the form
arm/eabi/$machine
where $machine is typically the first part of your triplet; however
at the moment eglibc doesn't have any ARM version specific subdirectories.
If I just added an ports/sysdeps/arm/eabi/armv7 directory it wouldn't
use it because
it searches in arm/eabi/arm if configured with the triplet arm-linux-gnueabi or
--with-cpu sets $submachine (NOT $machine) - so if you pass --with-cpu=armv7
it ends up searching
arm/eabi/arm/armv7
if you used the triplet arm-linux-gnueabi. If you had a triplet like
armel then I think
it would be searching
arm/eabi/armel/armv7
So my original patch (
http://old.nabble.com/-ARM--architecture-specific-subdirectories,-optimised…
)
did the following:
* Modified the paths searched to be arm/eabi (rather than arm/eabi/$machine)
* If $submachine hadn't been set by --with-cpu then autodetect it
from gcc's #defines
which meant that it ignored the start of the triplet and let you
specify --with-cpu=armv7
After some discussion with Joseph Myers, he's convinced me that isn't
what eglibc
is expecting (see later in the thread linked above); what it should
be doing is that
$machine should be armv7 and $submachine should be used if we wanted
say a cortex-a8 or
cortext-a9 specific version.
My current patch:
* adds armv6 and armv7 to config.sub
* adds arm/eabi/armv7 and arm/eabi/armv6t2 and one assembler
routine in there.
* If $machine is just 'arm' then it autodetects from gcc's #defines
* else if $machine is armv.... then that's still $machine
So if you use:
a triplet like arm-linux-gnueabi it looks at gcc and if that's configured
for armv7-a it searches arm/eabi/armv7
a triplet like armv7-linux-gnueabi then it searches arm/eabi/armv7
irrespective
of what gcc was configured for
a triplet like armv7-linux-gnueabi and --with-cpu=cortex-a9 then it searches
arm/eabi/armv7/cortex-a9 then arm/eabi/armv7
As far as I can tell gcc ignores the first part of the triplet, other than
noting it's arm and spotting if it ends with b for big endian; (i.e.
configuring gcc with armv4-linux-gnueabi and armv7-linux-gnueabi
ends up with the same compiler).
binutils also mostly ignores the 1st part of the triple - although is
a bit of a mess
with different parts parsing it differently (it seems to spot arm9e for some odd
reason); as far as I can tell gold will accept armbe* for big endian where as
ld takes arm*b !
If you're still reading, then the questions are:
1) Does the approach I've suggested make sense - in particular that the
machine directory chosen is based either on the triplet or where the triplet
doesn't specify the configuration of gcc; that's my interpretation of what
Joseph is suggesting.
2) Doing (1) would seem to suggest I should give config.sub armv6t2 and
some of the other complex names.
Dave
== This week ==
* Reviewed patches for the release.
* ...then broke the release. Tried to spin a new one.
* Worked on a "real" fix for bug 850099. Now in testing.
* Looked more at auto-inc-dec stuff. Saw a case that didn't behave
as I expected on the A9. The A9 TRM doesn't describe what happens
for post-indexed addressing, so I asked Ramana. Apparently the
behaviour is expected. Once I have more info, I'll try to update
the patches.
* Worked on neon-highlow-extract and neon-strided-load-extract.
Posted the three patches upstream. Nicely, the one I thought
was going to be the most controversial actually got positive
feedback from Paolo (who wrote the affected code).
Richard
== GDB ==
* Completed hardware watchpoint support for gdbserver.
* Tracked down watchpoint resource accounting regression
on GDB mainline (not present in 7.3).
* Created and published Linaro GDB 7.3-2011.09 release.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
* Running SPEC2K on the Snowball board. A fresh kernel with HIGHMEM enabled
made it possible to run the tests. Great variations in the results indicates
that something strange is going on. Turning off one of the CPU:s gives
stable result (but slow), so my current guess is that the variations are
caused by a known bug that makes one cpu run slower.
http://igloocommunity.org/bugzilla3/show_bug.cgi?id=1
The patch for this bug was not included in my kernel. Will have another go
with a kernel where the patch is included, as a background activity.
* Planned and started working on the "Adding browsing benchmarks to our
current set of tests"-activity. I will try to keep documentation up to date
here:
https://wiki.linaro.org/AsaSandahl/Sandbox/BrowsingBenchmarks
Experimenting with building Firefox in different ways, so far for x86.
Best Regards
Åsa
(bouncing to linaro-dev as it's generally interesting)
On Fri, Sep 16, 2011 at 8:17 AM, Ramana Radhakrishnan
<ramana.radhakrishnan(a)linaro.org> wrote:
> Hi,
>
> I've been looking at some of the perf regressions we've been seeing
> these days in an attempt to understand what's going on in these cases.
> While I can use perf and get more statistics and do other things to
> figure out why there are perf regressions between 2 binaries along
> with perf record and report, I wonder if it is possible to use u-boot
> to accurately measure what's going on. I would like to try and get the
> values of the performance counters between 2 program points .
>
> I am aware that there are patches that are floating around that allow
> users to set and reset the PMU counters by allowing user level access
> to it in the kernel : while that maybe useful to some I'm not sure if
> I want to take a chance with some other process getting scheduled that
> ends up getting scheduled. Even if there are parts of the kernel that
> save and restore PMU counters associated per process with across
> context switches . I'm looking for as accurate measurements as
> possible in this case and I wonder if u-boot is the best bet for this
> ( in the absence of any dedicated hardware debug / trace unit) given
> not all of us have one.
>
>
> At the minimum to do this I believe we require u-boot or some start-up code to:
>
> * Turn on i-cache and d-cache. ( The current u-boot for panda that I
> get from the linaro-uboot git repo
> git://git.linaro.org/boot/u-boot-linaro-stable.git says "Warning
> Caches turned off" when starting up ). Googling around I find a few
> patches floating around that turn on the d-cache in August from Aneesh
> at TI . We should consider getting these in at some point.
>
> * Looking in $(UBOOT_TOP)/examples/api I see that there are simple
> printf routines and simple stand-alone applications that exist which
> could be used for this purpose. The one problem with this is the fact
> that u-boot appears to require use of -ffixed-r8 for it's purposes
> which *might* mean we need these if we were to use API calls into
> standard u-boot functions .
I wonder if R8 is used in the current ARM version? There's no reason
we can't cherry pick parts such as the serial I/O out into a library
and make the app completely self contained. Skip all of the
initialisation stuff and assume the boot loader has done it for you.
> * Turn on / off speculative prefetching - I believe the kernel does
> this already for a few boards, but could this be done in u-boot just
> before it launches a test application ?
>
> * Turn on the VFP and Neon units.
>
> * Turn on unaligned access so that unaligned accesses are allowed in
> the test applications. GCC will now move towards generating unaligned
> accesses on versions of the architecture that support it, the patches
> upstream have now been approved.
>
> * Memory map / linker scripts to make sure we are putting things in
> the right places (sigh, has to be per-board).
But everything goes in RAM so you have one generic linker script and a
per board MEMORY definition. Similar to:
http://bazaar.launchpad.net/~stm32f-dev/stm32f-dev/stm32f-startup/view/head…
...but even lighter.
> We then write a set of library functions that could then look at what
> performance counters are of interest to us and track them by resetting
> them to 0 and making sure they haven't overflown.
>
> Has anyone else in the group played with u-boot before or has any
> thoughts in this direction ? I am not suggesting that we do this work
> right now but it sounds like an interesting thought of where we can
> get to with this.
My worry is that we miss turning on a feature and get results that
aren't representative. That should be easy enough to check by
baselineing against a Linux hosted run.
We can use NFS or kermit to load the programs. u-boot has a network
console which is nice when you don't have serial. This combined with
an expect script (or LAVA? Paul?) should automate the whole process.
-- Michael
Hi,
* put the sources of the libunwind android port, the patches for
debuggerd and the Android test app online
* documented things at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Outputs/LibunwindDebuggerd
* noticed differences between the old (debuggerd) and the new
(debuggerd+libunwind) backtraces
* I'm still not sure what's going on (maybe they are adding offsets
or something)
* however, the backtrace that libunwind does looks sane to me
Note: I'll be on vacation till October 7th.
Regards
Ken
Hi,
* testing widen-shifts patch on ARM
* SLP improvements:
- submitted a patch to allow not simple ivs in SLP
- committed a patch to allow read-after-read dependencies in SLP
Ira
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro QEMU 2011.09.
Linaro QEMU 2011.09 is the latest monthly release of
qemu-linaro. Based off upstream (trunk) QEMU, it includes a
number of ARM-focused bug fixes and enhancements.
New in this month's release:
- linux-user mode now supports the 64 bit cmpxchg kernel helpers
(only needed for applications compiled for ARMv6 or lower)
- PL111 display controller now supported; this fixes a problem
where BGR was interpreted as RGB on recent versatilepb kernels
Plus a few other minor bug fixes and the usual round of upstream
fixes and improvements.
Known issues:
- The beagle and beaglexm models still do not support USB networking;
we intend to fix this for the 2011.10 release
- There may be some problems with running multithreaded programs in
linux-user mode (LP:823902)
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2011.09
Binary builds of this qemu-linaro release are being prepared and
will be available shortly for users of Ubuntu. Packages will be in
the linaro-maintainers tools ppa:
https://launchpad.net/~linaro-maintainers/+archive/tools/
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.3.
Linaro GDB 7.3 2011.09 is the second release in the 7.3 series. Based
off the latest GDB 7.3, it includes a number of ARM-focused bug fixes
and enhancements.
This release contains:
* Support for hardware breakpoints and watchpoints in gdbserver
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.3-2011.09
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
Hi there. The 2011.09 release has been spun and is testing up well.
The 4.5 and 4.6 branches are now open so feel free to commit any
approved patches.
-- Michael
Hi!
When building Android with the Linaro toolchain, I encountered this link
time error when going from gcc 4.4.3 to gcc 4.6.
"arm-eabi-g++: error: unrecognized option '-avoid-version'"
I find several posts about people encountering the same thing for different
programs.
Was this option removed? Anyone know the story behind it?
Regards
Åsa
Release Management needs the list of the the blueprints that will be
delivered this month.
For each bp, they want a Headline and Acceptance criteria. The Headline is a
statement to include in the Monthly release announcement. The acceptance
criteria is a statement how to verify the work is done.
I have collected this month's blueprints in the spreadsheet, please review
it for accuracy. If a blueprint you are working on is missing, please add
it.
The Headline and Acceptance can be added into the whiteboard as follows:
Headline:
headline text
Acceptance:
acceptance text
Once you add the the headline and acceptance, please modify the spreadsheet
to reflect that it is done.
https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AoZqvK7R1biJdGxSc…
Thanks in advance,
Mounir
--
Mounir Bsaibes
Project Manager
Follow Linaro.org:
facebook.com/pages/Linaro/155974581091106http://twitter.com/#!/linaroorghttp://www.linaro.org/linaro-blog <http://www.linaro.org/linaro-blog>
==GCC==
===Progress===
* Fixed https://bugs.launchpad.net/ubuntu/+source/gcc-4.6/+bug/838994
. Investigated Bernd's alternate patch . Will commit mine.
* Looked at PR48308 for sometime whihc might be a dup of PR50313 .
* Some blueprint foo.
* Committed a few of the outstanding approved patches into Linaro GCC-4.6
* Patch review week
* Caught up with email after vacation.
=== Plans ===
* Commit conditional compares patch.
* Commit the patch for LP838994.
* Investigate some of the performance issues with strlen and some of
the cases with - one of the ideas is to probably try and get the
specific testcases run under u-boot or as a bare-metal binary and look
at dumps of various performance monitoring counters and see what's
happening.
* Look at BRANCH_COST and finish that up next.
* Dust off patch for PR19599 . One of them them that has fallen into the cracks.
* Some patch review.
Meetings:
* 1-1s
* TCWG calls
* Thumb2 performance call.
Absences.
* 16th Oct (pm) - LLVM developer summit - London
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel booked - hotel
to be booked.
* 08 Nov - 11 Nov - Tentatively booked not approved.
* Dec 19 - 31st Dec - Tentatively booked not approved.
Merged both GCC 4.5 and 4.6 from FSF to Linaro. Matthias requested that
I avoid a particular upstream 4.6 commit, so I selected the revision
before that as the merge point. The problem was then fixes upstream, and
another fix was desirable, so I've redone the merge from the branch head.
Another widening-multiplies bug was reported to me (I logged it as
pr50318/lp843775), so I've fixed that and committed the fix upstream,
and filed a merge request on Launchpad.
Finished fixing the bugs in my thumb2 constants optimizations, and
backported the new patches to Linaro GCC 4.6. Pushed the updated stuff
to Launchpad for testing.
Richard Sandiford found a flaw in my patch for pr50193/lp836401, so I've
done another version of that and posted it upstream. Ramana didn't like
that version. I've started again trying to fix it a different way, but I
don't have it working just yet.
Continued work on my new constant reuse patch. I have it detecting many
constant expressions, and calculating the values for some of them. Once
it does that sufficiently well, the next step is to track what constants
are available where, I then I'll be in a position to find optimization
opportunities. At the moment, 'sufficiently well' could just mean
MOVW/MOVT pairs, as those are the most common
Tried to get the CS Panda Boards up and running again after the move. No
success. Ricardo is on the case. I'm still using the boards located at
Canonical.
Andrew
Continue looking at Richard's micro benchmarks taken from libav w.r.t
SMS and experiment with different patches that Richard wrote to
improve code generation.
Submitted SMS related patch for minor misc fixes
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00551.html
Trying to understand why to new version of the patch to support
instructions with
REG_INC_NOTE in SMS causes bootstrap failure. Will email to the ml regarding it.
(http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01216.html)
* Looked at LP bug 736661. Sent a patch upstream. Some positive feedback,
but it hasn't been approved or rejected yet.
* Looked at the old R_ARM_THM_CALL linker bug, after Matthias prepared
a self-contained testcase (thanks). Attached a patch to the bug.
Will submit upstream next week if testing goes OK.
* Looked at why the backport of lp823708-4.5 retriggered the same
bootstrap failure that Chung-Lin's patch did. Haven't been able
to reproduce yet.
* Looked at making the neon_vget_high/low patterns use ordinary
subreg moves. Found that this triggered a fair few latent bugs
in the rtl optimisers. Tried to fix those. This gave some nice
improvements in some of the libav loops.
* Added h264 loops to the libav microbenchmarks.
* Blueprints.
* Upstream patch review.
Richard
* Completed First-time wiki page, at least for now. Expecting to add more
information as I go.
* Running SPEC2K on the Snowball board. The tests are failing because I run
out out of memory. This is due to too little RAM available in the default
kernel configuration. (Official HW pack.) I had a go with creating a swap
file on the SD-card. The tests are then running, but results are slow (which
makes sense). Will try to make more memory available with changed
uboot-option, or with a fresh kernel.
* Discussing with Michael about benchmark candidates that will add a web
browsing perspective to the benchmarks we have. I suggest Sunspider and V8
benchmark suite for the JavaScript aspect, and EEMBC Browsing Bench and
perhaps ARMBBench for the load and render aspect. As for the imaging aspect
we have DENBench and ConsumerBench.
Best Regards
Åsa
== String routines ==
* Trying to understand my strlen behaviour that Michael identified
- Found lots of ways of making the faster case slower, but none of making
the slower case faster!
- Perf not being available on Panda (bug 702999/843628) made it
difficult to
dig down
* Fixing standards corner cases for strchr/memchr
- input match needs to be truncated to char (fixes bug 842258 & 791274)
* Tidying up formatting for cortex-strings release
* Looking at eglibc integration again
- getting confused by what has to happen in config.sub and how
other users of it
cope with triplets like armv7 even though it's not in config.sub
== QEMU ==
* Testing Peter's QEMU release
- All good
- Lost a few hours due to the broken version of l-i-f-ui in Oneiric
- PPA version works OK
* A little bit of perf profiling
== Other ==
* Managed to get hold of a nice fast build machine
== GDB ==
* Worked on hardware watchpoint support for gdbserver.
== GCC ==
* Analyzed root cause of three more ICEs when building Linux
kernel with mainline GCC (reported by Arnd):
PR target/50305: Inline asm reload failure when building Linux kernel
PR middle-end/50307: SSA checking ICE when building Linux kernel
PR tree-optimization/50318: ICE optimizing widening
multiply-and-accumulate
* Implemented proposed fix for PR target/50305 and posted for review.
== Misc ==
* Installed updated FPGA bitfiles on my Versatile Express and verified
that network stability issues (LP #673820) are now fixed.
* Booked Linaro Connect Q4.11 travel.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
RAG:
Red:
Amber:
Green: overrunning OMAP3 upstreaming work (mostly) replanned
Current Milestones:
|| || Planned || Estimate || Actual ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
== upstream-omap3-patches ==
* more reshuffling of patches and dropping of unnecessary changes (eg
code reformatting)
* we're going to divide this blueprint into four, each of which has a
reasonably clearly defined submilestone and an estimated 3
engineering weeks of work in it
* in order to not have work on this completely push out other items on
the schedule, we're going to limit work done on this to 2 or 3 days
each week
* still todo: actually split the blueprint, set dates for
submilestones, check that other blueprints fit reasonably in the
other half-week
== linaro-qemu-11.11 ==
* built a pre-release tarball and tested it -- looks good for next
week's release
* investigated whether we can reinstate the firmware blobs in our
releases (bringing us back into line with upstream) -- should be
possible but need to go through the license approval process since
some are GPLv3
== a15-system-mode-planning ==
* starting to see some (gentle) pressure for A15 support
* thinking about what we should do here; my current opinion is that
QEMU should implement an "A15 without virtualization or LPAE" -- we
have Linux kernels that will boot on this, and it is essentially
what an A15-on-A15 hw virt guest would see. The device work will be
needed for KVM anyway. Need to write this up.
== other ==
* submitted TSC licensing request to add the firmware blobs back into
our qemu-linaro tarballs, in line with how upstream do releases
* I need to track better how much time I'm spending on things like code
review on qemu-devel, minor bug fixing and other things that aren't
blueprints
* all holiday to the end of the year now booked (see below)
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences (to end of year):
Sep 19, Sep 29-Oct 07, Oct 17, Nov 21, Dec 15-Jan 03: leave
Oct 30-Nov 04: Linaro Connect Q4.11
Hi,
* I've been setting up a new system because the old laptop died
* finished the initial port of libunwind for Android on ARM
* changed debuggerd to make use of libunwind to unwind the stack of
crashing applications
* it works and the output looks great :)
* I plan to document these things in the wiki by next week
Regards
Ken
Just as an FYI, I've added these loops to the libav microbenchmarks
avg-h264-chroma-mc8-8.txt
avg-pixels8-8.txt
ff-h264-idct-add-8-8.txt
ff-put-pixels8x16-8.txt
h264-loop-filter-luma-8.txt
idct-internal-8.txt
put-h264-chroma-mc8-8.txt
put-h264-qpel8-h-lowpass-8.txt
put-h264-qpel8-hv-lowpass-8.txt
put-h264-qpel8-v-lowpass-8.txt
based on Michael's h264 profile. These loops:
decode_residual
ff_h264_decode_mb_cavlc
fill_decode_caches
aren't really the kind of thing that the microbenchmark is designed for;
running the whole h264 benchmark is probably a better test. Some of the
functions in the profile just consist of two copies of a simpler loop,
one after the other, so for those I just used the simpler loop.
Usual microbenchmark caveats apply.
Richard
Hi,
* merged vector over-promotion patch to linaro-gcc-4.6
* committed upstream the change of the default vector size for NEON
* continued working on widening shifts
Ira
Hi Dave. I've been hacking away and have checked in a couple of
benchmarking and plotting scripts to lp:cortex-strings. The current
results are at:
http://people.linaro.org/~michaelh/incoming/strings-performance/
All are done on an A9. The results are very incomplete due to how
long things take to run. I'll leave ursa3 doing these over the
weekend which should flesh this out for the other routines.
Your new memcpy() is looking good as well - as fast as GLIBC.
-- Michael
While out benchmarking today, I ran across code similar to this:
int *a;
int *b;
int *c;
const int ad[320];
const int bd[320];
const int cd[320];
void fill()
{
for (int i = 0; i < 320; i++)
{
a[i] = ad[i];
b[i] = bd[i];
c[i] = cd[i];
}
}
I was surprised and happy to see the vectoriser kick in for the copy.
The inner loop looks like:
add r5, r3, ip
adds r4, r3, r7
vldmia r2!, {d16-d17}
vldmia r1!, {d18-d19}
adds r0, r3, r6
vst1.32 {q9}, [r5]
vst1.32 {q8}, [r4]
vldmia r3, {d16-d17}
adds r3, r3, #16
cmp r3, r8
vst1.32 {q8}, [r0]
bne .L3
so r3 is the loop variable and {ip,r7} are the offsets from r3 to the
destination pointers. Adding a __restrict doesn't change the code.
Richard, will your auto-inc/dec changes combine the final vldmia r3,
add r3 into a vldmia r3! ?
Changing the int *a into in-file arrays like int a[320] gives:
vldmia r0!, {d16-d17}
vldmia r5!, {d18-d19}
vstmia r4!, {d18-d19}
vstmia r1!, {d16-d17}
vldmia r2!, {d16-d17}
vstmia r3!, {d16-d17}
cmp r3, r6
bne .L2
Marking them as extern int a[320] goes back to the first form.
Can we always use the second form? What optimisation is preventing it?
-- Michael
On Fri, Sep 2, 2011 at 4:51 AM, David Gilbert <david.gilbert(a)linaro.org> wrote:
> Hi Michael,
> I've just committed a pair of memcpy's into src/linaro-a9 - memcpy.S
> that is armv7
> and memcpy-hybrid.S that is a Neon hybrid which uses neon for non-aligned cases
> and for large (128K or larger) copies. I've also (accidentally)
> wired the memcpy-hybrid
> one into the Makefile.am (I wasn't sure what the right way to do this
> was - the neon_sources
> seemed a good place for it, but there is nothing currently in there
> that turns off the non-neon
> version).
>
> I'd be interested in seeing the results for both; I've got a bit of
> a soft spot for the hybrid
> solution.
>
> On the memset, yes the 'and' that you added is fine - but I started
> having a play and have
> some performance results (on -t 128) that I don't really understand:
>
>
> 1) and r1,#0xff
> orr r1,r1,r1,lsl#8
> orr r1,r1,r1,lsl#16
>
> That's your solution - and fastest at somewhere around 2270MB/s for
> me - by the TRM I reckon
> that should be 3 cycles.
>
> 2) lsls r1,#24
> orr r1,r1,r1,lsr#8
> orr r1,r1,r1,lsr#16
>
> lsl isn't explicitly listed in the TRM, so I assumed that was the
> same as a move with a constant
> shift, which my reading is that it's a single cycle; and the lsls is 2
> bytes - so you would think
> that should be as fast as yours but 2 bytes smaller - except it's
> reliably down at 2228MB/s - so
> it is slower.
>
> 3) Thinking it was an alignment issue I tried adding a mov r5,r1 to
> the front of that, and got 2248MB/s -
> so being faster with an extra instruction it probably was an alignment issue?
>
> 4) I also tried a pair of bfi's:
> bfi r1,r1, #8, #8
> bfi r1,r1, #16, #16
>
> That came out at 2228MB/s - and is 4 cycles by the book.
Unfortunately you can't tell the performance from the latency.
Attached is a micro benchmark that has the three different versions
(and, ubx, lsl). After compensating for the loop time, I got:
* lsl: 1.006 s
* ubx: 0.876 s
* and: 0.918 s
even though ubx has a latency of two cycles.
I then took the AND version and shifted it to the start of the file.
This small change in alignment pushed it up to 1.048 s which is 14 %
slower.
-- Michael
* Linaro GCC
Fixed up, committed and posted two bug fixes to my thumb2 constants
patches, found by other people running FSF trunk.
Analysed bug lp:836401 / pr50193, developed a fix, and posted it both
upstream and to launchpad for testing. The launchpad tests have come
back clean, and the patch is approved, but upstream have not approved it
yet.
Posted a query to linaro-dev mailing list asking for ARM CPU ID register
numbers, and got lots back. Entered these into the patch, and begun some
test builds. I will post the new verion upstream, if all's well, next week.
Started looking at an optimization I discussed with Richard Earnshaw in
Cambourne, in which GCC attempts to synthesize constants more
efficiently by reusing constant values already in registers. I've made a
start, but not much more to say just yet.
* Other
- Public holiday Monday.
- Half day leave on Wednesday.
- Internal train session
Continue looking at Richard's micro benchmarks w.r.t SMS.
Examining Ayal's comments to the patch to support instructions with
REG_INC_NOTE in SMS.
(http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01216.html)
Took one day off yestarday (4/9)
Do we know anything about "Csmith"?
Maybe we should try it?
Andrew
-------- Original Message --------
Subject: Re: [PATCH][ARM] pr50193: ICE on a | (b << negative-constant)
Date: Thu, 1 Sep 2011 13:21:38 +0000 (UTC)
From: Joseph S. Myers <joseph(a)codesourcery.com>
To: Andrew Stubbs <ams(a)codesourcery.com>
CC: gcc-patches(a)gcc.gnu.org, patches(a)linaro.org
Newsgroups: gmane.comp.gcc.patches
References: <4E5F6B5F.2020207(a)codesourcery.com>
On Thu, 1 Sep 2011, Andrew Stubbs wrote:
> This patch fixes the problem by merely checking that the constant is positive.
> I've confirmed that values larger than the mode-size are not a problem because
> the compiler optimizes those away earlier, even at -O0.
Do you mean that you have observed for some testcases that they get
optimized away - or do you have reasons (if so, please state them) to
believe that any possible path through the compiler that would result in a
larger constant here (possibly as a result of constant propagation and
other optimizations) will always result in it being optimized away as
well? If it's just observation it would be better to put the complete
check in here.
Quite of few of the Csmith-generated bug reports from John Regehr have
involved constants appearing in unexpected places as a result of
transformations in the compiler. It would probably be a good idea for
someone to try using Csmith to find ARM compiler bugs (both ICEs and
wrong-code); pretty much all the bugs reported have been testing on x86
and x86_64, so it's likely there are quite a few bugs in the ARM back end
that could be found that way.
--
Joseph S. Myers
joseph(a)codesourcery.com
Hi,
libunwind:
* improvements in case the user doesn't use ARM unwind tables but DWARF info
* code used to pick ARM unwind from the crt files which says "cantunwind"
android:
* upgraded working base to 11.08
* continue to port libunwind to android
* noticed a header file clash that causes errors
* finished an android app that uses a native part to crash the process
* as a vehicle to test the modified debuggerd
Regards
Ken
== GDB ==
* Russell King now wants to revert my kernel patch that
fixed #615974; discussed alternative options.
== GCC ==
* Patch review week.
* Analyzed root cause of ICE when building Linux kernel
with mainline GCC (reported by Arnd).
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== QEmu ==
* Sent 64bit atomic helper fix upstream
* Basic boot time and simple benchmarks v Panda board
* Tested prebuilt images and Peter's latest post-merge QEmu tree
- The full Ubuntu desktop on an emulated Overo is a bit slow -
it's rather short on RAM
- The full Ubuntu desktop on an emulated VExpress isn't bad; it's
got the full 1G; (with particularly grim
line of awk to mount vexpress images based on Peter's
suggestion of the use of 'file')
== String routines ==
* Pushed memcpy and memset up to cortex-strings bzr
* Working through memset issue with Michael
- Made my code a little less sensitive to initial alignment
== Hard float ==
* Testing libffi 3.0.11rc1 - still hasn't got variadic patch in, but
hopeing it will land later in the cycle.
== Other ==
* Excavating inbox after week off.
* Build LMbench and kicked run off on Panda. (Got stuck in some
heuristics under emulation)
Dave
== This week ==
* Looked at the get_arm_condition_code ICE. Seems to be a popular bug:
was reported as #589887 #823708 and #809761 in Lauchpad and as PR49030
in bugzilla. Sent a patch upstream.
* Submitted SMS register-dependency patch upstream.
* Reviewed Bernd's new shrink-wrap patch.
* Tried to clean up my microbenchmarks. Found that preloading the caches
at the start of the benchmark fixed the variations I was seeing on a
Beagleboard. (As Dave says, it seems that there's no allocation
on write.) Added code to check the results of each loop. Packaged
it up and pushed into bzr.
* An IBM colleague kindly tried my -fsched-pressure patch/hack
on s390. Although it performed the best of the three runs
(trunk, -fsched-pressure, patch+-fsched-pressure), there were
some disappointing outliers. -fsched-pressure still introduces a 7%
regression in one test (down from a 14% regression without the patch).
Another test benefited from -fsched-pressure without my patch but
regressed with it.
== Next week ==
* Look more at SMS.
* Look more at the sched-pressure thing (if I get time).
Richard
* SPEC2K week. Experimented with building and running and finally did a full
run on the Panda board.
* Running SPEC2K on the Snowball board as well. It is troublesome to work
with the board because of the ethernet problem that makes the board freeze
after some time. I have to use the SD-card for file transfer. It seems to be
a known issue on the Snowball V3. Tested a V5 board which was much more
stable. Have asked for a V5 board from ST-E Linaro internal project manager.
(Do not know when I will get it.)
* Preliminary travel booking done for Linaro Connect in Orlando.
Best Regards
Åsa
I've tried to clean up the libav microbenchmarks that I did for the strided
load/store stuff. They're on Launchpad at:
lp:~rsandifo/+junk/loop-microbenchmarks
The main changes are that the benchmarks now preload the caches (for CPUs
that don't allocate on write) and that they now check the optimised loop
against an unoptimised one.
The usual big caveat applies: these loops were chosen because they were
affected by strided load/stores. They aren't necessarily interesting
for any other reason, and some were even explicitly marked as cold.
I'm going to add some of the video decode routines from Michael's
benchmark soon. These microbenchmarks aren't supposed to be
libav-specific though, so if you have other interesting ones,
please do add them.
Richard
Trip report: KVM Forum/LinuxCon NA 2011
KVM Forum is an annual conference; this year it was colocated with
LinuxCon NA in Vancouver. There were about 150 attendees; many of them
are simply users of KVM and so many of the talks are aimed at KVM
users. However it's also an opportunity for the KVM and QEMU developer
community to get together, with a number of informal BoF sessions and
an all-day hackathon later in the week.
The talk schedule is here, together with the slides for all talks:
http://www.linux-kvm.org/page/KVM_Forum_2011
Some brief highlights:
* Keynotes
ARM/Linaro got positive mentions in both keynotes; Avi Kivity said of
the ARM/A15 KVM work that he had "every reason to expect it to be very
successful". Anthony Liguori's keynote summarising the year in QEMU
development included some statistics about commits: Linaro came third
in the list of "companies with most commits", behind only Red Hat and
IBM; I came top of the "individual authors with most commits" list,
being apparently responsible for 7% of all QEMU patches this year :-)
* KVM on POWER/PPC
There were several talks about KVM on PPC architectures; interestingly
this is seeing use not just on the server end but also in the
embedded/realtime space (including a talk from Freescale where they
said they are working on KVM on embedded PPC because of customer
demand for KVM). It was also reassuring to see that another
architecture has preceded us in shaking out x86-isms from KVM.
* KVM Tool
This has got headlines recently as a potential replacement for the
userspace launcher/device model role which QEMU currently plays when
starting guest OSes under KVM. It's intended to be minimal and
lightweight and only to run Linux guests (with paravirtualised devices
for most purposes). The general reaction seemed to be that although
the implementation is currently minimal it will become larger and
bloatier as they add features off their wishlist. There's also some
ill-feeling about the effective namespace grab of calling the
userspace binary "kvm". From an ARM-centric point of view we can just
wait and see whether it gets much traction. Possibly it may turn into
a testbed for technology which is easier to develop on than the
'mature' QEMU which has to deal with backwards compatibility and
supporting users.
* QEMU Object Model and Device Model issues and redesign
For me one of the most important strands of conversation at the
conference was replacing QEMU's device model abstraction with
something better. QEMU's current device model abstraction is "qdev";
this is the (vaguely object-oriented) framework which lets you create
devices, configure them and connect them together. It models the world
as a tree: a root device exposes a bus, to which child devices can
connect; those child devices may expose further buses, and so on.
This works quite well in the PC world where mostly you're interested
in plugging in USB devices, PCI cards, etc; it is rather less well
matched to the embedded board models where things are much less
hierarchical. qdev's major flaws include:
+ insists on bus hierarchy, but not everything is a bus, and in any
case there are often several trees (memory transactions, clock,
interrupts) which don't necessarily coincide
+ no support for composition ("device foo is actually devices bar
and baz glued into one box")
+ just barely supports having devices expose signal (gpio/irq) lines
and memory regions (typically registers), but doesn't let you give
them useful names, so you have to access them by index number
We spent just about all of Thursday's hacking session going through
this. I felt we got good agreement on the problems, and perhaps
80-90% agreement on Anthony Liguori's proposed new QEMU Object Model
as a solution to them; some loose ends still need to be worked
through.
* LinuxCon NA
KVM Forum was colocated with LinuxCon NA this year. My opinion (which
seemed to be shared with the other Linaro attendees I talked to about it)
was that LinuxCon NA suffered from being not very technical and not
very focused -- it wasn't clear to me who they thought their target
audience was. A few points of interest:
+ Linus Torvalds' keynote was reported in some places as more
complaints about ARM hardware but I actually thought it was pretty
positive about the progress we're making in sorting out the issues
+ Matthew Garrett's talk about x86 platform drivers (those things that
deal with LEDs, funny keys, batteries and other odd laptop hardware)
revealed that actually PC hardware manufacturers do just as much
random non-standard undocumented silliness, it's just that accident
of history has limited them to only doing so in the minor bits at
the edges...
-- PMM
Hello Ulrich (or anyone else acquainted with gdb),
Could the gdb test suite be run on a kernel with the below patch applied
please? A confirmation that this patch doesn't regress gdb is required
before this can move ahead. Quick feedback would be greatly
appreciated.
Thanks.
---------- Forwarded message ----------
Date: Thu, 25 Aug 2011 15:55:58 +0100
From: Russell King - ARM Linux <linux(a)arm.linux.org.uk>
To: Tejun Heo <tj(a)kernel.org>, Arnd Bergmann <arnd(a)arndb.de>,
Mark Brown <broonie(a)opensource.wolfsonmicro.com>
Cc: Rafael J. Wysocki <rjw(a)sisk.pl>, linux-kernel(a)vger.kernel.org,
linux-arm-kernel(a)lists.infradead.org
Subject: Re: try_to_freeze() called with IRQs disabled on ARM
On Thu, Aug 25, 2011 at 03:09:07PM +0200, Tejun Heo wrote:
> Hey, Russell.
>
> If you can fix it properly without going through temporary step,
> that's awesome. Let's put the arguments behind, okay?
Here's the patch. As the kernel I've run this against doesn't have the
change to try_to_freeze(), I added a might_sleep() in do_signal() during
my testing to verify that it fixes Mark's problem (which it does.)
I've tested functions returning -ERESTARTSYS, -ERESTARTNOHAND and
-ERESTART_RESTARTBLOCK, all of which seem to behave as expected with
signals such as SIGCONT (without handler) and SIGALRM (with handler).
I haven't tested -ERESTARTNOINTR.
I don't have a test case for the race condition I mentioned (which is
admittedly pretty difficult to construct, requiring an explicit
signal, schedule, signal sequence) but this should plug that too.
How do we achieve this? Effectively the steps in this patch are:
1. Undo Arnd's fixups to the syscall restart processing (but don't worry,
we restore it in step 3).
2. Introduce TIF_SYS_RESTART, which is set when we enter signal handling
and the syscall has returned one of the restart codes. This is used
as a flag to indicate that we have some syscall restart processing to
do at some point.
3. Clear TIF_SYS_RESTART whenever ptrace is used to set the GP registers
(thereby restoring Arnd's fixup for his gdb testsuite problem - it
would be good if Arnd could reconfirm that.)
4. When we setup a user handler to run, check TIF_SYS_RESTART and clear it.
If it was set, we need to set things up to return -EINTR or restart the
syscall as appropriate. As we've cleared it, no further restart
processing will occur.
5. Once we've run all work (signal delivery, and rescheduling events), and
we're about to return to userspace, make a final check for TIF_SYS_RESTART.
If it's still set, then we're returning to userspace having not setup
any user handlers, and we need to restart the syscall. This is mostly
trivial, except for OABI restartblock which requires the user stack to
be written. We have to re-enable IRQs for this write, which means we
have to manually re-check for rescheduling events, abort the restart,
and try again later.
One of the side effects of reverting Arnd's patch is that we restore the
strace behaviour which we've had for years on ARM, and can still be seen
on x86: strace can see the -ERESTART return codes from the kernel syscalls,
rather than what seems to be the signal number:
Before:
rt_sigsuspend([] <unfinished ...>
--- SIGIO (I/O possible) ---
<... rt_sigsuspend resumed> ) = 29
sigreturn() = ? (mask now [])
vs:
rt_sigsuspend([]) = ? ERESTARTNOHAND (To be restarted)
--- SIGIO (I/O possible) @ 0 (0) ---
sigreturn() = ? (mask now [])
x86:
rt_sigsuspend([]) = ? ERESTARTNOHAND (To be restarted)
--- {si_signo=SIGIO, si_code=SI_USER} (I/O possible) ---
sigreturn() = ? (mask now [])
So, this patch should fix:
1. The race which I identified in the signal handling code (I think x86
and other architectures can suffer from it too.)
2. The warning from try_to_freeze.
3. The unanticipated change to strace output.
Arnd, can you test this to make sure your gdb test case still works, and
Mark, can you test this to make sure it fixes your problem please?
Thanks.
arch/arm/include/asm/thread_info.h | 3 +
arch/arm/kernel/entry-common.S | 11 ++
arch/arm/kernel/ptrace.c | 2 +
arch/arm/kernel/signal.c | 209 ++++++++++++++++++++++++------------
4 files changed, 155 insertions(+), 70 deletions(-)
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 7b5cc8d..40df533 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -129,6 +129,7 @@ extern void vfp_flush_hwstate(struct thread_info *);
/*
* thread information flags:
* TIF_SYSCALL_TRACE - syscall trace active
+ * TIF_SYS_RESTART - syscall restart processing
* TIF_SIGPENDING - signal pending
* TIF_NEED_RESCHED - rescheduling necessary
* TIF_NOTIFY_RESUME - callback before returning to user
@@ -139,6 +140,7 @@ extern void vfp_flush_hwstate(struct thread_info *);
#define TIF_NEED_RESCHED 1
#define TIF_NOTIFY_RESUME 2 /* callback before returning to user */
#define TIF_SYSCALL_TRACE 8
+#define TIF_SYS_RESTART 9
#define TIF_POLLING_NRFLAG 16
#define TIF_USING_IWMMXT 17
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
@@ -147,6 +149,7 @@ extern void vfp_flush_hwstate(struct thread_info *);
#define TIF_SECCOMP 21
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
+#define _TIF_SYS_RESTART (1 << TIF_SYS_RESTART)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index b2a27b6..e922b85 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -45,6 +45,7 @@ ret_fast_syscall:
fast_work_pending:
str r0, [sp, #S_R0+S_OFF]! @ returned r0
work_pending:
+ enable_irq
tst r1, #_TIF_NEED_RESCHED
bne work_resched
tst r1, #_TIF_SIGPENDING|_TIF_NOTIFY_RESUME
@@ -56,6 +57,13 @@ work_pending:
bl do_notify_resume
b ret_slow_syscall @ Check work again
+work_syscall_restart:
+ mov r0, sp @ 'regs'
+ bl syscall_restart @ process system call restart
+ teq r0, #0 @ if ret=0 -> success, so
+ beq ret_restart @ return to userspace directly
+ b ret_slow_syscall @ otherwise, we have a segfault
+
work_resched:
bl schedule
/*
@@ -69,6 +77,9 @@ ENTRY(ret_to_user_from_irq)
tst r1, #_TIF_WORK_MASK
bne work_pending
no_work_pending:
+ tst r1, #_TIF_SYS_RESTART
+ bne work_syscall_restart
+ret_restart:
#if defined(CONFIG_IRQSOFF_TRACER)
asm_trace_hardirqs_on
#endif
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 2491f3b..ac8c34e 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -177,6 +177,7 @@ put_user_reg(struct task_struct *task, int offset, long data)
if (valid_user_regs(&newregs)) {
regs->uregs[offset] = data;
+ clear_ti_thread_flag(task_thread_info(task), TIF_SYS_RESTART);
ret = 0;
}
@@ -604,6 +605,7 @@ static int gpr_set(struct task_struct *target,
return -EINVAL;
*task_pt_regs(target) = newregs;
+ clear_ti_thread_flag(task_thread_info(target), TIF_SYS_RESTART);
return 0;
}
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index 0340224..42a1521 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -649,6 +649,135 @@ handle_signal(unsigned long sig, struct k_sigaction *ka,
}
/*
+ * Syscall restarting codes
+ *
+ * -ERESTARTSYS: restart system call if no handler, or if there is a
+ * handler but it's marked SA_RESTART. Otherwise return -EINTR.
+ * -ERESTARTNOINTR: always restart system call
+ * -ERESTARTNOHAND: restart system call only if no handler, otherwise
+ * return -EINTR if invoking a user signal handler.
+ * -ERESTART_RESTARTBLOCK: call restart syscall if no handler, otherwise
+ * return -EINTR if invoking a user signal handler.
+ */
+static void setup_syscall_restart(struct pt_regs *regs)
+{
+ regs->ARM_r0 = regs->ARM_ORIG_r0;
+ regs->ARM_pc -= thumb_mode(regs) ? 2 : 4;
+}
+
+/*
+ * Depending on the signal settings we may need to revert the decision
+ * to restart the system call. But skip this if a debugger has chosen
+ * to restart at a different PC.
+ */
+static void syscall_restart_handler(struct pt_regs *regs, struct k_sigaction *ka)
+{
+ if (test_and_clear_thread_flag(TIF_SYS_RESTART)) {
+ long r0 = regs->ARM_r0;
+
+ /*
+ * By default, return -EINTR to the user process for any
+ * syscall which would otherwise be restarted.
+ */
+ regs->ARM_r0 = -EINTR;
+
+ if (r0 == -ERESTARTNOINTR ||
+ (r0 == -ERESTARTSYS && !(ka->sa.sa_flags & SA_RESTART)))
+ setup_syscall_restart(regs);
+ }
+}
+
+/*
+ * Handle syscall restarting when there is no user handler in place for
+ * a delivered signal. Rather than doing this as part of the normal
+ * signal processing, we do this on the final return to userspace, after
+ * we've finished handling signals and checking for schedule events.
+ *
+ * This avoids bad behaviour such as:
+ * - syscall returns -ERESTARTNOHAND
+ * - signal with no handler (so we set things up to restart the syscall)
+ * - schedule
+ * - signal with handler (eg, SIGALRM)
+ * - we call the handler and then restart the syscall
+ *
+ * In order to avoid races with TIF_NEED_RESCHED, IRQs must be disabled
+ * when this function is called and remain disabled until we exit to
+ * userspace.
+ */
+asmlinkage int syscall_restart(struct pt_regs *regs)
+{
+ struct thread_info *thread = current_thread_info();
+
+ clear_ti_thread_flag(thread, TIF_SYS_RESTART);
+
+ /*
+ * Restart the system call. We haven't setup a signal handler
+ * to invoke, and the regset hasn't been usurped by ptrace.
+ */
+ if (regs->ARM_r0 == -ERESTART_RESTARTBLOCK) {
+ if (thumb_mode(regs)) {
+ regs->ARM_r7 = __NR_restart_syscall - __NR_SYSCALL_BASE;
+ regs->ARM_pc -= 2;
+ } else {
+#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT)
+ regs->ARM_r7 = __NR_restart_syscall;
+ regs->ARM_pc -= 4;
+#else
+ u32 sp = regs->ARM_sp - 4;
+ u32 __user *usp = (u32 __user *)sp;
+ int ret;
+
+ /*
+ * For OABI, we need to play some extra games, because
+ * we need to write to the users stack, which we can't
+ * do reliably from IRQs-disabled context. Temporarily
+ * re-enable IRQs, perform the store, and then plug
+ * the resulting race afterwards.
+ */
+ local_irq_enable();
+ ret = put_user(regs->ARM_pc, usp);
+ local_irq_disable();
+
+ /*
+ * Plug the reschedule race - if we need to reschedule,
+ * abort the syscall restarting. We haven't modified
+ * anything other than the attempted write to the stack
+ * so we can merely retry later.
+ */
+ if (need_resched()) {
+ set_ti_thread_flag(thread, TIF_SYS_RESTART);
+ return -EINTR;
+ }
+
+ /*
+ * We failed (for some reason) to write to the stack.
+ * Terminate the task.
+ */
+ if (ret) {
+ force_sigsegv(0, current);
+ return -EFAULT;
+ }
+
+ /*
+ * Success, update the stack pointer and point the
+ * PC at the restarting code.
+ */
+ regs->ARM_sp = sp;
+ regs->ARM_pc = KERN_RESTART_CODE;
+#endif
+ }
+ } else {
+ /*
+ * Simple restart - just back up and re-execute the last
+ * instruction.
+ */
+ setup_syscall_restart(regs);
+ }
+
+ return 0;
+}
+
+/*
* Note that 'init' is a special process: it doesn't get signals it doesn't
* want to handle. Thus you cannot kill init even with a SIGKILL even by
* mistake.
@@ -659,7 +788,6 @@ handle_signal(unsigned long sig, struct k_sigaction *ka,
*/
static void do_signal(struct pt_regs *regs, int syscall)
{
- unsigned int retval = 0, continue_addr = 0, restart_addr = 0;
struct k_sigaction ka;
siginfo_t info;
int signr;
@@ -674,32 +802,16 @@ static void do_signal(struct pt_regs *regs, int syscall)
return;
/*
- * If we were from a system call, check for system call restarting...
+ * Set the SYS_RESTART flag to indicate that we have some
+ * cleanup of the restart state to perform when returning to
+ * userspace.
*/
- if (syscall) {
- continue_addr = regs->ARM_pc;
- restart_addr = continue_addr - (thumb_mode(regs) ? 2 : 4);
- retval = regs->ARM_r0;
-
- /*
- * Prepare for system call restart. We do this here so that a
- * debugger will see the already changed PSW.
- */
- switch (retval) {
- case -ERESTARTNOHAND:
- case -ERESTARTSYS:
- case -ERESTARTNOINTR:
- regs->ARM_r0 = regs->ARM_ORIG_r0;
- regs->ARM_pc = restart_addr;
- break;
- case -ERESTART_RESTARTBLOCK:
- regs->ARM_r0 = -EINTR;
- break;
- }
- }
-
- if (try_to_freeze())
- goto no_signal;
+ if (syscall &&
+ (regs->ARM_r0 == -ERESTARTSYS ||
+ regs->ARM_r0 == -ERESTARTNOINTR ||
+ regs->ARM_r0 == -ERESTARTNOHAND ||
+ regs->ARM_r0 == -ERESTART_RESTARTBLOCK))
+ set_thread_flag(TIF_SYS_RESTART);
/*
* Get the signal to deliver. When running under ptrace, at this
@@ -709,19 +821,7 @@ static void do_signal(struct pt_regs *regs, int syscall)
if (signr > 0) {
sigset_t *oldset;
- /*
- * Depending on the signal settings we may need to revert the
- * decision to restart the system call. But skip this if a
- * debugger has chosen to restart at a different PC.
- */
- if (regs->ARM_pc == restart_addr) {
- if (retval == -ERESTARTNOHAND
- || (retval == -ERESTARTSYS
- && !(ka.sa.sa_flags & SA_RESTART))) {
- regs->ARM_r0 = -EINTR;
- regs->ARM_pc = continue_addr;
- }
- }
+ syscall_restart_handler(regs, &ka);
if (test_thread_flag(TIF_RESTORE_SIGMASK))
oldset = ¤t->saved_sigmask;
@@ -740,38 +840,7 @@ static void do_signal(struct pt_regs *regs, int syscall)
return;
}
- no_signal:
if (syscall) {
- /*
- * Handle restarting a different system call. As above,
- * if a debugger has chosen to restart at a different PC,
- * ignore the restart.
- */
- if (retval == -ERESTART_RESTARTBLOCK
- && regs->ARM_pc == continue_addr) {
- if (thumb_mode(regs)) {
- regs->ARM_r7 = __NR_restart_syscall - __NR_SYSCALL_BASE;
- regs->ARM_pc -= 2;
- } else {
-#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT)
- regs->ARM_r7 = __NR_restart_syscall;
- regs->ARM_pc -= 4;
-#else
- u32 __user *usp;
-
- regs->ARM_sp -= 4;
- usp = (u32 __user *)regs->ARM_sp;
-
- if (put_user(regs->ARM_pc, usp) == 0) {
- regs->ARM_pc = KERN_RESTART_CODE;
- } else {
- regs->ARM_sp += 4;
- force_sigsegv(0, current);
- }
-#endif
- }
- }
-
/* If there's no signal to deliver, we just put the saved sigmask
* back.
*/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Hi,
the current gcc-4.6 packages build for both softfp and hard, so that the armel
and (not yet existing) armhf packages can be installed together in the system.
To enable multilib, I currently use the rather complicated arm-multilib.diff,
which works, but doesn't seem to be correct. With the much simpler arm-ml2.diff,
the directory for the default multilib is not resolved to . (as done for e.g.
amd64).
[amd64] $ gcc -print-multi-directory
.
[armel] $ gcc -print-multi-directory
sf
If I understand the code correctly, this comes from the hard setting of
MULTILIB_DEFAULTS in the arm target. If you look at mips, you see
#ifndef MULTILIB_DEFAULTS
#define MULTILIB_DEFAULTS \
{ MULTILIB_ENDIAN_DEFAULT, MULTILIB_ISA_DEFAULT, MULTILIB_ABI_DEFAULT }
#endif
which records the proper selected defaults.
Should something similiar be done for arm?
Matthias
Hi; I've just completed a tricky rebase of qemu-linaro on upstream;
there were several invasive upstream changes which have landed
recently and which meant that I had to tweak a lot of the qemu-linaro
patches as I did the rebase. I've tested the results but it's possible
that some breakage may have slipped through...
So if you're a regular user of qemu-linaro's system mode and feel
like checking the sources out of git:
git://git.linaro.org/qemu/qemu-linaro.git
building them and testing that the things you regularly do with it
haven't regressed, then I'd appreciate it, and you can help us avoid
any nasty surprises in the next (2011.09) release.
Thanks!
-- Peter Maydell
See:
http://builds.linaro.org/toolchain/gcc-4.7~svn178154
The problem is -Werror triggering on:
../../../gcc-4.7~/gcc/config/arm/arm.c: In function 'int
optimal_immediate_sequence_1(rtx_code, long long unsigned int,
four_ints*, int)':
../../../gcc-4.7~/gcc/config/arm/arm.c:2690:46: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
../../../gcc-4.7~/gcc/config/arm/arm.c:2690:60: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
../../../gcc-4.7~/gcc/config/arm/arm.c:2691:20: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
../../../gcc-4.7~/gcc/config/arm/arm.c:2691:34: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
../../../gcc-4.7~/gcc/config/arm/arm.c:2701:16: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
../../../gcc-4.7~/gcc/config/arm/arm.c:2702:18: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
../../../gcc-4.7~/gcc/config/arm/arm.c:2703:18: error: comparison
between signed and unsigned integer expressions [-Werror=sign-compare]
-- Michael
Booked hotel and travel for Linaro Connect in Orlando.
Fixed a couple of bugs in my thumb2 constants patch and retested. The
test results came back clean, so I've committed it upstream.
Bernd claimed he has found some test failures that might be caused by my
patch, but I couldn't reproduce them at first. I've now got the failure,
but I've not yet investigated the cause. Next week ...
Committed my widening multiplies patches to Linaro GCC, after first
convincing Richard Sandiford that it wasn't totally bonkers.
Started work on ARM GCC tuning options:
* Submitted a patch for -m{arch,cpu,tune}=native
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg14225.html
* Submitted a patch for -m{arch,cpu,tune}=generic-armv7-a
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg14231.html
Joseph found an issue with those patches, but that was easily resolved
and I've reposted both.
RAG:
Red:
Amber: OMAP3 patch upstreaming is (still) slower progress than hoped
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
== upstream-omap3-patches ==
* finished the fairly nasty rebase of qemu-linaro onto upstream master
(several invasive changes went into master that meant a number of
our local patches needed updating)
* omap_gpmc changes cleaned up, updated to use MemoryRegions, and
submitted to upstream (17 patch patchset)
== gsoc-support ==
* final meeting/evaluation writeup now the GSoC project has ended
* we now have a first pass at what some upstream-acceptable versions
of the Android goldfish platform devices might look like, and a
much better idea of the degree of difference between the android
and upstream qemu trees, and where the pitfalls/issues lie
== other ==
* fixed some breakage upstream in n810 and integratorcp models caused
by landing of MemoryRegion changes
* trying to write up what my preferred model of device connections
would look like in concrete C implementation terms
* interesting discussion on boot-architecture list about how boot
loaders should start hypervisor-aware software (xen, kvm kernel):
http://www.mail-archive.com/boot-architecture@lists.linaro.org/msg00053.html
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
Oct 30-Nov 04: Linaro Connect Q4.11
== This week ==
* Wrote some patches to make SMS schedule register moves. They made a
significant difference to some libav loops. I'm running a regression
test on pwoerpc-ibm-aix5.3.0 and will submit upstream next week if
all goes OK.
* Looked at why mjpegenc was so much worse with SMS. Turned out to be
a register spilling problem. Found that -fira-algorithm=priority
avoids the regression and makes several other tests better too.
(I just tested that to see whether there was a feasible register
allocation for these cases; -fira-algorithm=priority isn't the
way to go.)
* Saw that the register allocator seemed to be tripping over the
XImode "structure" values, and that we still had one vector move
per structure element by the time we get to the scheduling passes.
Eliminated those with a combination of one fix and one hack.
It seemed to avoid the allocation problems.
* Patch review (Linaro and upstream).
* Backported libgcc visibility fix to 4.6 and 4.5.
== Next week ==
* Submit register-scheduling patch.
* Submit memory cost patch (from auto-inc-dec changes)
* Possibly submit the auto-inc-dec changes themselves, depending on
how the rtx cost discussion goes.
Richard
==GCC==
===Progress===
* Looked at the vectorize_with_neon_quad failure again and decided
that I had to handle another case but not convinced that the extra
stall we'd get in this case was worth it. In any case it would have
been a workaround but Richard Sandiford fixed this by getting df to do
the right thing which would have been the right fix.
* Backported tbh patch.
* Backported conditional execution improvements patch from Jiangning
to Linaro 4.6 branch.
* Committed the LTO + Neon / Android intrinsics patch.
* Panda seems more reliable this week but I suspect that's the room
cooling more .
* Broke up a few blueprints and marked some as done.
* BRANCH_COST results show not a huge variation in SPEC and there are
some results that are inconsistent.. Need to run a few benchmarks
again Sigh :( .
* Finished the A9 scheduler patch for smull and friends and committed
upstream and into Linaro 4.6.
* Reviewed the shrink-wrapping patch and the widening multiplies patch
for a short duration.
* Looked at the failures in the "popular embedded benchmark" for
sometime with Asa.
* Tried one of the ICE patches and that seemed to work just fine with
bootstrap on FSF trunk. Need to figure out why this was breaking in
the Linaro 4.6 tree. https://bugs.launchpad.net/gcc-linaro/+bug/689887
=== Plans ===
Next Week - Holiday :) Feet not up but walking in what looks like
typical bank holiday weather ... Might check email later in the week.
Meetings:
* 1-1s
* TCWG calls
* Thumb2 performance call.
Absences.
* 29th Aug - Sept. 2 - Holiday booked and approved.
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel booked - hotel
to be booked.
* Investigated the errors in the automotive test and concluded that they are
CRC-errors, but not depending on the test case result (non intrusive crc
check). We decided these errors need to be cleared out once and for all.
Michael and Ramana helping out with continued investigation.
* EEMBC run on both Panda and Snowball with gcc4.5.2. Results look
reasonable, but Michael will also have a look. I will spend a little more
time comparing the results from the two boards.
* Started to run SPEC2K on the Panda board.
Best Regards
Åsa
Following on from yesterday's call about what it would take to enable
SMS by default: one of the problems I was seeing with the SMS+IV patch
was that we ended up with excessive moves. E.g. a loop such as:
void
foo (int *__restrict a, int n)
{
int i;
for (i = 0; i < n; i += 2)
a[i] = a[i] * a[i + 1];
}
would end up being scheduled with an ii of 3, which means that in the
ideal case, each loop iteration would take 3 cycles. However, we then
added ~8 register moves to the loop in order to satisfy dependencies.
Obviously those 8 moves add considerably to the iteration time.
I played around with a heuristic to see whether there were enough
free slots in the original schedule to accomodate the moves.
That avoided the problem, but it was a hack: the moves weren't
actually scheduled in those slots. (In current trunk, the moves
generated for an instruction are inserted immediately before that
instruction.)
I mentioned this to Revital, who told me that Mustafa Hagog had
tried a more complete approach that really did schedule the moves.
That patch was quite old, so I ended up reimplementing the same kind
of idea in a slightly different way. (The main functional changes
from Mustafa's version were to schedule from the end of the window
rather than the start, and to use a cyclic window. E.g. moves for
an instruction in row 0 column 0 should be scheduled starting at
row ii-1 downwards.)
The effect on my flawed libav microbenchmarks was much greater
than I imagined. I used the options:
-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad
-fmodulo-sched -fmodulo-sched-allow-regmoves -fno-auto-inc-dec
The "before" code was from trunk, the "after" code was trunk + the
register scheduling patch alone (not the IV patch). Only the tests
that have different "before" and "after" code are run. The results were:
a3dec
before: 500000 runs take 4.68384s
after: 500000 runs take 4.61395s
speedup: x1.02
aes
before: 500000 runs take 20.0523s
after: 500000 runs take 16.9722s
speedup: x1.18
avs
before: 1000000 runs take 15.4698s
after: 1000000 runs take 2.23676s
speedup: x6.92
dxa
before: 2000000 runs take 18.5848s
after: 2000000 runs take 4.40607s
speedup: x4.22
mjpegenc
before: 500000 runs take 28.6987s
after: 500000 runs take 7.31342s
speedup: x3.92
resample
before: 1000000 runs take 10.418s
after: 1000000 runs take 1.91016s
speedup: x5.45
rgb2rgb-rgb24tobgr16
before: 1000000 runs take 1.60513s
after: 1000000 runs take 1.15643s
speedup: x1.39
rgb2rgb-yv12touyvy
before: 1500000 runs take 3.50122s
after: 1500000 runs take 3.49887s
speedup: x1
twinvq
before: 500000 runs take 0.452423s
after: 500000 runs take 0.452454s
speedup: x1
Taking resample as an example: before the patch we had an ii of 27,
stage count of 6, and 12 vector moves. Vector moves can't be dual
issued, and there was only one free slot, so even in theory, this loop
takes 27 + 12 - 1 = 38 cycles. Unfortunately, there were so many new
registers that we spilled quite a few.
After the patch we have an ii of 28, a stage count of 3, and no moves,
so in theory, one iteration should take 28 cycles. We also don't spill.
So I think the difference really is genuine. (The large difference
in moves between ii=27 and ii=28 is because in the ii=27 schedule,
a lot of A--(T,N,0)-->B (intra-cycle true) dependencies were scheduled
with time(B) == time(A) + ii + 1.)
I also saw benefits in one test in a "real" benchmark, which I can't
post here.
Richard
Hello,
Following today performance call
(https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2011-08-23)
here are some points raised regarding the steps towards enabling SMS by default:
* Benchmarks testing:
-- Running benchmarks as EEMBC and SPEC2006 with SMS enabled is
crucial to expose loops where SMS degrades the performance. those
loops need to be analysed to construct a cost model.
-- SMS increases code size by introducing prologue and epilogue to the
loop kernel. This should also be measured.
-- Measure increase in compile time: on native or cross build?
Currently SMS fails to bootstrap trunk on ARM machine. this should
also be taken into account when considering enabling it by default.
Should it be turned on with -O2 or -O3?
SMS flags to use for testing:
-O3 -fmodulo-sched-allow-regmoves -fmodulo-sched
-funsafe-loop-optimizations -fno-auto-inc-dec
Thanks,
Revital
Hi
Some time ago we agreed that not everyone here uses Ubuntu distribution
and decided to provide so called 'generic linux' cross toolchain.
Recently I managed to get it done and now need brave testers to tell is
it working or not.
Get it here: http://people.linaro.org/~hrw/generic-linux/ (64bit only)
Needed files are toolchain-11.07.tar.xz and init.sh script. Unpack
tarball from / so /opt/linaro/11.07/ will be populated and put init.sh
anywhere you want (it will be integrated into tarball later).
How to use:
$ source init.sh
this will add cross toolchain into PATH and also set LD_LIBRARY_PATH to
two directories:
- one with binutils libraries
- second with all extra libraries which may be needed
Feel free to experiment with second dir by removing files from there and
checking are system provided libs are fine too.
So far I checked this toolchain under few distributions:
- Ubuntu 10.04 'lucid' LTS
- Ubuntu 11.04 'natty'
- Fedora 14
- OpenSUSE 11.4
- CentOS 5.6
It failed only under CentOS (which was expected due to it's age).
How did I checked? So far compilation of 'gpm' and 'zlib' were tested.
==GCC==
===Progress===
* Continue to look at the test failure with mvectorize-with-neon-quad.
Should be able to commit the backend workaround in on Monday .
* Having some problems getting my panda board working reliably. I'm
not sure if its the temperature or what but when it gets hot in the
office as it was on Tuesday keeping it working reliably is hard. The
board locks up and then crashes quite often.
* Looked at VFP moves again for some more time.
* Committed tbh range change.
* Committed fixes for PR50022
=== Plans ===
* Finish off VFP moves patch.
* Look at BRANCH_COST results.
* Breakdown the T2 performance blueprints into smaller blueprints.
* Backport tbh range changes to Linaro 4.6
* Test the intrinsics patch once with some more intrinsics tests and
then merge it in to Linaro gcc 4.6
Meetings:
* 1-1s
* TCWG calls
Absences.
* 29th Aug - Sept. 2 - Holiday booked and approved.
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel booked - hotel
to be booked.
Hi all,
I'm having real trouble here :(
I just can't seem to get bzr to work! I've tried to branch
gcc-linaro/4.6 again and again, and it just won't. My other machine
refuses to do the merge from lp:gcc/4.6, presumable because the bzr on
there is too old.
I'm stuck. Can anybody else do the merge from upstream?
I'm going to keep trying.
Andrew
* GCC
Completed merging GCC 4.5 from FSF to Linaro.
Spun release tarballs for Linaro GCC 4.5 and 4.6. Uploaded them to
Michael's server, and kicked off the test builds remotely.
Submitted expenses for Linaro Connect.
Finally (!) committed my widening multiplies patches to FSF. :)
Continued trying to figure out what's wrong with my thumb2 constants
patches. I think I have identified a possible flaw, but I'm having
trouble reproducing the problem as I have been unable to pin down a
specific constant/expression combination that makes it through all the
other optimizations intact, and triggers the problem. I've not run out
of idea yet though ...
* Other
On leave all day Wednesday.
Prepared for the big CodeSourcery to Mentor switch-over by moving all my
work-in-progress data over to the new servers.
== String routines ==
* Working through updating my eglibc patch for memchr, I think I'm
nearly there - took way too long
to persuade an eglibc make check to work cross (can't get native
build happy).
== QEMU ==
* Sent a new version of my QEMU patch for the atomic helpers to Peter.
* Tested the Android beagle image on a real beagle - it fails in
pretty much the same way as the
QEMU run.
== Other ==
* Had a brief look at bug 825711 - scribus ftbfs on ARM - this is QT
being built to define qreal as
float on ARM when it's double on most other things, scribus
having a qreal variable and something
it's defined as a double and then passing it to a template that
requires two arguments of the same type;
not really sure which one is to blame here!
I'm on holiday next week.
Dave
== GDB ==
* Created and published Linaro GDB 7.3 2011-08 release.
* Analyzed --with-sysroot=remote: testsuite failures,
and opened bug LP #829595.
* Reviewed Yao's latest Thumb-2 displaced stepping patch.
== Schedule ==
* I'll be on vacation 08/23 through 08/31.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
Hi,
* continued to work on getting libunwind support for remote unwinding
upstream
* reworked some of the code to address concerns from the ml
* now upstream!
* made smaller fixes to have another libunwind testcase passing
* interfaced with the Linaro Android group to solve an issue where a
compile was failing when using -O3
* turned out that the Linaro GCC vectorizes a loop by generating some
neon instructions
* unfortunately the gas of the 2.20.1 binutils (that is currently
used by the Linaro Android toolchain) doesn't properly understand the
alignment restrictions of the generated asm code and throws an error
* this has be fixed upstream and using a gas from recent binutils
fixes the issue
* Bernhard is already working on getting newer binutils in their
Androuid toolchain build system
* continued the work to get libunwind building on Linaro Android
* wrote an Android.mk and got an initial libunwind.so built (ugly
hacks involved)
* next step is modify the debuggerd to make use of libunwind.so
Note: Next week I'll be on vacation.
Regards
Ken
== This week ==
* Looked at LP #823711. Turned out to be a problem with symbol
visibility in libgcc.a. Tested a fix that was accepted and applied
upstream. Will backport to upstream release branches, so we should
be able to pull the fix in that way.
* Backported the fix for BZ PR49987 to Linaro 4.6 and 4.5.
* Looked at the regrename bug that Ramana reported on gcc@.
* Looked at why libav wasn't being vectorised. Discussed with Ira.
I think we now have a Plan.
* Submitted address writeback scheduling patches upstream.
* Submitted and applied some tweaks to the rtx cost interface upstream.
* Spent a while trying to figure out what the targetm.rtx_costs
API actually is, and how rtx_cost should use it to evaluate the
cost of a SET. Discussed on gcc@.
* Found that ARM was giving SETs a base cost of 4 instructions.
Benchmarked the cost of "fixing" this. It generally seemed positive.
* Wrote a couple of other rtx cost patches.
== Next week ==
* Backport fix for #823711 to upstream branches.
* Hopefully finish off rtx costs stuff.
* Unless there's a clear outcome from the gcc@ discussion, I think
I'll abandon my idea of using insn_rtx_cost in the new auto inc/dec
patch, and simply sum the cost of every SET. Should be a small change.
Richard
* Started running EEMBC on Panda. Got three errors in the automotive test at
this point.
* Started documenting necessary steps for my start-up task:
https://wiki.linaro.org/Internal/ToolChain/Benchmarks/First%20time%20notes
* Upgraded the Snowball board to the latest version (V3). Created a
corresponding test image for Snowball (Linaro 11.06). There is a problem
with the serial console freezing after a couple of minutes without any
error, not sure if it is a complete crash or just the serial output. The
people I have talked to so far has not experienced the same problem. I will
set up the networking for the board and see where ssh gets me.
Best Regards
Åsa
Hi,
- change of default vector size for auto-vectorization on NEON -
submitted and approved
- continued working on vectorization of widening shifts
- looked into SLP vectorization for libav
- two vacation days
I'll be on vacation on Aug 22-30.
Ira
I put a build harness around libav and gathered some profiling data. See:
bzr branch lp:~linaro-toolchain-dev/+junk/libav-suite
It includes a Makefile that builds a C only, h.264 only decoder and
two Creative Commons licensed videos to use as input.
README.rst has the basic commands for running ffmpeg and initial perf
results showing the hot functions. Dave, 20 % of the time is spent in
memcpy() so you might want to have a look.
The vectoriser has no effect. GCC 4.5 is ~17 % faster than 4.6. I'll
look into extracting and harnessing the functions themselves later
this week.
-- Michael
Hi,
is the Linaro toolchain (esp. gcc) useful on x86/x86_64, or is an
attempt to use the Linaro toolchain with such a target just asking for
trouble?
(No, I'm not secretly an Intel spy ;) Just trying to have some fun
with my desktop machine ;) )
ttyl
bero
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.3.
Linaro GDB 7.3 2011.08 is the first release in the 7.3 series. Based
off the latest GDB 7.3, it includes a number of ARM-focused bug fixes.
This release includes all bug fixes from the latest Linaro GDB 7.2
release that were not already included in FSF GDB 7.3.
In addition, this release fixes:
* LP: #804401 [remote testsuite] Thread support
* LP: #804387 [remote testsuite] Shared library test problems
* LP: #804392 [remote testsuite] Rebuilt executables not copied
* LP: #804396 [remote testsuite] Spurious failures
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.3-2011.08
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro QEMU 2011.08.
Linaro QEMU 2011.08 is the latest monthly release of qemu-linaro. Based
off upstream (trunk) QEMU, it includes a number of ARM-focused bug fixes
and enhancements.
This month's release is primarily minor improvements:
- Fixes LP:816791: ARMv6 cp15 barrier instructions now work
in linux-user mode as well as system mode
- Support for ARM1176JZF-S core has been added (thanks to
Jamie Iles <jamie(a)jamieiles.com>)
- Add workaround for kernel bug LP:727781 (which has resurfaced
in 3.0) to suppress warnings about bad-width omap i2c accesses
Plus of course new upstream fixes and improvements.
Performance:
When running qemu in system mode with an SD card image we have
determined that performance is best when the image is in writeback
caching mode. This significantly increases the performance of the SD
card (by factors of 10 or more). An example command line option is:
-drive if=sd,cache=writeback,file=my-sd-card.img
Note that cache=writeback may result in data not being written to
disk if the host system powers down unexpectedly (guest crashes
or powerdowns are not a problem).
Known issues:
- The beagle and beaglexm models still do not support USB networking
- There may be some problems with running multithreaded programs in
linux-user mode (LP:823902)
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2011.08
Binary builds of this qemu-linaro release are being prepared and
will be available shortly for users of Ubuntu. Packages will be in
the linaro-maintainers tools ppa:
https://launchpad.net/~linaro-maintainers/+archive/tools/
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro
The Linaro Toolchain Working Group is pleased to announce the 2011.08
release of both Linaro GCC 4.6 and Linaro GCC 4.5.
Linaro GCC 4.6 2011.08 is the sixth release in the 4.6 series. Based
off the latest GCC 4.6.1+svn177703, it focuses on fixing bugs found
during the Android integration and in SMS. This is a quiet release
due to Linaro Connect.
Interesting changes include:
* Updates to 4.6.1+r177703
Fixes:
* LP: #736007 ICE immed_double_const at emit-rtl.c
* LP: #809768 ICE when compiling bionic's libm
* LP: #815777 Inconsistent packaging between tarball and root
directory names
Linaro GCC 4.5 2011.08 is the thirteenth release in the 4.5
series. Based off the latest GCC 4.5.3+svn177552, the release is
focused on maintenance.
Interesting changes in 4.5 include:
* Updates to 4.5.3+r177552
* Now builds for PowerPC
Fixes:
* LP: #736007 ICE immed_double_const at emit-rtl.c
* LP: #809768 ICE when compiling bionic's libm
* LP: #815435 ICE: insn does not satisfy its constraints
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.6-2011.08https://launchpad.net/gcc-linaro/+milestone/4.5-2011.08
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Bugs: https://bugs.launchpad.net/gcc-linaro/
Questions? https://ask.linaro.org/
Interested in commercial support? inquire at support(a)linaro.org
-- Michael
On Thu, Aug 18, 2011 at 4:16 AM, Richard Earnshaw
<Richard.Earnshaw(a)arm.com> wrote:
> I was just browsing libgmp this afternoon and noticed that it really
> could do with an overhaul to support recent ARM chips.
>
> The ARM code seems to have been written for StrongARM; which is now
> almost obsolete (for example, it loads from a cache line it is about to
> write to in order to pre-allocate the line in the cache).
>
> It doesn't support v4T interworking.
>
> It doesn't make any use of v5 or later instructions.
>
> There is some Thumb(1) code, but again it has no support for
> interworking, is pretty poor and limited in scope.
>
> I'm not sure overall how useful this is to gcc performance; the library
> is needed to build GCC, but I think it's mostly there to support libmpfr.
>
> Nevertheless, there are other apps out there that make use of this
> stuff, including some crypto code, IIRC.
I looked at using gmp as a benchmark some time ago. The assembly
version is twice as fast as the C version already, which is nice. I
assume NEON would be a big improvement as well.
I had a quick poke through the dependencies in Ubuntu and came up with
the following popular packages that use libgmp or libmpfr:
* guile
* python-crypto
* gch (Haskel)
* maxima
* darcs
Nothing earth shattering but probably worthwhile. I've registered:
https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/improve-libgmp
so that we don't lose it.
-- Michael
Hi there. The 2011.08 release has been spun and is testing up well.
The 4.5 and 4.6 branches are now open so feel free to commit any
approved patches.
-- Michael
> . Would you be interested in adding a Firefox-based benchmark? As a large
> application it is a good testbed for LTO, FDO and other aggressive
> optimizations.
Sorry about the delayed response. I did notice your mail last week but
I was busy with our conference and then the first couple of days this
week have just disappeared with some internal training.
I would be interested in hearing how you get on with LTO and FDO on
ARM. Listening to Honza talking at the GCC unconference in London
about the memory usage for full LTO with trunk I did wonder what would
happen if we tried it on the ARM target to see what we got, but I
never managed to get around to trying anything there :) . We did look
at getting FDO working with Linaro GCC last cycle but there are still
a couple of issues with PGO in Linaro GCC 4.5.
With respect to LTO , the one problem we have currently is that the
Neon intrinsics aren't streamed out and streamed back in. So you might
have a few issues if your code uses arm_neon.h .
https://bugs.launchpad.net/gcc-linaro/+bug/823548 is an example of
this problem. This was fixed upstream and we probably just need to
backport that into our 4.6 tree. I've tried a backport this morning
and I think I have this right finally.
If you could do a build and a firefox benchmark run in about 30-60
minutes by all means please do let us know how you get on and what you
find. We've been steadily trying to improve the performance of the ARM
toolchain and the biggest improvements you'll notice will be with the
vectorizer but there will be other small improvements that you'll
notice in other general areas of code generation. We would be
interested in feedback about what can be done and to add to our queue
of things to look at and improve for the ARM port of GCC.
With respect to the images, Kiko's probably answered that bit.
cheers
Ramana
* GCC
Continued tracking down problems in my various broken patches. Fixed one
bug, investigated two more. Re-submitted the widening multiplies for
testing, and this time it returned with no problems. Yay, I can now
check it in next week.
Merged from upstream GCC 4.5. The launchpad import bug still exists
(although should not for much longer) so I had to ask on #launchpad to
get the imports done. Submitted the merged branch for testing.
Tried to merge GCC 4.6 similarly, but failed. Bzr just refused to play
ball, which was very frustrating. Michael Hope has now done the merge
instead.
* Other
On leave Wednesday and Friday.
* libauqntum - running the SMSed version on ARM machine did not show
significant improvement. Discussed it with Richard Sandiford.
Apparently in the SMS phase the instructions are of DI mode due to the
fact the loop contains 64 bit operations while they later been
generated as 32 bit operations. This makes SMS less accurate and I'm
now looking into a version which disables DI mode operations.
* Started to look at the potential of SMS on libav. Initial runs of
Richard's microbenchmarks with SMS show some regressions as well as
improvements that I'm looking at.
Hi there. I've written up the standard configurations that we use to
build and test Linaro GCC:
https://wiki.linaro.org/WorkingGroups/ToolChain/Configurations/GCC
It includes such things as flags, libraries, and sysroots. You might
find it useful to see what we're testing or, if new to compilers, what
a good starting point is.
-- Michael
== QEMU ==
* Finished off a first cut of the 64bit helper patch to QEMU
- Gave it to Peter and have reworked most of the things he commented on
* This also lead into a bit of a rabbit hole of finding various
generic QEMU threading issues
* Tested Peter's 11.08 QEMU release
(I used linaro-fetch-image-ui for the first time to grab the
release images; quite nice, hit
a couple of issues but much nicer than crawling around the site
to find where the hwpacks
are).
== Other ==
* Pinged gcc patches list for more comments on 64bit atomic patch
I'm on holiday the week of 22nd (i.e. the week after next).
Dave
== GDB ==
* Re-tested Linaro GDB 7.3 on Versatile Express (native
& remote testing).
* Committed patch to re-enable remote thread test cases
(#804401) to mainline and Linaro GDB 7.3.
* Reviewed Yao's latest Thumb-2 displaced stepping patch.
== GCC ==
* Patch review week.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
RAG:
Red:
Amber: OMAP3 patch upstreaming is slower progress than hoped
Green:
Current Milestones:
|| || Planned || Estimate || Actual ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
== linaro-qemu-11.11 ==
* put together release candidate tarball for 2011.08 release, tested
* added a workaround for omap kernel bug LP:727781 which had been fixed
in 2.6.x but has resurfaced in 3.0
* tarball now ready and only needs releasing next week
== 64-bit-sync-primitives ==
* reviewed David Gilbert's qemu patches to support 64 bit sync primitives
== upstream-omap3-patches ==
* testing/reading Avi's memory API patches to see how they fit in or
clash with the qdevification and other omap3 patches
== other ==
* more investigation/thought about LP:823902 -- qemu bug running
multithreaded programs in linux-user mode
* Manned Linaro demo stand at ARM Partner Meeting (Tue, Wed)
* Meetings: GSoC student x2, toolchain, toolchain standup, 1-2-1
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
15-19 August: KVM Forum and LinuxCon NA, Vancouver