Hi,
Agenda for today's performance call . Sorry about the last minute
posting and I'll put this in the wiki soon enough.
1. Sync-up on what's been happening around the group:
a. Coremark regressions.
b. Thumb2 constants patch.
c. divmodsi4 and vfp register moves.
d. DENBench investigations.
2. Planning for the summit and turning some of the ideas into blueprints.
3. AOB.
cheers
Ramana
== Last week ==
* PR48250, rehaul arm_legitimize_reload_address(). Richard Sandiford
caught a bug of mine where I overlooked the valid index range of NEON
quad-word load/stores. Quickly whipped up a fix, soon approved and
committed upstream.
* LP #744754, ICE in NEON struct-mode auto-inc-dec MEMs. Pushed upstream
patch for a merge to Linaro 4.5.
* PR46888, bit-field insert optimization patch. Resumed investigating,
mailed Andrew Pinski for more information on that REG_EQUAL note issue
he mentioned on gcc-patches; can't quite reproduce it myself.
* CoreMark ARMv6/v7 regressions: posted a patch set to gcc-patches.
Still waiting review.
* Reported to Bernd and AndrewS on an issue (LP #748138) which seems to
be related to the shrink-wrap patch. This ICE does not seem to be
avoided by doing -fno-shrink-wrap.
* A few tasks related to Linaro-Budapest event travel.
== This week ==
* Do the merge of the new combine patches to Linaro, and test.
* LP #689887 is still in progress.
* Hope to experiment with a few more optimization ideas.
Michael mentioned that some users reported seeing better preformance from
RVCT using arm_neon.h then they did when coding directly in assembler.
He suggested we try the same thing for GCC. Here's an experiment using
the example that Jim Huang posted to the dev list recently:
https://wiki.linaro.org/RichardSandiford/Sandbox/IntrinsicsPerformance
The summary is that the C version needs to borrow a trick from the
assembly code in order to be competitive. If it does that, though,
the C code can be faster. I think this is mostly down to scheduling,
although I haven't checked in detail yet.
Richard
== String and Memory routines ==
* Profiled denbench with perf and produced a set of stats to show
which programs spent how much time in libc and how
much time was spent in each routine. While some of the
benchmarks are good (like aes) and spend almost no time in libc
some of the others (MPEG codecs especially) seem to spend
significant times in libc.
* Ran all of denbench through latrace to generate sets of library
calls; post processed them to extract the section between the clock()
calls (and hence in the timed portion) and analysed the hot library
calls. I've looked at some of the output but not all of it yet; I
get output like:
Memcpy stats (dst align/src align/length/number of occurrences/total
size copied)
memcpy: 0,0,1 , 1588520, 1588520
memcpy: 16,28,4096 , 1, 4096
memcpy: 4,20,16384 , 855, 14008320
This shows that for a bunch of tests they do an inordinate number of 1
byte memcpy's, and a few hundred larger memcpy's with an address %32
which is 4
(and destination %32 is 20) - so not aligned but at least equally misaligned.
* Started writing up a report on some of the stats
* Also started to try and extract the same stuff from SPEC2k6
== QEMU ==
* Tested Peter's QEmu release earlier in the week (On Lucid so
didn't hit his natty bug)
* Wrote up a couple of specs (one for TrustZone and the other for
Device Tree integration)
== GDB ==
* Created Linaro GDB 7.2-2011.04-0 release.
* Committed patch to fix accessing "fpscr" register to Linaro GDB.
* Failure to disable address space randomization (bug #616001) has been
fixed in the kernel; closed the Linaro GDB bug.
== GCC ==
* Ongoing analysis of bug #759409 (Profiled bootstrap fails). Identified
two independent problems, one related to a new CodeSourcery feature,
and one a mis-optimization of final-stage cc1plus which is also present
in FSF GCC 4.5 (PR 43085). Ongoing investigation to track down the
root cause of the latter problem.
== Schedule ==
* Public holidays 04/22 - 04/25.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
RAG:
Red:
Amber:
Green: another monthly qemu-linaro release out on schedule
Current Milestones:
| Planned | Estimate | Actual |
qemu-linaro 2011-04 | 2011-04-21 | 2011-04-21 | 2011-04-21 |
Historical Milestones:
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03 | 2011-03-08 | 2011-03-08 | 2011-03-08 |
== maintain-beagle-models ==
* qemu-linaro 2011.04 tested and released
* had to do another last minute -1 respin to fix a problem caught by
ubuntu package builds; we need to come up with a process that lets
us do test package builds prior to release so we can fix this sort
of issue in a less last-minute fashion
== merge-correctness-fixes ==
* sent patch: fix semihosting SYS_HEAPINFO (seems to have issues though)
* sent patch: UNDEFs in Neon load/store space
* sent patches: fix build issues on sparc
* sent patch: bump the initrd load address to work with bigger kernels
* sent patch: set Invalid flag for float-to-int conversion of NaN
* sent patch: move vld/vst multiple to helper functions
* reviewed patches from Aurelien doing some general softfloat cleanup
* sent out a version of my performance counters patch which just does
a basic dummy implementation without the cycle counter (since the
cycle counter bits were going off down a blind alley rather and this
part is the last thing needed to be able to boot Linaro vexpress
images on stock upstream QEMU)
== other ==
* trying to nail down proposed QEMU work for next cycle; work-in-progress:
https://wiki.linaro.org/PeterMaydell/Qemu1111
* meetings: toolchain, standup
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) 15-16 August: QEMU/KVM strand at LinuxCon NA, Vancouver
[LinuxCon proper follows on 17-19th]
Hi,
libunwind:
* added initial support for resuming at a certain stack frame
* posted unw_resume support plus some some testsuite fixes on the ml
* there are still some issues left if signal handlers/frames are involved
Note: Friday is a public holiday.
Regards
Ken
== This week ==
* Iterated with upstream on some of the vectorisation patches. I think
only half a patch (the ARM implementation of array_mode_supported_p)
is still pending review; everything else has been approved.
* Backported the vldN and vstN intrinsics to Linaro 4.5.
* Finished off the microbenchmarks for libav.
* One of the problems in the original libav output was that the
vectoriser didn't realise that a group of N accesses really did form
a group. It instead generated N separate interleave operations and took
1 vector from each.
Submitted a fix for that, which is now committed upstream. Updated the
libav wiki page with the new, improved output.
(This actually allowed more libav loops to be vectorised, as well
as improving the output from some of the existing ones. I haven't
looked at the new ones. I expect this comes from interleaved stores.)
* Wrote an arm_neon.h version of Android's scanline_t32cb16_neon and
compared it with the original.
* Started (and only started) seeing how the vectorisation stuff
affects DENbench.
* Backported the dwarf2out OOM fix to Linaro 4.6.
== Next week ==
* Do something useful on DENbench.
* If all goes well, commit the vectorisation work upstream and backport
it to Linaro 4.5 and 4.6.
...but it's only a three day week.
Richard
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro QEMU 2011.04-1.
Linaro QEMU 2011.04-1 is the third release of qemu-linaro. Based
off upstream (trunk) qemu, it includes a number of ARM-focused
bug fixes and enhancements.
Interesting changes include:
- Compiling for an ARM host in Thumb mode now works
- As usual, various minor correctness fixes and other upstream changes
Known issues:
- The beagle and beaglexm models do not support USB, so there is no
keyboard, mouse or networking (#708703)
The only change over the shortlived 2011.04-0 is to fix a compilation
failure with gcc 4.5.
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2011.04-1
Binary builds of this qemu-linaro release are being prepared and
will be available shortly for users of Ubuntu. Packages will be in
the linaro-maintainers tools ppa:
https://launchpad.net/~linaro-maintainers/+archive/tools/
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro