== This week ==
* Reviewed patches for the release.
* ...then broke the release. Tried to spin a new one.
* Worked on a "real" fix for bug 850099. Now in testing.
* Looked more at auto-inc-dec stuff. Saw a case that didn't behave
as I expected on the A9. The A9 TRM doesn't describe what happens
for post-indexed addressing, so I asked Ramana. Apparently the
behaviour is expected. Once I have more info, I'll try to update
the patches.
* Worked on neon-highlow-extract and neon-strided-load-extract.
Posted the three patches upstream. Nicely, the one I thought
was going to be the most controversial actually got positive
feedback from Paolo (who wrote the affected code).
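
For context on the auto-inc-dec item above, here is a minimal sketch of the kind of loop where post-indexed addressing can apply. It is not the case that misbehaved on the A9; the function name `sum_words` is made up for illustration, and whether the compiler really forms a post-indexed load depends on the GCC version and options.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum n words, walking the buffer with a post-incremented pointer.
   On ARM, a load written as *p++ is a candidate for a post-indexed
   load such as "ldr r3, [r0], #4" (load from [r0], then add 4 to r0). */
uint32_t sum_words(const uint32_t *p, size_t n)
{
    uint32_t total = 0;

    while (n--)
        total += *p++;   /* load, then advance the base pointer */
    return total;
}
```

Comparing the assembly from `-O2 -S` with and without the patches is one way to see whether the increment was folded into the load.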
Richard
== GDB ==
* Completed hardware watchpoint support for gdbserver.
* Tracked down watchpoint resource accounting regression
on GDB mainline (not present in 7.3).
* Created and published Linaro GDB 7.3-2011.09 release.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter | Management: Dirk
Wittkopp
Registered office: Böblingen | Registration court: Amtsgericht
Stuttgart, HRB 243294
* Running SPEC2K on the Snowball board. A fresh kernel with HIGHMEM enabled
made it possible to run the tests. Large variations in the results indicate
that something strange is going on. Turning off one of the CPUs gives
stable (but slow) results, so my current guess is that the variations are
caused by a known bug that makes one CPU run slower.
http://igloocommunity.org/bugzilla3/show_bug.cgi?id=1
The patch for this bug was not included in my kernel. Will have another go
with a kernel where the patch is included, as a background activity.
* Planned and started working on the "Adding browsing benchmarks to our
current set of tests"-activity. I will try to keep documentation up to date
here:
https://wiki.linaro.org/AsaSandahl/Sandbox/BrowsingBenchmarks
Experimenting with building Firefox in different ways, so far for x86.
Best Regards
Åsa
(bouncing to linaro-dev as it's generally interesting)
On Fri, Sep 16, 2011 at 8:17 AM, Ramana Radhakrishnan
<ramana.radhakrishnan(a)linaro.org> wrote:
> Hi,
>
> I've been looking at some of the perf regressions we've been seeing
> these days in an attempt to understand what's going on in these cases.
> While I can use perf and get more statistics and do other things to
> figure out why there are perf regressions between 2 binaries along
> with perf record and report, I wonder if it is possible to use u-boot
> to accurately measure what's going on. I would like to try and get the
> values of the performance counters between 2 program points .
>
> I am aware that there are patches floating around that allow
> users to set and reset the PMU counters by allowing user-level access
> to them in the kernel. While that may be useful to some, I'm not sure
> I want to take a chance on some other process getting scheduled in
> between, even if there are parts of the kernel that save and restore
> the PMU counters per process across context switches. I'm looking for
> as accurate measurements as possible in this case, and I wonder if
> u-boot is the best bet for this (in the absence of any dedicated
> hardware debug / trace unit), given not all of us have one.
>
>
> At the minimum to do this I believe we require u-boot or some start-up code to:
>
> * Turn on i-cache and d-cache. (The current u-boot for panda that I
> get from the linaro-uboot git repo
> git://git.linaro.org/boot/u-boot-linaro-stable.git says "Warning
> Caches turned off" when starting up.) Googling around, I find a few
> patches from Aneesh at TI, posted in August, that turn on the
> d-cache. We should consider getting these in at some point.
>
> * Looking in $(UBOOT_TOP)/examples/api I see that there are simple
> printf routines and simple stand-alone applications that exist which
> could be used for this purpose. The one problem with this is the fact
> that u-boot appears to require use of -ffixed-r8 for its purposes
> which *might* mean we need these if we were to use API calls into
> standard u-boot functions .
I wonder if R8 is used in the current ARM version? There's no reason
we can't cherry pick parts such as the serial I/O out into a library
and make the app completely self contained. Skip all of the
initialisation stuff and assume the boot loader has done it for you.
> * Turn on / off speculative prefetching - I believe the kernel does
> this already for a few boards, but could this be done in u-boot just
> before it launches a test application ?
>
> * Turn on the VFP and Neon units.
>
> * Turn on unaligned access so that unaligned accesses are allowed in
> the test applications. GCC will now move towards generating unaligned
> accesses on versions of the architecture that support it, the patches
> upstream have now been approved.
>
> * Memory map / linker scripts to make sure we are putting things in
> the right places (sigh, has to be per-board).
But everything goes in RAM so you have one generic linker script and a
per board MEMORY definition. Similar to:
http://bazaar.launchpad.net/~stm32f-dev/stm32f-dev/stm32f-startup/view/head…
...but even lighter.
> We then write a set of library functions that could then look at what
> performance counters are of interest to us and track them by resetting
> them to 0 and making sure they haven't overflowed.
>
> Has anyone else in the group played with u-boot before or has any
> thoughts in this direction ? I am not suggesting that we do this work
> right now but it sounds like an interesting thought of where we can
> get to with this.
My worry is that we miss turning on a feature and get results that
aren't representative. That should be easy enough to check by
baselining against a Linux hosted run.
We can use NFS or kermit to load the programs. u-boot has a network
console which is nice when you don't have serial. This combined with
an expect script (or LAVA? Paul?) should automate the whole process.
-- Michael
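
The counter library described in the quoted mail (reset a counter to 0, read it between two program points, check it has not overflowed) might look roughly like the sketch below for the ARMv7-A cycle counter. The function names, the `PMU_BARE_METAL` macro and the simulated fallback are assumptions made for illustration; only the CP15 register encodings come from the architecture. The real register accesses need privileged mode, so by default the sketch uses simulated registers, which lets the logic be checked on a host.

```c
#include <stdint.h>

#define PMCR_ENABLE      (1u << 0)   /* PMCR.E: enable the counters */
#define PMCR_CYCLE_RESET (1u << 2)   /* PMCR.C: reset the cycle counter to 0 */
#define PMU_CYCLE_BIT    (1u << 31)  /* cycle counter bit in PMCNTENSET/PMOVSR */

#if defined(PMU_BARE_METAL) && defined(__arm__)
/* Real accessors: must run in a privileged mode (e.g. a u-boot app). */
static inline void write_pmcr(uint32_t v)
{ __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(v)); }
static inline void write_pmcntenset(uint32_t v)
{ __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" : : "r"(v)); }
static inline void write_pmovsr(uint32_t v)
{ __asm__ volatile("mcr p15, 0, %0, c9, c12, 3" : : "r"(v)); }
static inline uint32_t read_pmovsr(void)
{ uint32_t v; __asm__ volatile("mrc p15, 0, %0, c9, c12, 3" : "=r"(v)); return v; }
static inline uint32_t read_pmccntr(void)
{ uint32_t v; __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v)); return v; }
#else
/* Hypothetical host fallback: simulated registers, not real hardware. */
static uint32_t sim_cycles, sim_overflow;
static inline void write_pmcr(uint32_t v)
{ if (v & PMCR_CYCLE_RESET) sim_cycles = 0; }
static inline void write_pmcntenset(uint32_t v) { (void)v; }
static inline void write_pmovsr(uint32_t v) { sim_overflow &= ~v; }
static inline uint32_t read_pmovsr(void) { return sim_overflow; }
static inline uint32_t read_pmccntr(void) { return sim_cycles; }
/* Advance the simulated counter, flagging overflow if it wraps. */
static inline void sim_advance(uint32_t n)
{
    uint32_t old = sim_cycles;
    sim_cycles += n;
    if (sim_cycles < old)
        sim_overflow |= PMU_CYCLE_BIT;
}
#endif

/* First program point: clear the overflow flag, enable and zero the counter. */
void pmu_start(void)
{
    write_pmovsr(PMU_CYCLE_BIT);      /* overflow flags are write-1-to-clear */
    write_pmcntenset(PMU_CYCLE_BIT);
    write_pmcr(PMCR_ENABLE | PMCR_CYCLE_RESET);
}

/* Second program point: returns 0 and the cycle count, or -1 if the
   counter overflowed and the count cannot be trusted. */
int pmu_stop(uint32_t *cycles)
{
    *cycles = read_pmccntr();
    return (read_pmovsr() & PMU_CYCLE_BIT) ? -1 : 0;
}
```

A test program would call `pmu_start()` before the region of interest and `pmu_stop()` after it, discarding any run where `pmu_stop()` returns -1. Event counters would follow the same pattern with their own select and count registers.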
Hi,
* put the sources of the libunwind android port, the patches for
debuggerd and the Android test app online
* documented things at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Outputs/LibunwindDebuggerd
* noticed differences between the old (debuggerd) and the new
(debuggerd+libunwind) backtraces
* I'm still not sure what's going on (maybe they are adding offsets
or something)
* however, the backtrace that libunwind does looks sane to me
Note: I'll be on vacation till October 7th.
Regards
Ken
Hi,
* testing widen-shifts patch on ARM
* SLP improvements:
- submitted a patch to allow not simple ivs in SLP
- committed a patch to allow read-after-read dependencies in SLP
Ira
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro QEMU 2011.09.
Linaro QEMU 2011.09 is the latest monthly release of
qemu-linaro. Based off upstream (trunk) QEMU, it includes a
number of ARM-focused bug fixes and enhancements.
New in this month's release:
- linux-user mode now supports the 64 bit cmpxchg kernel helpers
(only needed for applications compiled for ARMv6 or lower)
- PL111 display controller now supported; this fixes a problem
where BGR was interpreted as RGB on recent versatilepb kernels
Plus a few other minor bug fixes and the usual round of upstream
fixes and improvements.
Known issues:
- The beagle and beaglexm models still do not support USB networking;
we intend to fix this for the 2011.10 release
- There may be some problems with running multithreaded programs in
linux-user mode (LP:823902)
The source tarball is available at:
https://launchpad.net/qemu-linaro/+milestone/2011.09
Binary builds of this qemu-linaro release are being prepared and
will be available shortly for users of Ubuntu. Packages will be in
the linaro-maintainers tools ppa:
https://launchpad.net/~linaro-maintainers/+archive/tools/
More information on Linaro QEMU is available at:
https://launchpad.net/qemu-linaro
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.3.
Linaro GDB 7.3 2011.09 is the second release in the 7.3 series. Based
off the latest GDB 7.3, it includes a number of ARM-focused bug fixes
and enhancements.
This release contains:
* Support for hardware breakpoints and watchpoints in gdbserver
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.3-2011.09
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
Hi there. The 2011.09 release has been spun and is testing up well.
The 4.5 and 4.6 branches are now open so feel free to commit any
approved patches.
-- Michael
Hi!
When building Android with the Linaro toolchain, I encountered this link
time error when going from gcc 4.4.3 to gcc 4.6.
"arm-eabi-g++: error: unrecognized option '-avoid-version'"
I find several posts about people encountering the same thing for different
programs.
Was this option removed? Anyone know the story behind it?
Regards
Åsa
Release Management needs the list of the blueprints that will be
delivered this month.
For each bp, they want a Headline and Acceptance criteria. The Headline is a
statement to include in the Monthly release announcement. The acceptance
criteria are a statement of how to verify that the work is done.
I have collected this month's blueprints in the spreadsheet, please review
it for accuracy. If a blueprint you are working on is missing, please add
it.
The Headline and Acceptance can be added into the whiteboard as follows:
Headline:
headline text
Acceptance:
acceptance text
Once you add the headline and acceptance, please modify the spreadsheet
to reflect that it is done.
https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AoZqvK7R1biJdGxSc…
Thanks in advance,
Mounir
--
Mounir Bsaibes
Project Manager
==GCC==
===Progress===
* Fixed https://bugs.launchpad.net/ubuntu/+source/gcc-4.6/+bug/838994.
Investigated Bernd's alternate patch. Will commit mine.
* Looked at PR48308 for some time, which might be a dup of PR50313.
* Some blueprint foo.
* Committed a few of the outstanding approved patches into Linaro GCC-4.6
* Patch review week
* Caught up with email after vacation.
=== Plans ===
* Commit conditional compares patch.
* Commit the patch for LP838994.
* Investigate some of the performance issues with strlen and some of
the cases with - one of the ideas is to probably try and get the
specific testcases run under u-boot or as a bare-metal binary and look
at dumps of various performance monitoring counters and see what's
happening.
* Look at BRANCH_COST and finish that up next.
* Dust off patch for PR19599. One of those that has fallen through the cracks.
* Some patch review.
Meetings:
* 1-1s
* TCWG calls
* Thumb2 performance call.
Absences:
* 16th Oct (pm) - LLVM developer summit - London
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel booked - hotel
to be booked.
* 08 Nov - 11 Nov - Tentatively booked not approved.
* Dec 19 - 31st Dec - Tentatively booked not approved.
Merged both GCC 4.5 and 4.6 from FSF to Linaro. Matthias requested that
I avoid a particular upstream 4.6 commit, so I selected the revision
before that as the merge point. The problem was then fixed upstream, and
another fix was desirable, so I've redone the merge from the branch head.
Another widening-multiplies bug was reported to me (I logged it as
pr50318/lp843775), so I've fixed that and committed the fix upstream,
and filed a merge request on Launchpad.
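
For readers unfamiliar with the term, a widening multiply takes two narrow operands and produces a result in a wider type. The sketch below shows the two source shapes involved; it is not the testcase from pr50318, and the function names are made up for illustration. On ARM these shapes can map to single instructions such as SMULL and SMLAL.

```c
#include <stdint.h>

/* 32 x 32 -> 64 bit multiply: only one operand needs the cast, since
   the other is then converted to the wider type as well. */
int64_t widening_mul(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}

/* Widening multiply-and-accumulate: the product is added to a 64-bit
   accumulator. */
int64_t widening_mla(int64_t acc, int32_t a, int32_t b)
{
    return acc + (int64_t)a * b;
}
```

Dropping the cast changes the meaning: `a * b` would then be computed in 32 bits and could overflow before being widened.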
Finished fixing the bugs in my thumb2 constants optimizations, and
backported the new patches to Linaro GCC 4.6. Pushed the updated stuff
to Launchpad for testing.
Richard Sandiford found a flaw in my patch for pr50193/lp836401, so I've
done another version of that and posted it upstream. Ramana didn't like
that version. I've started again trying to fix it a different way, but I
don't have it working just yet.
Continued work on my new constant reuse patch. I have it detecting many
constant expressions, and calculating the values for some of them. Once
it does that sufficiently well, the next step is to track what constants
are available where, and then I'll be in a position to find optimization
opportunities. At the moment, 'sufficiently well' could just mean
MOVW/MOVT pairs, as those are the most common.
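
As background for the MOVW/MOVT case, the sketch below models how such a pair builds a 32-bit constant: MOVW writes a zero-extended 16-bit immediate, and MOVT then replaces only the top half. All names are made up for illustration, and `can_derive_with_movt` is just one plausible reuse test, not a description of what the patch does.

```c
#include <stdint.h>

/* The two 16-bit immediates of a MOVW/MOVT pair. */
struct movw_movt {
    uint16_t movw_imm;   /* low half, written by MOVW */
    uint16_t movt_imm;   /* high half, written by MOVT */
};

/* Split a 32-bit constant into its MOVW and MOVT immediates. */
struct movw_movt split_constant(uint32_t value)
{
    struct movw_movt parts;

    parts.movw_imm = (uint16_t)(value & 0xffffu);
    parts.movt_imm = (uint16_t)(value >> 16);
    return parts;
}

/* MOVW: the register becomes the zero-extended immediate. */
uint32_t apply_movw(uint16_t imm)
{
    return imm;
}

/* MOVT: replace bits 31:16, leave bits 15:0 unchanged. */
uint32_t apply_movt(uint32_t reg, uint16_t imm)
{
    return (reg & 0xffffu) | ((uint32_t)imm << 16);
}

/* Assumed reuse test: if a register already holds 'have' and 'want'
   shares its low half, a single MOVT is enough to produce 'want'. */
int can_derive_with_movt(uint32_t have, uint32_t want)
{
    return (have & 0xffffu) == (want & 0xffffu);
}
```

For example, 0x12345678 splits into a MOVW immediate of 0x5678 and a MOVT immediate of 0x1234.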
Tried to get the CS Panda Boards up and running again after the move. No
success. Ricardo is on the case. I'm still using the boards located at
Canonical.
Andrew
Continued looking at Richard's micro benchmarks taken from libav w.r.t.
SMS and experimented with different patches that Richard wrote to
improve code generation.
Submitted an SMS-related patch with minor misc fixes:
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00551.html
Trying to understand why the new version of the patch to support
instructions with REG_INC_NOTE in SMS causes a bootstrap failure. Will
email the ml regarding it.
(http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01216.html)
* Looked at LP bug 736661. Sent a patch upstream. Some positive feedback,
but it hasn't been approved or rejected yet.
* Looked at the old R_ARM_THM_CALL linker bug, after Matthias prepared
a self-contained testcase (thanks). Attached a patch to the bug.
Will submit upstream next week if testing goes OK.
* Looked at why the backport of lp823708-4.5 retriggered the same
bootstrap failure that Chung-Lin's patch did. Haven't been able
to reproduce yet.
* Looked at making the neon_vget_high/low patterns use ordinary
subreg moves. Found that this triggered a fair few latent bugs
in the rtl optimisers. Tried to fix those. This gave some nice
improvements in some of the libav loops.
* Added h264 loops to the libav microbenchmarks.
* Blueprints.
* Upstream patch review.
Richard
* Completed First-time wiki page, at least for now. Expecting to add more
information as I go.
* Running SPEC2K on the Snowball board. The tests are failing because I run
out of memory. This is due to too little RAM being available in the default
kernel configuration (official HW pack). I had a go at creating a swap
file on the SD card. The tests then run, but slowly (which
makes sense). Will try to make more memory available with a changed
u-boot option, or with a fresh kernel.
* Discussed with Michael which benchmark candidates would add a web
browsing perspective to the benchmarks we have. I suggest Sunspider and the V8
benchmark suite for the JavaScript aspect, and EEMBC Browsing Bench and
perhaps ARMBBench for the load and render aspect. As for the imaging aspect,
we have DENBench and ConsumerBench.
Best Regards
Åsa
== String routines ==
* Trying to understand my strlen behaviour that Michael identified
- Found lots of ways of making the faster case slower, but none of making
the slower case faster!
- Perf not being available on Panda (bug 702999/843628) made it difficult
to dig down
* Fixing standards corner cases for strchr/memchr
- input match needs to be truncated to char (fixes bug 842258 & 791274)
* Tidying up formatting for cortex-strings release
* Looking at eglibc integration again
- getting confused by what has to happen in config.sub and how other
users of it cope with triplets like armv7 even though it's not in
config.sub
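
The strchr/memchr corner case above comes from the C standard: both functions take the character to find as an `int`, but memchr compares it after conversion to `unsigned char` and strchr after conversion to `char`. The plain reference versions below are a sketch of that rule only. They are not the cortex-strings code, and the `ref_` names are made up for illustration.

```c
#include <stddef.h>

/* Reference memchr: the int argument is truncated to unsigned char
   before any comparison. */
void *ref_memchr(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    unsigned char match = (unsigned char)c;

    for (; n != 0; n--, p++)
        if (*p == match)
            return (void *)p;
    return NULL;
}

/* Reference strchr: the int argument is converted to char, and the
   terminating NUL counts as part of the string. */
char *ref_strchr(const char *s, int c)
{
    char match = (char)c;

    for (;; s++) {
        if (*s == match)
            return (char *)s;
        if (*s == '\0')
            return NULL;
    }
}
```

An optimised routine that compares against the full `int` would miss a call such as `memchr(buf, 0x100 + 'n', len)`, which has to behave like a search for `'n'`.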
== QEMU ==
* Testing Peter's QEMU release
- All good
- Lost a few hours due to the broken version of l-i-f-ui in Oneiric
- PPA version works OK
* A little bit of perf profiling
== Other ==
* Managed to get hold of a nice fast build machine
== GDB ==
* Worked on hardware watchpoint support for gdbserver.
== GCC ==
* Analyzed root cause of three more ICEs when building Linux
kernel with mainline GCC (reported by Arnd):
PR target/50305: Inline asm reload failure when building Linux kernel
PR middle-end/50307: SSA checking ICE when building Linux kernel
PR tree-optimization/50318: ICE optimizing widening multiply-and-accumulate
* Implemented proposed fix for PR target/50305 and posted for review.
== Misc ==
* Installed updated FPGA bitfiles on my Versatile Express and verified
that network stability issues (LP #673820) are now fixed.
* Booked Linaro Connect Q4.11 travel.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
RAG:
Red:
Amber:
Green: overrunning OMAP3 upstreaming work (mostly) replanned
Current Milestones:
|| || Planned || Estimate || Actual ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || ||
Historical Milestones:
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || 2011-06-16 ||
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
== upstream-omap3-patches ==
* more reshuffling of patches and dropping of unnecessary changes (eg
code reformatting)
* we're going to divide this blueprint into four, each of which has a
reasonably clearly defined submilestone and an estimated 3
engineering weeks of work in it
* in order to not have work on this completely push out other items on
the schedule, we're going to limit work done on this to 2 or 3 days
each week
* still todo: actually split the blueprint, set dates for
submilestones, check that other blueprints fit reasonably in the
other half-week
== linaro-qemu-11.11 ==
* built a pre-release tarball and tested it -- looks good for next
week's release
* investigated whether we can reinstate the firmware blobs in our
releases (bringing us back into line with upstream) -- should be
possible but need to go through the license approval process since
some are GPLv3
== a15-system-mode-planning ==
* starting to see some (gentle) pressure for A15 support
* thinking about what we should do here; my current opinion is that
QEMU should implement an "A15 without virtualization or LPAE" -- we
have Linux kernels that will boot on this, and it is essentially
what an A15-on-A15 hw virt guest would see. The device work will be
needed for KVM anyway. Need to write this up.
== other ==
* submitted TSC licensing request to add the firmware blobs back into
our qemu-linaro tarballs, in line with how upstream do releases
* I need to track better how much time I'm spending on things like code
review on qemu-devel, minor bug fixing and other things that aren't
blueprints
* all holiday to the end of the year now booked (see below)
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences (to end of year):
Sep 19, Sep 29-Oct 07, Oct 17, Nov 21, Dec 15-Jan 03: leave
Oct 30-Nov 04: Linaro Connect Q4.11
Hi,
* I've been setting up a new system because the old laptop died
* finished the initial port of libunwind for Android on ARM
* changed debuggerd to make use of libunwind to unwind the stack of
crashing applications
* it works and the output looks great :)
* I plan to document these things in the wiki by next week
Regards
Ken
Just as an FYI, I've added these loops to the libav microbenchmarks
avg-h264-chroma-mc8-8.txt
avg-pixels8-8.txt
ff-h264-idct-add-8-8.txt
ff-put-pixels8x16-8.txt
h264-loop-filter-luma-8.txt
idct-internal-8.txt
put-h264-chroma-mc8-8.txt
put-h264-qpel8-h-lowpass-8.txt
put-h264-qpel8-hv-lowpass-8.txt
put-h264-qpel8-v-lowpass-8.txt
based on Michael's h264 profile. These loops:
decode_residual
ff_h264_decode_mb_cavlc
fill_decode_caches
aren't really the kind of thing that the microbenchmark is designed for;
running the whole h264 benchmark is probably a better test. Some of the
functions in the profile just consist of two copies of a simpler loop,
one after the other, so for those I just used the simpler loop.
Usual microbenchmark caveats apply.
Richard
Hi,
* merged vector over-promotion patch to linaro-gcc-4.6
* committed upstream the change of the default vector size for NEON
* continued working on widening shifts
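
As an illustration of what "widening shift" refers to, the loop below shifts narrow elements and stores the results in a wider type. This is a minimal sketch, not a testcase from the patch; the function name and the shift amount are made up. On NEON, a vectorizer can in principle map this shape to a widening shift such as VSHLL instead of a separate widen followed by a shift.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift each 16-bit input left by 8 and store it as a 32-bit result.
   Unsigned types are used so the shift is well defined for all inputs. */
void widen_shift_left(uint32_t *out, const uint16_t *in, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = (uint32_t)in[i] << 8;
}
```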
Ira
Hi Dave. I've been hacking away and have checked in a couple of
benchmarking and plotting scripts to lp:cortex-strings. The current
results are at:
http://people.linaro.org/~michaelh/incoming/strings-performance/
All are done on an A9. The results are very incomplete due to how
long things take to run. I'll leave ursa3 doing these over the
weekend which should flesh this out for the other routines.
Your new memcpy() is looking good as well - as fast as GLIBC.
-- Michael