== This week ==
* Spent about half of the week on auto increment/decrement. There are two
execution failures left.
* Looked at assembly comparisons between the old pass and various forms
of the new pass. The results look reasonable.
* Ran DENbench and my libav microbenchmarks to measure the difference
in performance. Saw that some tests were repeatably worse.
* Looked into those tests and realised that they were being hit by the
lack of an address writeback model in the scheduler (a known limitation).
Dependent stores were being scheduled in a block at the end of the loop
because we said that the dependencies had 0 latency.
* Spent most of the rest of the week on fixing that limitation. One of the
difficulties is that define_bypass currently requires a complete list
of instruction reservations. This is difficult for things like writeback
because the result could in principle be used by many different instructions.
Decided to generalise define_bypass so that it can handle filename-style
globs.
* Wrote a patch to model writeback in NEON.
* Wrote a patch to model writeback in core instructions. However,
while doing this, I noticed that the behaviour I'm seeing on our
Cortex-A8 doesn't match what I'd expected from GCC's A8 scheduler
description (or the documentation). Talked with Ramana about it.
Distilled a benchmark.
* These scheduler changes didn't improve the DENbench and libav
scores much by themselves, but the combination of the scheduler
and auto inc/dec changes did produce noticeable improvements
in some libav benchmarks and rather smaller improvements in
some DENbench ones.
== Next week ==
* Finish scheduler work, in light of observed behaviour.
* More testing prior to submission.
I'm away the week of 13th June.
Richard
Hi,
- bug fixes: PRs 49222, 49199, 49239, 49093
- widening multiplication: submitted a patch to support widen-mul for
unsigned types and constants in the vectorizer's pattern recognizer.
Now considering moving the optimize_widening_mul pass before the loop
optimizations and improving it to support unsigned types and constants.
Next week: holiday on Tuesday (half day) and Wednesday.
Ira
2011/5/29 Fathi Boudra <fathi.boudra(a)linaro.org>:
> Hi,
>
> The Linaro Team is pleased to announce the release of Linaro 11.05.
>
> 11.05 is the second public release that brings together the huge amount of
> engineering effort that has occurred within Linaro over the past 6 months.
>
> This is the first release delivering Android, Ubuntu and the Working Group
> components nicely bundled into one release. We will continue to pick up more
> Working Group and Landing Team outputs in the upcoming monthly releases.
>
> We encourage everybody to use the 11.05 release. The download links for all
> images and components are available on our release page:
>
> http://wiki.linaro.org/Cycles/1105/Final
>
> Highlights of this release:
>
> * Linaro GCC 4.5, GCC 4.6 and GDB 7.2 2011.05, recently released components
I have been wondering why two versions are always released at the same
time. What kind of users are expected to use 4.5, and what kind of
users are expected to use 4.6?
My other question is whether we have a policy for maintaining old
releases. For example, if 11.05 has some bugs, is it possible for the
Linaro toolchain team to fix those bugs in the old version later? Many
users are on an old version with bugs; if they move directly to a new
version, new features may introduce new bugs. So people may want to
use the old version with bug fixes, but without new features.
> created by the Toolchain Working Group.
> * Linaro Kernel 2011.05-2.6.38, the first source tarball release of Linux
> Linaro done by the Kernel Working Group.
> * Linaro Evaluation Builds (LEBs) for Android and Ubuntu on PandaBoard with
> 3D graphics acceleration.
> * Android cross toolchain based on latest gcc-linaro and gdb-linaro
> * Host development tools (cross compiler, image builders) readily integrated
> for the Ubuntu distribution users (Lucid, Maverick and Natty support).
> * And many more...
>
> Using the Android-based images
> ==============================
>
> The Android-based images come in three parts: system, userdata and boot.
> These need to be combined to form a complete Android install. For an
> explanation of how to do this please see:
>
> http://wiki.linaro.org/Platform/Android/ImageInstallation
>
> If you are interested in getting the source and building these images
> yourself please see the following pages:
>
> http://wiki.linaro.org/Platform/Android/GetSource
> http://wiki.linaro.org/Platform/Android/BuildSource
>
> Using the Ubuntu-based images
> =============================
>
> The Ubuntu-based images consist of two parts. The first part is a hardware
> pack, which can be found under the hwpacks directory and contains hardware
> specific packages (such as the kernel and bootloader). The second part is
> the rootfs, which is combined with the hardware pack to create a complete
> image. For more information on how to create an image please see:
>
> http://wiki.linaro.org/Platform/DevPlatform/Ubuntu/ImageInstallation
>
> Getting involved
> ================
>
> More information on Linaro can be found on our websites:
>
> * Homepage: http://www.linaro.org
> * Wiki: http://wiki.linaro.org
>
> Also subscribe to the important Linaro mailing lists and join our IRC
> channels to stay on top of Linaro developments:
>
> * Announcements:
> http://lists.linaro.org/mailman/listinfo/linaro-announce
> * Development:
> http://lists.linaro.org/mailman/listinfo/linaro-dev
> * IRC:
> #linaro on irc.linaro.org or irc.freenode.net
> #linaro-android on irc.linaro.org or irc.freenode.net
>
> Known issues with this release
> ==============================
>
> For any errata issues, please see:
>
> http://wiki.linaro.org/Cycles/1105/Final#Known_Issues
>
> Bug reports for this release should be filed in Launchpad against the
> individual packages that are affected. If a suitable package cannot be
> identified, feel free to assign them to:
>
> http://www.launchpad.net/linaro
>
> Cheers,
>
> Fathi Boudra
> --
> Linaro Release Manager | Platform Project Manager
>
> _______________________________________________
> linaro-announce mailing list
> linaro-announce(a)lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-announce
>
Hi,
* finished the measuring of the overhead of the ARM specific unwind tables
https://wiki.linaro.org/KenWerner/Sandbox/libunwind#overhead_of_the_ARM_spe…
* started to get an environment up and running in order to build the
linaro-android sources
* encountered some build issues (I'm in the process of sorting out some
issues with pfalcon of the android team)
* finished 11.11 cycle planning
* I'll be out of office for the rest of the week (public holiday +
vacation)
Regards
Ken
Progress:
* Some trouble building the SPEC2k tools in the new multiarch world in
natty. Perl refuses to link libm, and a number of other things
also end up failing. This appears to be a real pain with the new multiarch
world and SPEC2k's curious build system for its tools. Will fall back
to an older chroot and get the tools built natively.
* Tried breaking down the T2 performance blueprint - initial breakdown
now available.
* Looked at the binutils vmov.i64 issue again. Looks like natty-updates
will now pick this up.
* Some patch review and bugzilla maintenance.
Plans:
* Get SPEC tools building.
* Look more at the T2 performance blueprint
* Spend some time on the VFP moves and look at ivopts for a bit.
* Merge review duty.
Meetings:
* 1-1s
* Linaro calls.
== Last week ==
* Investigated the CoreMark numbers posted by Michael Hope, mainly the
oddities of a significant Linaro 4.6 regression versus FSF 4.6. Later
verified to be a false alarm.
* Pushed a merge of some of my upstream CoreMark patches to Linaro 4.6.
* Did archeology for PR42017. Traced some history of the ARM prologue
from 2000 to 2007 (DF branch), posted upstream. Hope this clarification
gets my patch an approval soon.
* Tried the above PR42017 patch (which is supposed to free up LR for use
as a general register in leaf functions) on CoreMark, using Linaro
4.6, and was surprised to find that despite many reductions in spill
code and epilogues (functions now more often return directly via ldmfd),
the generated code still regresses in performance (!).
* Continuing above, suspecting something from experience (cough) added
-falign-functions=8 to the CoreMark compile options. Finally produced a
small improvement, while causing a regression for the
without-PR42017-patch case (victory?).
* Worked on PR48808, PR48792 over the weekend, which are cases where
paradoxical subregs caused ICEs in reload. Posted an ARM backend patch
upstream, though now mostly taken over by Richard Sandiford :)
== This week ==
* Some other PRs, ideas, still work in progress.
* Started using the porter boards, will try to get LP:689887 over with
this week.
* Set-up SPEC2006 profile runs on PowerPC with trunk.
* Looked at SPEC2006's 462.libquantum.
* PR745743 - compared different versions mentioned in the PR.
* Wrote a patch to fix another issue related to how SMS handles debug_insn.
== String routines ==
* Finally finished the ltrace analysis of the whole of SPEC 2k6 and
have written it up - I'll proof read it next week and then send it out
to the benchmark list.
* Ran memset and memcpy benchmarks with sizes larger than the cache on A9
* memcpy on sizes larger than the cache (or, probably more precisely,
on data that misses in the cache) does come back to Neon winning over
ARM; my suspicion is that with cache hits we run out of bandwidth on
Neon, but that doesn't happen in the cache-miss case; why it's faster
in that case I'm not sure yet.
* memset is still not faster for Neon even on large sizes where
the destination isn't in the cache.
== Other ==
* Started looking at 64 bit atomics
* Looking at the pot of QEMU work with Peter.
Dave
Hi,
* the overhead of the ARM specific unwind tables for some binaries:
https://wiki.linaro.org/KenWerner/Sandbox/libunwind#overhead_of_the_ARM_spe…
* sometimes the size of the .text section differs which worries me a
bit (not necessarily a GCC issue, could be related to the build system)
* tested a couple of linaro-android images on my panda board
* ran into a linaro-image-tools (l-i-t) issue (now fixed) and discussed with asac and friends
* and finally got the network up and running :)
* some 11.11 cycle planning (libunwind work items, "in distributions"
spec)
Regards
Ken
RAG:
Red:
Amber:
Green: 1111 QEMU planning complete
Current Milestones:
Milestone                    | Planned    | Estimate   | Actual     |
complete 1111 planning       | 2011-05-28 | 2011-05-28 | 2011-05-27 |
qemu-linaro-2011-06          | 2011-06-16 | 2011-06-16 |            |
Historical Milestones:
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release    | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03          | 2011-03-08 | 2011-03-08 | 2011-03-08 |
qemu-linaro 2011-04          | 2011-04-21 | 2011-04-21 | 2011-04-21 |
qemu-linaro 2011-05          | 2011-05-19 | 2011-05-19 | n/a        |
close out 1105 blueprints    | 2011-05-28 | 2011-05-28 | 2011-05-19 |
== other ==
* Completed planning work for 1111; all blueprints now created, fleshed
out with work items and assigned:
https://blueprints.launchpad.net/qemu-linaro
[Note that as expected some items under consideration have not made
the list; this includes the trustzone work]
* Some interesting upstream QEMU discussions (list and IRC) on
(a) performance improvements [good to see general interest in
this] and (b) overhauling the memory API [very long thread
but I think the proposed API should be OK for ARM system emulation
purposes]
* LP:768650: QEMU warnings on recent Linaro OMAP3 kernels: tracked down
to the kernel deliberately reading a register it knows doesn't exist
on OMAP2/3. Sent a query via Arnd about whether we can get this changed.
* rebased linaro-qemu to current master
* Sent patchset which starts ARM QEMU moving towards getting rid of the
implicit global CPUState pointer
* sent patch fixing a configure bug causing it to create recursive
symlinks
* sent a patchset which tightens up the compile time TCG value type
checking; this would have detected the build-breaking patch I sent
earlier this week...
* sent patch adding support for active-low interrupts to the LAN9118
model; this is needed when it is used in the Overo OMAP3 board model
Meetings: toolchain, standup, GSoC student, doughnuts
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
1-5 August: Linaro sprint 1111
(maybe) 15-16 August: QEMU/KVM strand at LinuxCon NA, Vancouver
[LinuxCon proper follows on 17-19th]
Hi,
* PR 49087 - fixed
* PR 49038 - opened by Richard - fixed in 4.7, to be backported to 4.5 and 4.6
* working on widening multiplication for unsigned types and constants
(the signed case works fine)
Ira
Posted a new patch for 16 -> 64 bit multiply and accumulate:
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg05794.html
Pushed the same patch to a Launchpad branch for testing.
Pinged my addw/subw patch as a review didn't seem forthcoming.
Worked on a canonical form for HImode to DImode multiply-and-accumulate.
The problem isn't too hard to fix, but it's hard to do it in a nice way.
Attended Nathan S's reorg call. Followed up by talking to Nathan F about
what he's been working on with Wind River. Read up on the Wiki.
Looked at why the ARM smlal{tb,bt,tt} instructions are not generated.
I've added the proper patterns, but combine doesn't match them, and I've
run out of time this week to check why.
----
Upstream patches requiring review:
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
* ARM Thumb2 addw/subw support.
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg03783.html
* Multiply and accumulate:
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg05794.html
== Last week ==
* Took Monday off, flew back to Taiwan on Tues., got home Wed. night.
* LP:689887, ICE in get_arm_condition_code(). Finally have some new
progress on this. Found my code was rejecting DImode comparisons,
causing uses of __aeabi_lcmp, etc. in expanded RTL. While this still
does not fully explain a bootstrap failure, it may be related, and it's
good I found this here rather than scratching heads over performance
regressions later... :)
* LP:771903: invalid ubfx asm produced by GCC. Mostly got down to the
bottom of this. This bug is rather well hidden: first avoided due to
some inlining heuristic changes after FSF 4.5 was branched (hence 4.6
and trunk don't show it on the testcase), then hidden again later by
-ftree-bit-ccp. Was able to reproduce on mainline trunk after some
changes to testcase and options. Will send patch later.
* Talked with Ramana on IRC and mail about the '+' constraint modifiers
in the VFP fmul/fdiv patterns. Mostly concluded that these are typos,
and should be fixed.
== This week ==
* Continue with issues.
Hi there. The next two weeks are where we take the technical topics
from the TSC and the discussions had during the summit and turn them
into the concrete engineering blueprints for this cycle. I've created
a page at:
https://wiki.linaro.org/MichaelHope/Sandbox/1111Blueprints
listing all of the TRs. Could you please have a look through these,
find any with your name on them, and fill in the wiki page. I've put
more notes on the page itself. Some of the topics may warrant
specifications.
Let me know if you have questions on what the topics actually mean.
-- Michael
* Profiling SPEC 2k6 still; about 3/4 of the latrace files are
generated but it's taking some hand holding with some of them
(e.g. finding one that makes millions of calls to a library function
that we're not interested in but generates a huge log, and hence
needs it excluding).
* Working through the ones that I have with analysis scripts and
writing the interesting things up.
* Submitted ARM test suite fix for latrace (ARM's plain char is unsigned)
* Verified Richard's binutils fix in natty-proposed fixed the vtk FTBFS
* Blueprint for 64bit sync primitives.
Dave
Hi,
* started to measure the overhead of -funwind-tables
* libunwind text size increase < 5%
* firefox4 is still building... :)
* found a small glitch when cross compiling the binutils deb package
* made a small patch, talked with doko, fix upstream
* installed android on the pandaboard
https://wiki.linaro.org/KenWerner/Sandbox/AndroidOnPanda
* setup an android development environment on my thinkpad
Regards
Ken
RAG:
Red:
Amber:
Green: 1105 work item status 100% complete
Current Milestones:
Milestone                    | Planned    | Estimate   | Actual     |
qemu-linaro 2011-05          | 2011-05-19 | 2011-05-19 | n/a        |
close out 1105 blueprints    | 2011-05-28 | 2011-05-28 | 2011-05-19 |
complete 1111 planning       | 2011-05-28 | 2011-05-28 |            |
Historical Milestones:
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release    | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03          | 2011-03-08 | 2011-03-08 | 2011-03-08 |
qemu-linaro 2011-04          | 2011-04-21 | 2011-04-21 | 2011-04-21 |
== merge-correctness-fixes ==
* last few work items for this blueprint either completed or postponed
[For the record, postponed work:
setting Cortex A8r2 device ID etc regs -- moved to omap3 upstreaming
trustzone -- may get its own blueprint this cycle
VCVT fp exception flags -- postponed as rather tricky and an
obscure corner case that is unlikely to be noticed by users]
== other ==
* tracked down bug with QEMU loading of Google Go-produced ELF files,
submitted patch
* talked to our local trustzone expert, very useful
* reworked and resent FPSCR exception flags patches based on review
comments
* reviewed a patch for setting IFSR right for BKPT
* more planning effort
* sent patch to suppress SD card model warnings generated when Linux
probes to see if it's an SDIO card
* redid the "check for unused -nic options" patch as it turned out to
cause regressions with NICs created via -device.
Meetings: toolchain, standup, 1-2-1
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
(maybe) 15-16 August: QEMU/KVM strand at LinuxCon NA, Vancouver
[LinuxCon proper follows on 17-19th]
== This week ==
* Spent almost all the week on GCC's auto inc/dec pass. I first
continued with the incremental "clean ups" and recoding that I'd
started during free time at Budapest, with the idea of bolting the new
optimisations on top of that. However, in the end, I decided it would
be better to rewrite the pass entirely, using a different approach.
I've now got an early prototype of that rewrite, and it seems to be
working as expected on the test cases I've tried so far. I'm running
a regression test over the weekend, although TBH, I expect it to fail
at this stage.
* Tested the fix for vzip, vuzp and vtrn. Went well, so I'll submit
next week.
* Blueprints.
== Next week ==
* More auto inc/dec:
* Round off some known rough edges in the prototype.
* Fix bugs.
* Run benchmarks.
* Run code comparison tests (diffing assembly code), both on ARM and
on other targets of interest.
Richard
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.2.
Linaro GDB 7.2 2011.05-0 is the sixth release in the 7.2 series. Based
off the latest GDB 7.2, it includes a number of ARM-focused bug fixes.
This release fixes:
* LP: #615972 Neon registers missing in core files
* LP: #615978 Failure to software single-step into signal handler
* LP: #615996 gdb.cp/templates.exp failures
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.2-2011.05-0
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
-- Michael
Can somebody please explain how development happens regarding qemu-linaro ?
I've taken a look here [0] and, if I'm not mistaken, there's no code in the
repo. I can see a lot of blueprints, but I don't understand how work is
being done on those blueprints or when it will be done! Oh, and what
exactly is the 'qemu-linaro' tarball in the repo?
I'm not sure how newbie this question is, but please bear with me. :D
Thanks in advance.
[0] https://launchpad.net/qemu-linaro
--
Karim Allah Ahmed.
LinkedIn <http://eg.linkedin.com/pub/karim-allah-ahmed/13/829/550/>
Hello,
* Sent 5 SMS related patches for review upstream.
* Backported two SMS patches from mainline to gcc-linaro and
gcc-linaro/4.6 (fixes for unfreed memory)
Thanks,
Revital
Hi,
* committed a patch that supports reductions in SLP (upstream)
* continued analyzing benchmarks: ffmpeg, EEMBC telecom, office, networking
* started to look into implementation of reverse accesses for Neon
* blueprints
Ira
The Linaro Toolchain Working Group is pleased to announce the release
of both Linaro GCC 4.5 and Linaro GCC 4.6.
Linaro GCC 4.5 2011.05 is the tenth release in the 4.5 series. Based
off the latest GCC 4.5.3+svn173417, it adds new optimisations, much
improved support for strided loads/stores, and fixes for many of the
issues found in the last month.
Interesting changes in 4.5 include:
* Updates to 4.5.3+r173417
* Performance improvements in NEON strided loads and stores
* Performance improvements targeted at EEMBC CoreMark
* Precompiled header support on recent Linux kernels
Fixes:
* LP: #660156: Heap randomisation causes PCH testsuite failures
* LP: #784375: vset_lane_u8 intrinsic generates wrong lane number
* LP: #759409: Profiled bootstrap fails in FSF GCC 4.5
* LP: #723086: Test regressions in the Fortran test suite
The strided load/store improvements allow both NEON intrinsics and the
vectoriser to efficiently access values that occur at every n'th
address, such as all of the red values in an RGB image or all of the
left channel samples in an interleaved audio array. Previous versions of GCC
would unpack the values onto the stack instead of using the registers
directly.
The CoreMark improvements improve code generation for the hot
functions in the benchmark. This release is now on par with Linaro GCC
4.4 and significantly ahead of other FSF or Linaro 4.5 based
compilers. This fixes the long-standing problems of ARMv5 being
faster than ARMv7 and 4.4 based compilers being faster than 4.5 based
ones.
Linaro GCC 4.6 is the third release in the 4.6 series. Based off the
latest GCC 4.6.0+svn173480, it adds new optimisations, vectoriser
improvements, and continues with the merge of many ARM-focused
changes.
Interesting changes include:
* Updates to 4.6.0+r173417
* Brings forward more of the performance improvements from Linaro GCC 4.5
* Adds support for swing modulo scheduling
* Fixes precompiled header support on recent Linux kernels
* Changes the default NEON vector size to quads
* Adds auto-detection of the best vector size
* Adds vectorisation improvements due to better if-conversion
Fixes:
* LP: #714921: Uses an unreasonable amount of memory to compile QEMU on armel
* LP: #723086: Test regressions in the Fortran test suite
The source tarball is available from:
https://launchpad.net/gcc-linaro/+milestone/4.5-2011.05-0
https://launchpad.net/gcc-linaro/+milestone/4.6-2011.05-0
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Bugs: https://bugs.launchpad.net/gcc-linaro/
Questions? https://ask.linaro.org/
Interested in commercial support? inquire at support(a)linaro.org
-- Michael
Hi All,
This is based upon gcc version 4.5.3 (20110221 pre-release)
Any help appreciated
This shows a bug in the Linaro gcc compiler with the Arm NEON
vset_lane intrinsic
Note in the objdump that the vmov.8 instruction that places the
value in the vector for the non-q version uses 1 where it should use
2 and 3:
18: ee410bb0 vmov.8 d17[1], r0
1c: ee420bb0 vmov.8 d18[1], r0
20: ee400b90 vmov.8 d16[0], r0
3c: ee440bb0 vmov.8 d20[1], r0
For the q version the vmov.8 instructions are correct:
40: ee420bf0 vmov.8 d18[3], r0
54: ee420bd0 vmov.8 d18[2], r0
64: ee400b90 vmov.8 d16[0], r0
70: ee420bb0 vmov.8 d18[1], r0
/* Source code */
#include <arm_neon.h>
static uint8x8_t vec[5];
static uint8x16_t qvec[5];
void set(uint8_t value)
{
vec[1] = vset_lane_u8(value, vec[0], 3);
vec[2] = vset_lane_u8(value, vec[0], 2);
vec[3] = vset_lane_u8(value, vec[0], 1);
vec[4] = vset_lane_u8(value, vec[0], 0);
qvec[1] = vsetq_lane_u8(value, qvec[0], 3);
qvec[2] = vsetq_lane_u8(value, qvec[0], 2);
qvec[3] = vsetq_lane_u8(value, qvec[0], 1);
qvec[4] = vsetq_lane_u8(value, qvec[0], 0);
}
Thx
Lee
Hi there. The 2011.05 release has been spun and is testing up well.
The 4.5 and 4.6 branches are now open so feel free to commit any
approved patches.
-- Michael
Progress:
* Attended LDS from 9th-14th May.
Plans:
* Look at Thumb2 performance blueprint and break it down.
* Investigate more headroom for SPEC2k starting this week.
* Thumb2 performance call this week.
Meetings:
* 1-1s
* T2 performance.
Hello,
- Attended Linaro@UDS.
- SMS patches to support ARM do-loop pattern got approved in mainline
and merged into gcc-linaro 4.6 and 4.5.
- Sent merge request for two patches in trunk. (SMS_fixes_for_unfreed_memory)
- Implemented an optimization for the stage-count and now testing it.
Thanks,
Revital
== Last week ==
* At Linaro@UDS; I am still typing this in Budapest. Did a little
work between sessions.
* PR42017, ARM LR register not being used. Discussed the patch with
Richard Sandiford at LDS. Re-tested a bit and about to resend a revised
patch according to his suggestion.
* LP:748138, redirect_jump() ICE. Committed patch to CS stable and
trunk. Submitted merge request to Linaro 4.5 branch.
* LP:689887. Got some suggestions from Revital on how to debug the
bootstrap failure caused by my patch, will look into applying it.
== This week ==
* Taking Monday off, I'll be flying back to Taiwan on Tuesday.
* Continue with issues after getting home.
RAG:
Red:
Amber:
Green: 1105 work item status 99% complete with 2 weeks to go
Current Milestones:
Milestone                    | Planned    | Estimate   | Actual     |
qemu-linaro 2011-05          | 2011-05-19 | 2011-05-19 | n/a        |
close out 1105 blueprints    | 2011-05-28 | 2011-05-28 |            |
complete 1111 planning       | 2011-05-28 | 2011-05-28 |            |
Historical Milestones:
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release    | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03          | 2011-03-08 | 2011-03-08 | 2011-03-08 |
qemu-linaro 2011-04          | 2011-04-21 | 2011-04-21 | 2011-04-21 |
== merge-correctness-fixes ==
* some of my pending patches have been applied; a number of others are
still under discussion or need further work/testing
== other ==
* We won't be making a qemu-linaro 2011-05 release, since there are no
changes since the 2011-04 release (due to a combination of the Easter
holiday and UDS week).
* Attended UDS
* almost all 1105 work items either complete or confirmed postponed
to next cycle
* Good progress on fleshing out blueprints for next cycle:
https://wiki.linaro.org/PeterMaydell/Qemu1111
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
(maybe) 15-16 August: QEMU/KVM strand at LinuxCon NA, Vancouver
[LinuxCon proper follows on 17-19th]
Last week, Ramana pointed me at an upstream bug report about the
inefficient code that GCC generates for vzip, vuzp and vtrn:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48941
It was filed not long after the Neon seminar at the summit;
I'm not sure whether that was a coincidence or not.
I attached a patch to the bug last week and will test it this week.
However, a cut-down version shows up another problem that isn't related
specifically to intrinsics. Given:
#include <arm_neon.h>
void foo (float32x4x2_t *__restrict dst, float32x4_t *__restrict src, int n)
{
while (n--)
{
dst[0] = vzipq_f32 (src[0], src[1]);
dst[1] = vzipq_f32 (src[2], src[3]);
dst += 2;
src += 4;
}
}
GCC produces:
cmp r2, #0
bxeq lr
.L3:
vldmia r1, {d16-d17}
vldr d18, [r1, #16]
vldr d19, [r1, #24]
vldr d20, [r1, #32]
vldr d21, [r1, #40]
vldr d22, [r1, #48]
vldr d23, [r1, #56]
add r3, r0, #32
vzip.32 q8, q9
vzip.32 q10, q11
subs r2, r2, #1
vstmia r0, {d16-d19}
add r1, r1, #64
vstmia r3, {d20-d23}
add r0, r0, #64
bne .L3
bx lr
We're missing many auto-increment opportunities here. I think this
is due to the limitations of GCC's auto-inc-dec pass rather than to
a problem in the ARM port itself. I think there are two main areas
for improvement:
- The pass only tries to use auto-incs in cases where there is a
separate addition and memory access. It doesn't try to handle
cases where there are two consecutive memory accesses of the
form *base and *(base + size), even if the address costs make
it clear that post-increments would be a win.
- The pass uses a backward scan rather than a forward scan,
which makes it harder to spot chains of more than two accesses.
FWIW, I've got fairly specific ideas about how to do this.
Unfortunately, the pass is in need of some TLC before it's
easy to make changes. So in terms of work items, how about:
1. Clean up the auto-inc pass so that it's easier to modify
2. Investigate improvements to the pass
3. Submit the changes upstream
4. Backport the changes to the Linaro branches
I wrote some patches for (1) last week.
I'd estimate it's about 2 weeks' work for (1) and (2). (3) and (4)
would hopefully be background tasks. The aim would be for something
like:
.L3:
vldmia r1!, {d16-d17}
vldmia r1!, {d18-d19}
vldmia r1!, {d20-d21}
vldmia r1!, {d22-d23}
vzip.32 q8, q9
vzip.32 q10, q11
subs r2, r2, #1
vstmia r0!, {d16-d19}
vstmia r0!, {d20-d23}
bne .L3
bx lr
This should help with auto-vectorised code, as well as normal core code.
(Combining the vldmias and vstmias is a different topic. The fact that
this particular example could be implemented using one load and one
store is to some extent coincidental.)
Richard
== String routines ==
* Gave up on perf on silverbell and redid it on ursa2; now have a
full set of perf figures and have updated the workload report to show
the spec
binaries that use significant time in libc and the routines they spend
it in; a handful of tests spend very significant amounts of time in
libm.
* Have ltrace results from about 75% of spec - some of the others
are fighting a bit
* Optimised the non-neon memcpy; it's now quite respectable except
in one or two cases (2-byte misaligned and, for some odd reason, source
offset by 8 bytes with destination offset by 12, which is way down on
any other combination)
(Current result graphs here
https://wiki.linaro.org/Internal/People/DaveGilbert?action=AttachFile&do=ge…
)
Dave
Hi,
* continued looking into ffmpeg/libavcodec:
- dcadsp.c - the inner loop contains reverse accesses which are not
supported on Neon. I think we can handle them using vrev and vswp.
- a lot of loops have unknown memory strides. I am exploring the
possibility of combining scalar loads and vmov into a vector
register, but it is probably too expensive.
* looking into telecom/conven
Ira
== Last week ==
* Launchpad #748138: "ICE in redirect_jump, at jump.c:1443". Related to
shrink-wrap, discussed a bit with Bernd off-list. Sent fix today (Mon.)
to gnu-internal; will need to merge to Linaro.
* CoreMark combine canonicalize compares patch set: bootstrapped and
tested with clean results on powerpc, added comments and updated
upstream submission. Machine independent parts okayed by Jeff Law, now
committed upstream. ARM parts still pending review.
* Compiled back-list of upstream patches, and sent to patches(a)linaro.org
* Traveled to Budapest, Hungary for Linaro Developer Summit on Saturday.
== This week ==
* Linaro Developer Summit at Budapest all week.
== GDB ==
* Committed support for NEON registers in core dumps (bug #615972)
to Linaro GDB (not yet in mainline).
* Investigated root cause of bug #615996 (gdb.cp/templates.exp) and
started exploring ways to fix it.
== GCC ==
* Committed fix for bug #759409 (Profiled bootstrap fails in GCC 4.5)
to FSF GCC 4.5 branch and Linaro GCC 4.5.
Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter | Management: Dirk Wittkopp
Registered office: Böblingen | Register court: Amtsgericht
Stuttgart, HRB 243294
Worked on the ARM 16 -> 64-bit multiply-and-accumulate problem. Bernd
kindly provided a prototype patch to help. I've tried to understand what
needs to be done, but I didn't have enough time to get to the bottom of
it. So far, I think I know why the existing code doesn't work, and I
think I have a way forward. It does appear that the real problem ought
to be solved in the tree optimizers, though.
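For reference, the kind of source involved is, we assume, a 16x16-bit product accumulated into a 64-bit sum, which ARM (v5TE and later) can do in a single SMLALBB instruction, provided the compiler recognises the widening multiply feeding the 64-bit addition. A minimal C example of such a loop (the function name is hypothetical):

```c
#include <stdint.h>

/* Dot product of two 16-bit arrays into a 64-bit accumulator.
   Every 16x16 product fits in 32 bits; ideally each iteration
   becomes one SMLALBB on ARM. */
int64_t dot16_acc64(const int16_t *a, const int16_t *b, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];
    return acc;
}
```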
Committed the FSF GCC 4.5.3 merge to the Linaro 4.5 branch. Testing did
not show any trouble.
Matthias requested an additional 4.5 merge to pick up a new bug fix, so
I've done the merge, and submitted the merge request for testing.
Committed Maxim's compound conditionals optimization patch - a merge
from Linaro GCC 4.5.
There was some confusion caused by the lp:gcc-linaro/4.6 branch history
accidentally getting re-written. After some discussion on #bzr I managed
to figure out what happened, posted a warning to linaro-toolchain
mailing list, and changed the branch configuration to prevent it
happening again.
Committed Mark Shinwell's BRANCH_COST patch to Linaro GCC 4.6 - another
merge from GCC 4.5.
Merged from FSF GCC 4.6 to Linaro 4.6 and submitted the patch for testing.
Richard Earnshaw approved my recent Thumb2 constants patch, but only if
I modify it slightly. I've begun work on the changes, but I still need
to test them. I won't be able to commit them until the ADDW/SUBW patch
has been approved.
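As background (our summary of the architecture, not the contents of the patch): Thumb-2 data-processing instructions accept a "modified immediate", which is either an 8-bit value shifted left by 0 to 24 bits or one of the replicated byte patterns 0x00XY00XY, 0xXY00XY00 and 0xXYXYXYXY, whereas ADDW/SUBW take any plain 12-bit value (0 to 4095). A sketch of the two encodability checks, with hypothetical function names:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is encodable as a Thumb-2 modified immediate. */
bool thumb2_modified_imm_ok(uint32_t v)
{
    uint32_t lo = v & 0xffu;        /* XY in 0x00XY00XY and 0xXYXYXYXY */
    uint32_t hi = (v >> 8) & 0xffu; /* XY in 0xXY00XY00 */

    /* Replicated byte patterns. */
    if (v == lo * 0x00010001u || v == (hi << 8) * 0x00010001u ||
        v == lo * 0x01010101u)
        return true;

    /* An 8-bit value shifted left by 0..24 bits (no wrap-around). */
    for (int s = 0; s <= 24; s++)
        if ((v & ~(0xffu << s)) == 0)
            return true;
    return false;
}

/* True if v fits the plain 12-bit immediate of ADDW/SUBW. */
bool addw_subw_imm_ok(uint32_t v)
{
    return v <= 0xfffu;
}
```

Constants such as 0xfff fail the first check but pass the second, which is where ADDW/SUBW help.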
Ramana has reviewed my EABI half-precision function names patch, and
discovered that the return types are wrong. I have no idea how this
happened - the changes are deliberate so they must have been based on
something, but I no longer have the same documents I had when I did the
work, and it clearly doesn't match my current ones. In any case, the
changes make no practical difference, as function return values are
always as wide as a register anyway.
* Other
Public holiday on Monday.
* Next week
I will be attending UDS in Budapest from 8th - 14th May. I shall
continue to read my email, but will not be attending any calls.
----
Upstream patches requiring review:
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
* ARM Thumb2 addw/subw support.
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg03783.html
== Bug fighting ==
* Tracked bug 774175 (apt segfault on armel on oneiric) down to the
Cortex-A8 branch erratum bug that we found during the bug jam a few
weeks ago (which affected the more obscure vtk package); Richard's
existing binutils fix should fix this.
== String routines ==
* Struggled to get 'perf' to give sane results when profiling spec;
some of the samples are obviously being associated with the wrong
process somewhere along the way (e.g. it's showing significant samples
in the sh process, but in a library that's used by the actual
benchmark).
* latrace on spec still running on ursa2
* Wrote a non-neon memcpy; as expected, its aligned performance is
very similar to libc/kernel. It's a bit faster in some places but
slower in some odd places (e.g. n*32+1 bytes is a lot slower for some
reason). It's also really bad on misaligned cases; I tried to take
advantage of v7's ability to do misaligned loads, but they really are
quite slow.
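Oddities like n*32+1 bytes, or one particular pair of source and destination offsets, only show up when sizes and offsets are swept systematically. A minimal correctness sweep of that shape is sketched below; the harness, its buffer limits and the lengths used are our assumptions, not Dave's benchmark. It tries every source/destination offset pair in 0..15 and also checks that no byte outside the destination range is written; a benchmark would wrap a timer around the `copy` call.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as memcpy, so the C library's memcpy can be passed in. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

#define MAX_OFF 16  /* offsets 0..15 for both source and destination */
#define GUARD 16    /* untouched bytes kept before the destination window */
#define MAX_LEN 300 /* longest copy the buffers allow */

/* Returns the number of (length, src offset, dst offset) cases that fail,
   or -1 if a requested length does not fit in the test buffers. */
int sweep_offsets(copy_fn copy, const size_t *lens, int nlens)
{
    static unsigned char src[MAX_OFF + MAX_LEN];
    static unsigned char dst[GUARD + MAX_OFF + MAX_LEN + GUARD];
    int failures = 0;

    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    for (int l = 0; l < nlens; l++) {
        size_t n = lens[l];
        if (n > MAX_LEN)
            return -1;
        for (int so = 0; so < MAX_OFF; so++) {
            for (int doff = 0; doff < MAX_OFF; doff++) {
                unsigned char *d = dst + GUARD + doff;
                memset(dst, 0xAA, sizeof dst);
                copy(d, src + so, n);

                int bad = memcmp(d, src + so, n) != 0;
                /* Nothing before or after the destination range may change. */
                for (unsigned char *p = dst; p < d && !bad; p++)
                    bad = *p != 0xAA;
                for (unsigned char *p = d + n; p < dst + sizeof dst && !bad; p++)
                    bad = *p != 0xAA;
                failures += bad;
            }
        }
    }
    return failures;
}
```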
Dave
== This week ==
* Committed interleaved load/store vectorisation changes upstream.
* Merged the vldN and vstN intrinsic improvements into Linaro 4.5 and 4.6.
(Thanks for the quick reviews here.)
* Backported the interleaved load/store vectorisation changes to Linaro
4.5 and 4.6. This took a while because the patch series touches
turbulent code. Submitted merge requests.
* Merged Sergey Grechanik's NEON reload improvement into Linaro 4.5
and 4.6.
* Got ready for summit.
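As an illustration of the loops the interleaved load/store vectorisation targets (our example, not taken from the patches): the accesses `in[2 * i]` and `in[2 * i + 1]` form a stride-2 group that NEON's VLD2 can load and de-interleave in one instruction, and the paired stores in `upmix` match VST2. Whether a particular GCC version vectorises these exact loops depends on the compiler and options; the function names are hypothetical.

```c
#include <stdint.h>

/* Interleaved stereo (L, R, L, R, ...) to mono: a stride-2 load group. */
void downmix(int16_t *restrict out, const int16_t *restrict in, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = (int16_t)((in[2 * i] + in[2 * i + 1]) / 2);
}

/* Mono to interleaved stereo: a stride-2 store group. */
void upmix(int16_t *restrict out, const int16_t *restrict in, int n)
{
    for (int i = 0; i < n; i++) {
        out[2 * i] = in[i];
        out[2 * i + 1] = in[i];
    }
}
```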
Richard
Hi,
* finished libunwind support for detecting and handling signal frames on
ARM Linux. RT and non-RT signal frames are handled for both >=2.6.18 and
<2.6.18 kernels. The *test-resume-sig testcases are passing now.
* briefly looked into what needs to be done to add 64-bit __sync_* ops
* prepared for LDS
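One common way to recognise an ARM Linux signal frame is to inspect the instructions at the frame's return address, which for a signal frame is a sigreturn trampoline. The constants below encode the sigreturn (119) and rt_sigreturn (173) syscall numbers in ARM-mode `mov r7, #imm` and `svc`/`swi` instructions; GDB's arm-linux-tdep.c matches the same values. This is a simplified sketch, not libunwind's code: it ignores Thumb trampolines and the frame-layout difference between pre- and post-2.6.18 kernels, and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* ARM-mode instruction words found in ARM Linux sigreturn trampolines. */
#define ARM_SET_R7_SIGRETURN    0xe3a07077u /* mov r7, #119 (sigreturn) */
#define ARM_SET_R7_RT_SIGRETURN 0xe3a070adu /* mov r7, #173 (rt_sigreturn) */
#define ARM_EABI_SYSCALL        0xef000000u /* svc #0 */
#define ARM_OABI_SIGRETURN      0xef900077u /* swi #0x900077 */
#define ARM_OABI_RT_SIGRETURN   0xef9000adu /* swi #0x9000ad */

/* Hypothetical classification result. */
enum frame_kind { NOT_SIGNAL_FRAME, SIGNAL_FRAME, RT_SIGNAL_FRAME };

/* insn[0] and insn[1] are the two words at the frame's return address. */
enum frame_kind classify_return_site(const uint32_t insn[2])
{
    /* A bare OABI swi carrying the syscall number. */
    if (insn[0] == ARM_OABI_SIGRETURN)
        return SIGNAL_FRAME;
    if (insn[0] == ARM_OABI_RT_SIGRETURN)
        return RT_SIGNAL_FRAME;

    /* Syscall number in r7, followed by svc #0 or the matching OABI swi. */
    if (insn[0] == ARM_SET_R7_SIGRETURN &&
        (insn[1] == ARM_EABI_SYSCALL || insn[1] == ARM_OABI_SIGRETURN))
        return SIGNAL_FRAME;
    if (insn[0] == ARM_SET_R7_RT_SIGRETURN &&
        (insn[1] == ARM_EABI_SYSCALL || insn[1] == ARM_OABI_RT_SIGRETURN))
        return RT_SIGNAL_FRAME;

    return NOT_SIGNAL_FRAME;
}
```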
Regards
Ken
RAG:
Red:
Amber:
Green: GSoC QEMU student project approved
Current Milestones:
Milestone | Planned | Estimate | Actual |
qemu-linaro 2011-05 | 2011-05-19 | 2011-05-19 | |
Historical Milestones:
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03 | 2011-03-08 | 2011-03-08 | 2011-03-08 |
qemu-linaro 2011-04 | 2011-04-21 | 2011-04-21 | 2011-04-21 |
(short week, following holiday)
== merge-correctness-fixes ==
* some minor patches committed: SPARC build issues, Neon UNDEFs,
restore base reg properly for Thumb LDMs that abort midway
* v2 of configure patch to print list of valid targets
* some work on fixing QEMU FPSCR status flags (last remaining item
in this blueprint); submitted a patchset fixing everything except
the various VCVT instructions (which have trickier softfloat bugs)
* submitted patch fixing NaN behaviour in VMLA/VMLS/VNMLA/VNMLS
== other ==
* the Google Summer of Code QEMU project to work on upstreaming
some of the Android emulator has been approved; I will be mentoring
Patrick Jackson, the student who will be doing this work
* qemu-linaro 2011-05 is unlikely to have any code changes since 2011-04;
we might release it anyway just because it's the final one of the cycle
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
13-19 May: UDS, Budapest
(maybe but unlikely) 15-16 August: QEMU/KVM strand at LinuxCon NA,
Vancouver [LinuxCon proper follows on 17-19th]
There are already a couple of tools-related questions on here, so we
should probably make sure we monitor it regularly.
This isn't too hard: once you're signed up, you can mark certain topics
as 'interesting', and you'll then get email notifications when there's a
post.
Andrew
-------- Original Message --------
Subject: Linaro forums replaced by "Ask Linaro"
Date: Fri, 06 May 2011 15:58:53 +0200
From: Michael Opdenacker <michael.opdenacker(a)linaro.org>
Organization: Linaro
To: everyone <everyone(a)linaro.org>
Greetings,
We are pleased to announce the replacement of our forums by a new "Ask
Linaro" service.
We expect this service to work better than the forums, bringing more
questions and answers and encouraging people to join the Linaro
community by sharing their experience.
We count on your participation.
See http://ask.linaro.org/
Cheers,
Michael.
--
Michael Opdenacker - Community Manager
Linaro, http://linaro.org
Cell: +33 621 604 642
IRC: 'opm' in #linaro on irc.freenode.net
Hi all,
We seem to have had an accident!
This morning I merged one of my patches to lp:gcc-linaro/4.6, and this
afternoon I got an email from Launchpad notifying me that a mystery
revision had been deleted.
It seems that Richard has somehow overwritten my change with his.
Luckily I've spotted it and will fix it now, but it could very easily
have got lost.
I'm not sure what's happened here, but I'm pretty sure bzr does not just
do this silently. I thought you needed to specify --overwrite to do this
on purpose, but perhaps there's a bzr bug here?
Anyway, could everyone please be very careful when they do bzr push to
the release branches.
Thanks
Andrew
Hello,
[1] Regarding the patch 'Support closing_branch_deps'
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00350.html
Continued discussions with Ayal Zaks (SMS maintainer) regarding this patch.
(http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00250.html)
I'm now working on simplifying the patch for resubmission according to the
recommendation.
[2] Regarding the patch 'Avoid considering debug_insn when calculating SCCs'
http://gcc.gnu.org/ml/gcc-patches/2011-04/msg01294.html
The problem was exposed while testing patch [3], and it seems to be part
of a bigger problem: SMS does not support debug_insn properly.
I've asked Alexandre Oliva, who added debug_insn support to the Haifa
and selective schedulers, to help with that.
[3] Regarding the patch 'Support instructions with REG_INC_NOTE'
http://gcc.gnu.org/ml/gcc-patches/2011-04/msg01309.html
The fix for the debug_insn problem (which patch [2] tries to solve) is
needed for bootstrapping with this patch, so I'm putting it on hold for now.
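To make the debug_insn issue concrete: if debug insns take part in the SCC computation, a debug insn can close a dependence cycle and merge ordinary insns into one SCC (a recurrence), so the schedule could differ with and without -g, which debug insns are not supposed to cause. The sketch below computes SCCs with Tarjan's algorithm over a small hypothetical dependence graph and simply skips nodes flagged as debug. The graph type and flag are stand-ins, not GCC's ddg.c data structures, and GCC computes SCCs its own way.

```c
#include <stdbool.h>

#define MAX_NODES 16

/* Hypothetical dependence graph: adj[u][v] means an edge u -> v. */
typedef struct {
    int n;
    bool adj[MAX_NODES][MAX_NODES];
    bool is_debug[MAX_NODES]; /* stands in for "this insn is a debug insn" */
} dep_graph;

typedef struct {
    const dep_graph *g;
    int index[MAX_NODES], low[MAX_NODES], scc[MAX_NODES];
    bool on_stack[MAX_NODES];
    int stack[MAX_NODES];
    int sp, next_index, num_sccs;
} tarjan_state;

static void strongconnect(tarjan_state *t, int u)
{
    t->index[u] = t->low[u] = t->next_index++;
    t->stack[t->sp++] = u;
    t->on_stack[u] = true;

    for (int v = 0; v < t->g->n; v++) {
        /* Edges into debug nodes are ignored, so they cannot close a cycle. */
        if (!t->g->adj[u][v] || t->g->is_debug[v])
            continue;
        if (t->index[v] < 0) {
            strongconnect(t, v);
            if (t->low[v] < t->low[u])
                t->low[u] = t->low[v];
        } else if (t->on_stack[v] && t->index[v] < t->low[u]) {
            t->low[u] = t->index[v];
        }
    }

    if (t->low[u] == t->index[u]) { /* u is the root of an SCC */
        int w;
        do {
            w = t->stack[--t->sp];
            t->on_stack[w] = false;
            t->scc[w] = t->num_sccs;
        } while (w != u);
        t->num_sccs++;
    }
}

/* Returns the number of SCCs among non-debug nodes; scc_out[i] is the
   SCC id of node i, or -1 for debug nodes. */
int compute_sccs(const dep_graph *g, int scc_out[])
{
    tarjan_state t = {.g = g};
    for (int i = 0; i < g->n; i++) {
        t.index[i] = -1;
        t.scc[i] = -1;
    }
    for (int i = 0; i < g->n; i++)
        if (!g->is_debug[i] && t.index[i] < 0)
            strongconnect(&t, i);
    for (int i = 0; i < g->n; i++)
        scc_out[i] = t.scc[i];
    return t.num_sccs;
}
```

With edges 0 -> 1 -> 2 -> 0, all three nodes form one SCC; once node 2 is marked as a debug insn, nodes 0 and 1 end up in separate SCCs.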
Thanks,
Revital
Hi,
- backported vzip fix to GCC 4.5 and 4.6 (PR 48252)
- merged auto-detection of vector size patch to gcc-linaro 4.6
- started looking into vectorization of ffmpeg
Ira
Hi,
I am trying to add USB gadget mass storage support to U-Boot, and I get a
run-time abort in usbmsd.c with U-Boot's default -Os option. I am using
the Linaro 2011.03 (4.5.3) toolchain. Here is what I have tried:
1) While debugging, I found that the abort exception occurs in
usbmsd_init_strings(). If I declare it as a noinline function, like
    static void usbmsd_init_strings (void) __attribute__((noinline));
it works; however, the later steps still fail.
2) Compiling U-Boot with -O0, everything works.
3) I tried other toolchains. It works with CodeSourcery's 2011.03-41
and the Ubuntu 4.5 version, but fails with the Linaro 4.5 2011.04 version.
4) I have dumped the object files, and it seems some functions have been
optimized, but analysing the details is beyond my knowledge :( . Please
check the source and objdump files, which I have put in one compressed file.
toolchain:
ypluo@ypluo-dt:~/work/Current/L38EVB/Main/out/prima2cb$ arm-none-linux-gnueabi-gcc --v
Using built-in specs.
COLLECT_GCC=arm-none-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/home/ypluo/work/Current/Tools/toolchain/linaro-201103-csr-build-armv7-vfpv3_x86_32/bin/../libexec/gcc/arm-none-linux-gnueabi/4.5.3/lto-wrapper
Target: arm-none-linux-gnueabi
Configured with:
/media/8ccb10fd-862c-47d2-99fb-144e7188e1fb/home/vmuser/development/toolchain/build-toolchain/gcc-linaro-4.5-2011.03-0/configure
--target=arm-none-linux-gnueabi
--prefix=/media/8ccb10fd-862c-47d2-99fb-144e7188e1fb/home/vmuser/development/toolchain/build-toolchain/tools
--enable-languages=c,c++ --with-arch=armv7-a --with-float=softfp
--with-fpu=vfpv3-d16 --with-mode=thumb --enable-shared
--enable-multiarch --enable-threads=posix --enable-nls
--enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --disable-werror
--with-pkgversion='Ubuntu/Linaro 4.5-2011.3-csr-build'
--enable-poison-system-directories
Thread model: posix
gcc version 4.5.3 20110221 (prerelease) (Ubuntu/Linaro 4.5-2011.3-csr-build)
ypluo@ypluo-dt:~/work/Current/L38EVB/Main/out/prima2cb$ /home/ypluo/mywork/toolchain/arm-2011.03/bin/arm-none-linux-gnueabi-gcc --v
Using built-in specs.
COLLECT_GCC=/home/ypluo/mywork/toolchain/arm-2011.03/bin/arm-none-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/mywork/toolchain/arm-2011.03/bin/../libexec/gcc/arm-none-linux-gnueabi/4.5.2/lto-wrapper
Target: arm-none-linux-gnueabi
Configured with:
/scratch/janisjo/arm-linux-lite/src/gcc-4.5-2011.03/configure
--build=i686-pc-linux-gnu --host=i686-pc-linux-gnu
--target=arm-none-linux-gnueabi --enable-threads --disable-libmudflap
--disable-libssp --disable-libstdcxx-pch
--enable-extra-sgxxlite-multilibs --with-arch=armv5te --with-gnu-as
--with-gnu-ld --with-specs='%{save-temps: -fverbose-asm}
%{funwind-tables|fno-unwind-tables|mabi=*|ffreestanding|nostdlib:;:-funwind-tables}
-D__CS_SOURCERYGXX_MAJ__=2011 -D__CS_SOURCERYGXX_MIN__=3
-D__CS_SOURCERYGXX_REV__=41 %{O2:%{!fno-remove-local-statics:
-fremove-local-statics}}
%{O*:%{O|O0|O1|O2|Os:;:%{!fno-remove-local-statics:
-fremove-local-statics}}}' --enable-languages=c,c++ --enable-shared
--enable-lto --enable-symvers=gnu --enable-__cxa_atexit
--with-pkgversion='Sourcery G++ Lite 2011.03-41'
--with-bugurl=https://support.codesourcery.com/GNUToolchain/
--disable-nls --prefix=/opt/codesourcery
--with-sysroot=/opt/codesourcery/arm-none-linux-gnueabi/libc
--with-build-sysroot=/scratch/janisjo/arm-linux-lite/install/arm-none-linux-gnueabi/libc
--with-gmp=/scratch/janisjo/arm-linux-lite/obj/host-libs-2011.03-41-arm-none-linux-gnueabi-i686-pc-linux-gnu/usr
--with-mpfr=/scratch/janisjo/arm-linux-lite/obj/host-libs-2011.03-41-arm-none-linux-gnueabi-i686-pc-linux-gnu/usr
--with-mpc=/scratch/janisjo/arm-linux-lite/obj/host-libs-2011.03-41-arm-none-linux-gnueabi-i686-pc-linux-gnu/usr
--with-ppl=/scratch/janisjo/arm-linux-lite/obj/host-libs-2011.03-41-arm-none-linux-gnueabi-i686-pc-linux-gnu/usr
--with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic
-lm' --with-cloog=/scratch/janisjo/arm-linux-lite/obj/host-libs-2011.03-41-arm-none-linux-gnueabi-i686-pc-linux-gnu/usr
--with-libelf=/scratch/janisjo/arm-linux-lite/obj/host-libs-2011.03-41-arm-none-linux-gnueabi-i686-pc-linux-gnu/usr
--disable-libgomp --enable-poison-system-directories
--with-build-time-tools=/scratch/janisjo/arm-linux-lite/install/arm-none-linux-gnueabi/bin
--with-build-time-tools=/scratch/janisjo/arm-linux-lite/install/arm-none-linux-gnueabi/bin
Thread model: posix
gcc version 4.5.2 (Sourcery G++ Lite 2011.03-41)
Please let me know if any other information is needed.
Many thanks.
Yuping