Hi,
* finished the measuring of the overhead of the ARM specific unwind tables
https://wiki.linaro.org/KenWerner/Sandbox/libunwind#overhead_of_the_ARM_spe…
* started to get an environment up and running in order to build the
linaro-android sources
* encountered some build issues (I'm in the process to sort out some
issue with pfalcon of the android team)
* finshed 11.11 cycle planning
* I'll be out of office for the rest of the week (public holiday +
vacation)
Regards
Ken
Progress:
* Some trouble building the SPEC2k tools in the new multiarch world in
natty. Perl refuses to link libm and a number of other things
also end up failing . Appears to be a real pain with the new multiarch
world and SPEC2k's curious build system for its' tools . Will fall back
to an older chroot and get the tools built natively.
* Tried breaking down the T2 performance blueprint - initial breakdown
now available.
* Looked at the binutils vmov.i64 issue again. Looks like natty-updates
will now pick this up.
* Some patch review and bugzilla maintenance.
Plans:
* Get SPEC tools building.
* Look more at the T2 performance blueprint
* Spend some time on the VFP moves and look at ivopts for a bit.
* Merge review duty.
Meetings:
* 1-1s
* Linaro calls.
== Last week ==
* Investigated the CoreMark numbers posted by Michael Hope, mainly the
oddities of a significant Linaro 4.6 regression versus FSF 4.6. Later
verified to be a false alarm.
* Pushed a merge of some of my upstream CoreMark patches to Linaro 4.6.
* Did archeology for PR42017. Traced some history of the ARM prologue
from 2000 to 2007 (DF branch), posted upstream. Hope this clarification
gets my patch an approval soon.
* Tried the above PR42017 patch (which is supposed to release the use of
LR as a general register in leaf functions) on CoreMark, using Linaro
4.6, and was surprised to find that despite many reductions in spill
code and epilogue (now more often directly return by ldmfd), the
generated code still regresses in performance (!).
* Continuing above, suspecting something from experience (cough) added
-falign-functions=8 to the CoreMark compile options. Finally produced a
small improvement, while causing a regression for the
without-PR42017-patch case (victory?).
* Worked on PR48808, PR48792 over the weekend, which are cases where
paradoxical subregs caused ICE in reload. Posted an ARM backend patch
upstream, though now mostly taken over by Richard Sandiford :)
== This week ==
* Some other PRs, ideas, still work in progress.
* Started using the porter boards, will try to get LP:689887 over with
this week.
* Set-up SPEC2006 profile runs on PowerPC with trunk.
* Looked at SPEC2006's 462.libquantum.
* PR745743 - compared different versions mentioned in the PR.
* Wrote a patch to fix another issue related to how SMS handles debug_insn.
== String routines ==
* Finally finished the ltrace analysis of the whole of SPEC 2k6 and
have written it up - I'll proof read it next week and then send it out
to the benchmark list.
* Ran memset and memcpy benchmarks of larger than cache sizes on A9
* memcpy on larger than cache sizes (or probably mainly cache miss
data) does come back to Neon winning over ARM; my suspicion is that
with cache hits we run out of bandwidth on Neon, but that doesn't
happen in the cache miss case; why it's faster in that case I'm not
sure yet.
* memset is still not faster for Neon even on large sizes where
the destination isn't in the cache.
== Other ==
* Started looking at 64 bit atomics
* Looking at the pot of QEmu work with Peter.
Dave
Hi,
* the overhead of the ARM specific unwind tables for some binaries:
https://wiki.linaro.org/KenWerner/Sandbox/libunwind#overhead_of_the_ARM_spe…
* sometimes the size of the .text section differs which worries me a
bit (not necessarily a GCC issue, could be related to the build system)
* tested a couple of linaro-android images on my panda board
* ran into a l-i-t issue (now fixed) and discussed with asac and friends
* and finally got the network up and running :)
* some 11.11 cycle planning (libunwind work items, "in distributions"
spec)
Regards
Ken
RAG:
Red:
Amber:
Green: 1111 QEMU planning complete
Current Milestones:
| Planned | Estimate | Actual |
complete 1111 planning | 2011-05-28 | 2011-05-28 | 2011-05-27 |
qemu-linaro-2011-06 | 2011-06-16 | 2011-06-16 | |
Historical Milestones:
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03 | 2011-03-08 | 2011-03-08 | 2011-03-08 |
qemu-linaro 2011-04 | 2011-04-21 | 2011-04-21 | 2011-04-21 |
qemu-linaro 2011-05 | 2011-05-19 | 2011-05-19 | n/a |
close out 1105 blueprints | 2011-05-28 | 2011-05-28 | 2011-05-19 |
== other ==
* Completed planning work for 1111; all blueprints now created, fleshed
out with work items and assigned:
https://blueprints.launchpad.net/qemu-linaro
[Note that as expected some items under consideration have not made
the list; this includes the trustzone work]
* Some interesting upstream QEMU discussions (list and IRC) on
(a) performance improvements [good to see general interest in
this] and (b) overhauling the memory API [very long thread
but I think the proposed API should be OK for ARM system emulation
purposes]
* LP:768650: QEMU warnings on recent Linaro OMAP3 kernels: tracked down
to the kernel deliberately reading a register it knows doesn't exist
on OMAP2/3. Sent a query via Arnd about whether we can get this changed.
* rebased linaro-qemu to current master
* Sent patchset which starts ARM QEMU moving towards getting rid of the
implicit global CPUState pointer
* sent patch fixing a configure bug causing it to create recursive
symlinks
* sent a patchset which tightens up the compile time TCG value type
checking; this would have detected the build-breaking patch I sent
earlier this week...
* sent patch adding support for active-low interrupts to the LAN9118
model; this is needed when it is used in the Overo OMAP3 board model
Meetings: toolchain, standup, GSoC student, doughnuts
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
1-5 August: Linaro sprint 1111
(maybe) 15-16 August: QEMU/KVM strand at LinuxCon NA, Vancouver
[LinuxCon proper follows on 17-19th]
Hi,
* PR 49087 - fixed
* PR 49038 - opened by Richard - fixed on 4.7, to be backported to 4.5 and 4.6
* working on widening multiplication for unsigned types and constants
(the signed case works fine)
Ira
Posted a new patch for 16 -> 64 bit multiply and accumulate:
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg05794.html
Pushed the same patch to a Launchpad branch for testing.
Pinged my addw/subw patch as a review didn't seem forthcoming.
Worked on a canonical form for HImode to DImode multiple-and-accumulate.
The problem isn't too hard to fix, but it's hard to do it in a nice way.
Attended Nathan S's reorg call. Followed up by talking to Nathan F about
what he's been working on with Wind River. Read up on the Wiki.
Looked at why the ARM smlal{tb,bt,tt} instructions are not generated.
I've added the proper patterns, but combine doesn't match them, and I've
run out of time this week to check why.
----
Upstream patched requiring review:
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
* ARM Thumb2 addw/subw support.
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg03783.html
* Multiply and accumulate:
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg05794.html
== Last week ==
* Took Monday off, flew back to Taiwan on Tues., got home Wed. night.
* LP:689887, ICE in get_arm_condition_code(). Finally have some new
progress on this. Found my code was rejecting DImode comparisons,
causing uses of __aeabi_lcmp, etc. in expanded RTL. While this still
does not fully explain a bootstrap fail, it may be related, and it's
good I found this here rather then scratch heads on performance
regressions later... :)
* LP:771903: invalid ubfx asm produced by GCC. Mostly got down to the
bottom of this. This bug is rather well hidden, first avoided due to
some inlining heuristic changes after FSF 4.5 was branched (hence 4.6
and trunk doesn't show on the testcase), then hidden again later by
-ftree-bit-ccp. Was able to reproduce on mainline trunk after some
changes to testcase and options. Will send patch later.
* Talked with Ramana on IRC and mail about the '+' constraint modifiers
in the VFP fmul/fdiv patterns. Mostly concluded that these are typos,
and should be fixed.
== This week ==
* Continue with issues.
Hi there. The next two weeks is where we take the technical topics
from the TSC and the discussions had during the summit and turn them
into the concrete engineering blueprints for this cycle. I've created
a page at:
https://wiki.linaro.org/MichaelHope/Sandbox/1111Blueprints
listing all of the TRs. Could you please have a look through these,
find any with your name on them, and fill in the wiki page. I've put
more notes on the page itself. Some of the topics may warrant
specifications.
Let me know if you have questions on what the topics actually mean.
-- Michael
* Profiling SPEC 2k6 still; about 3/4 of the latrace files are
generated but it's taking some hand holding with some of them
(e.g. finding one that makes millions of calls to a library function
that we're not interested in but generates a huge log, and hence
needs it excluding).
* Working through the ones that I have with analysis scripts and
writing the interesting things up.
* Submitted ARM test suite fix for latrace (unsigned characterism)
* Verified Richard's binutils fix in natty-proposed fixed the vtk FTBFS
* Blueprint for 64bit sync primitives.
Dave
Hi,
* started to measure the overhead of -funwind-tables
* libunwind text size increase < 5%
* firefox4 is still building... :)
* found a small glitch when cross compiling the binutils deb package
* made a small patch, talked with doko, fix upstream
* installed android on the pandaboard
https://wiki.linaro.org/KenWerner/Sandbox/AndroidOnPanda
* setup an android development environment on my thinkpad
Regards
Ken
RAG:
Red:
Amber:
Green: 1105 work item status 100% complete
Current Milestones:
| Planned | Estimate | Actual |
qemu-linaro 2011-05 | 2011-05-19 | 2011-05-19 | n/a |
close out 1105 blueprints | 2011-05-28 | 2011-05-28 | 201--05-19 |
complete 1111 planning | 2011-05-28 | 2011-05-28 | |
Historical Milestones:
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03 | 2011-03-08 | 2011-03-08 | 2011-03-08 |
qemu-linaro 2011-04 | 2011-04-21 | 2011-04-21 | 2011-04-21 |
== merge-correctness-fixes ==
* last few work items for this blueprint either completed or postponed
[For the record, postponed work:
setting Cortex A8r2 device ID etc regs -- moved to omap3 upstreaming
trustzone -- may get its own blueprint this cycle
VCVT fp exception flags -- postponed as rather tricky and an
obscure corner case that is unlikely to be noticed by users]
== other ==
* tracked down bug with QEMU loading of Google Go produced ELF files,
submitted patch
* talked to our local trustzone expert, very useful
* reworked and resent FPSCR exception flags patches based on review
comments
* reviewed a patch for setting IFSR right for BKPT
* more planning effort
* sent patch to suppress SD card model warnings generated when Linux
probes to see if it's an SDIO card
* redid the "check for unused -nic options" patch as it turned out to
cause regressions with NICs created via -device.
Meetings: toolchain, standup, 1-2-1
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
(maybe) 15-16 August: QEMU/KVM strand at LinuxCon NA, Vancouver
[LinuxCon proper follows on 17-19th]
== This week ==
* Spent almost all the week on GCC's auto inc/dec pass. I first
continued with the incremental "clean ups" and recoding that I'd
started during free time at Budapest, with the idea of bolting the new
optimisations on top of that. However, in the end, I decided it would
be better to rewrite the pass entirely, using a different approach.
I've now got an early prototype of that rewrite, and it seems to be
working as expected on the test cases I've tried so far. I'm running
a regression test over the weekend, although TBH, I expect it to fail
at this stage.
* Tested the fix for vzip, vunz and vtrn. Went well, so I'll submit
next week.
* Blueprints.
== Next week ==
* More auto inc/dec:
* Round off some known rough edges in the prototype.
* Fix bugs.
* Run benchmarks.
* Run code comparison tests (diffing assembly code), both on ARM and
on other targets of interest.
Richard
The Linaro Toolchain Working Group is pleased to announce the release
of Linaro GDB 7.2.
Linaro GDB 7.2 2011.05-0 is the sixth release in the 7.2 series. Based
off the latest GDB 7.2, it includes a number of ARM-focused bug fixes.
This release fixes:
* LP: #615972 Neon registers missing in core files
* LP: #615978 Failure to software single-step into signal handler
* LP: #615996 gdb.cp/templates.exp failures
The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.2-2011.05-0
More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
-- Michael
Can somebody please explain how development happens regarding qemu-linaro ?
I've taken a look here [0] and If I'm not mistaken, there's no code in the
repo. I can see a lot of blueprints, but I don't understand how work is
being done regarding those blueprints or when will it be done! Oh, and what
exactly is the 'qemu-linaro' tarball in the repo ?
I'm not sure how newbie this question is, but please bear with me. :D
Thanks in advance.
[0] https://launchpad.net/qemu-linaro
--
Karim Allah Ahmed.
LinkedIn <http://eg.linkedin.com/pub/karim-allah-ahmed/13/829/550/>
Hello,
* Sent 5 SMS related patches for review upstream.
* Backported two SMS patches from mainline to gcc-linaro and
gcc-linaro/4.6 (fixes for unfreed memory)
Thanks,
Revital
Hi,
* committed a patch that supports reductions in SLP (upstream)
* continued analyzing benchmarks: ffmpeg, EEMBC telecom, office, networking
* started to look into implementation of reverse accesses for Neon
* blueprints
Ira
The Linaro Toolchain Working Group is pleased to announce the release
of both Linaro GCC 4.5 and Linaro GCC 4.6.
Linaro GCC 4.5 2011.05 is the tenth release in the 4.5 series. Based
off the latest
GCC 4.5.3+svn173417, it adds new optimisations, much improved support
for strided load/stores, and fixes for many of the issues found in the
last month.
Interesting changes in 4.5 include:
* Updates to 4.5.3+r173417
* Performance improvements in NEON strided loads and stores
* Performance improvements targeted at EEMBC CoreMark
* Precompiled header support on recent Linux kernels
Fixes:
* LP: #660156: Heap randomisation causes PCH testsuite failures
* LP: #784375: vset_lane_u8 intrinsic generates wrong lane number
* LP: #759409: Profiled bootstrap fails in FSF GCC 4.5
* LP: #723086: Test regressions in the Fortran test suite
The strided load/store improvements allow both NEON intrinsics and the
vectoriser to efficiently access values that occur at every n'th
address, such as all of the red values in a RGB image or all of the
left channel samples in a interleaved audio array. Previous versions of GCC
would unpack the values onto the stack instead of using the registers
directly.
The CoreMark improvements improve the code generation for the hot
functions in benchmark. This release is now on par with Linaro GCC
4.4 and significantly ahead of other FSF or Linaro 4.5 based
compilers. This fixes the long-standing problems of ARMv5 being
faster than ARMv7 and 4.4 based compilers being faster than 4.5 based
ones.
Linaro GCC 4.6 is the third release in the 4.6 series. Based off the
latest GCC 4.6.0+svn173480, it adds new optimisations, vectoriser
improvements, and continues with the merge of many ARM-focused
changes.
Interesting changes include:
* Updates to 4.6.0+r173417
* Brings forward more of the performance improvements from Linaro GCC 4.5
* Adds support for swing-modulo scheduling
* Fixes precompiled header support on recent Linux kernels
* Changes the default NEON vector size to quads
* Adds auto-detection of the best vector size
* Adds vectorisation improvements due to better if-conversion
Fixes:
* LP: #714921: Uses an unreasonable amount of memory to compile QEMU on armel
* LP: #723086: Test regressions in the Fortran test suite
The source tarball is available from:
https://launchpad.net/gcc-linaro/+milestone/4.5-2011.05-0https://launchpad.net/gcc-linaro/+milestone/4.6-2011.05-0
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Bugs: https://bugs.launchpad.net/gcc-linaro/
Questions? https://ask.linaro.org/
Interested in commercial support? inquire at support(a)linaro.org
-- Michael
Hi All,
This is based upon gcc version 4.5.3 (20110221 pre-release)
Any help appreciated
This shows a bug in the Linaro gcc compiler with the Arm NEON
vset_lane intrinsic
Note in the objdump that the vmov.8 instruction that places the
value in the vector for the non-q version uses 1 where it should use
2 and 3:
18: ee410bb0 vmov.8 d17[1], r0
1c: ee420bb0 vmov.8 d18[1], r0
20: ee400b90 vmov.8 d16[0], r0
3c: ee440bb0 vmov.8 d20[1], r0
For the q version the vmov.8 instructions are correct:
40: ee420bf0 vmov.8 d18[3], r0
54: ee420bd0 vmov.8 d18[2], r0
64: ee400b90 vmov.8 d16[0], r0
70: ee420bb0 vmov.8 d18[1], r0
/* Source code */
#include <arm_neon.h>
static uint8x8_t vec[5]
static uint8x16_t qvec[5];
void set(uint8_t value)
{
vec[1] = vset_lane_u8(value, vec[0], 3);
vec[2] = vset_lane_u8(value, vec[0], 2);
vec[3] = vset_lane_u8(value, vec[0], 1);
vec[4] = vset_lane_u8(value, vec[0], 0);
qvec[1] = vsetq_lane_u8(value, qvec[0], 3);
qvec[2] = vsetq_lane_u8(value, qvec[0], 2);
qvec[3] = vsetq_lane_u8(value, qvec[0], 1);
qvec[4] = vsetq_lane_u8(value, qvec[0], 0);
}
Thx
Lee