linaro-toolchain - lists.linaro.org

Re: Extension elimination pass breaks SPU (Fw: spu gcc-4.5 linaro build failure)

by Ulrich Weigand

Ulrich Weigand/Germany/IBM wrote on 12/20/2010 06:01:21 PM:
> Mark Mitchell <mark(a)codesourcery.com> wrote:
> > On 12/20/2010 8:35 AM, Ulrich Weigand wrote:
> > > Now, I guess there's two ways forward: either the outcome of the ongoing
> > > discussions on gcc-patches is that it is in fact not a good idea to
> > > generate such sets, and the EE pass is subsequently rewritten to avoid
> > > them; or else, if those instructions are considered valid, I'll have to
> > > extend the SPU move expander to handle them. Thoughts?
> >
> > I haven't participated in the upstream discussion -- I'm way behind on
> > that list :-( :-( -- but I think such sets should be considered valid.
>
> OK, I'll have a look at fixing the SPU back-end then.

I've now fixed this problem in the back-end upstream:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01694.html

I've also created a back-port to Linaro GCC 4.5 and proposed the branch for merge; you can find the details at:
https://bugs.launchpad.net/gcc-linaro/4.5/+bug/693425

Best Regards

Ulrich Weigand

--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter | Management: Dirk Wittkopp
Registered office: Böblingen | Court of registration: Amtsgericht Stuttgart, HRB 243294


[ACTIVITY] Dec. 13 -- Dec. 19

by Zach Welch

== Last Week ==

* Continued working on ARM unwinding in libunwind. Produced a draft write-up of my progress in the event that I don't finish this work before being swapped out of Linaro.
* (Re-)submitted patches to fix ltrace test suite. Hopefully, these will be the last changes before the new release.

== This Week ==

* Continue working on libunwind.

--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743


GCC for Android

by Patrik Ryd

Hi,

We would like to build Android with the Linaro tool chain. Do any of you know what kind of work will be needed to adapt Linaro gcc to Android?

Regards,
Patrik


[ACTIVITY] Dec.13 -- Dec.19

by Chung-Lin Tang

== GCC related ==

* CS Issue #10201 / PR46883, unrecognizable insn ICE when compiling Samba. Fixed this by changing the predicates of two split patterns. Patch reviewed in CS internally and upstream, committed upstream, will backport to SG++ and Linaro soon.
* LP:641397/PR46888: bitfield insert optimization. Andrew Pinski found a testcase that escapes the CSE patch and gets handled by combine, and also found another bug with REG_EQUIV notes. Only looked at this minimally last week, will really work on this later.
* LP:687406/PR46865, -save-temps creating different code. Backported and bzr-pushed the upstream fix by Jakub Jelinek.
* PR45416, ARM code regression. Can mostly generate what I wanted by now, under ARM and x86, although the patch is still not in a submittable state.
* VFP index patch. Uncommitted GCC patch of mine from last year; added Thumb-2 bits and corrected some things in the testcase. Committed upstream.

== This week ==

* Really get January travel stuff nailed.
* Upstream patch review is probably going to start getting slow/suspended this week. Will probably do some study stuff on larger projects.
* Continue to look at GCC issues.


[ACTIVITY] Dec. 13th -- Dec. 18th

by Yao Qi

== Linaro GDB ==

* LP:615972 Different output of 'info register' w/ and w/o corefile.
  Understood the gcore implementation in GDB. Two patches are under review upstream: one is approved by Dan, and the other is still being reviewed. Evaluated the two approaches for NEON registers in corefiles. Resumed the discussion on kernel support for dumping NEON registers. Need a decision from the kernel side, but no progress on it.
* LP:685494 Revised the patch per Pedro's suggestion. Waiting for someone to approve it.
* LP:685702 Got it approved for the FSF GDB 7.2 branch. Committed to both the FSF 7.2 branch and the Linaro tree.
* LP:616003 gdb.mi/mi-var-display.exp failure.
  GDB always assumes $fp is r11, even when the code is in Thumb mode. The current GDB infrastructure can't handle mapping the same alias to two different registers. Proposed a new gdbarch hook for this upstream, in order to increase the flexibility of GDB. No reply yet.
* LP:616001 gdb.mi/mi-var-cmd.exp failure.
  Ulrich pointed out it is caused by stack randomization. Confirmed this by setting "kernel.randomize_va_space" to zero. Figured out why this case passes on x86: turning on stack randomization is more restricted on x86.
* LP:615980 Support displaced stepping on Thumb.
  Understood displaced stepping in GDB/ARM. Found a bug when GDB tries to execute an ARM instruction in the copy area, which is in Thumb mode (the copy area starts from "_start + 4", and it is compiled in Thumb mode in Ubuntu). The fix is a one-line patch, which doesn't update the status register when writing PC in displaced stepping. Wrote a test case for ARM displaced stepping. Wrote code in ARM asm directly for the first time, which is very helpful for remembering ARM asm instructions. Read the ARM ARM and decoded 16-bit Thumb instructions in GDB for displaced stepping. It doesn't work so far because the breakpoint instruction after the instructions in the copy area is still hard-coded to the ARM breakpoint insn.

== Misc ==

* Linaro GCC optimization meeting.

== This Week ==

* LP:615980 Support displaced stepping on Thumb. Send my fix and test case upstream for review. Make displaced stepping work on 16-bit instructions.
* Ping other GDB patches.

--
Yao (齐尧)


Re: Profile guided and string routines?

by Steven Bosscher

Mark Mitchell wrote:
> > If Profile Guiding could spot that a particular callsite to say strlen()
> > was often associated with strings
> > of at least 'n' characters we could call a different implementation.
>
> I don't believe this is possible with current profile-guided optimization,
> but certainly it could be done.

It looks to me like a case of value profiling; see tree-profile.c for the various "stringops" optimizations. Unless I misunderstand David's idea here or am missing something else, it seems that this kind of optimization should fit in the existing infrastructure without too much effort.

Ciao!
Steven


Profile guided and string routines?

by David Gilbert

Does anyone have any experience of what can be profiled in the profile guided optimisations?

One of the problems with some of the string routines is that you can write pretty neat fast routines that work well for long strings - but most of the calls actually pass short strings, and the overhead of the fast routine means that for most cases you are slower than you would have been with a simple routine.

If Profile Guiding could spot that a particular callsite to say strlen() was often associated with strings of at least 'n' characters we could call a different implementation.

Dave
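
The idea in this message can be sketched as a per-callsite histogram of string lengths followed by a dispatch decision. This is only an illustration in Python, not how GCC's profile-guided optimisation is implemented; the class `CallsiteLengthProfile`, the function `choose_strlen`, the routine names `strlen_long` and `strlen_simple`, and the default values of `n` and `threshold` are all hypothetical.

```python
from collections import Counter


class CallsiteLengthProfile:
    """Hypothetical histogram of string lengths seen at one callsite."""

    def __init__(self):
        self.lengths = Counter()

    def record(self, s):
        # Called once per profiled strlen() invocation at this callsite.
        self.lengths[len(s)] += 1

    def fraction_at_least(self, n):
        # Fraction of profiled calls whose argument had length >= n.
        total = sum(self.lengths.values())
        if total == 0:
            return 0.0
        long_calls = sum(count for length, count in self.lengths.items()
                         if length >= n)
        return long_calls / total


def choose_strlen(profile, n=32, threshold=0.8):
    """Pick an implementation name for this callsite (names are made up).

    The long-string routine is chosen only when most profiled calls
    passed strings of at least n characters.
    """
    if profile.fraction_at_least(n) >= threshold:
        return "strlen_long"
    return "strlen_simple"
```

A callsite that mostly sees long strings would then be routed to the long-string routine, and any callsite with no profile data falls back to the simple one, which matches the observation that short strings are the common case.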


[ACTIVITY] 13th -18th December 2010

by Andrew Stubbs

* Linaro GCC

lp:686381: C++ link failure on ARM. Reproduced the bug and posted my findings to the bug report - user error.

Changed the way the Linaro GCC version numbers are handled. Hopefully the new system should be less distasteful to Matthias. Updated the GCC release procedure document to match.

Organised and chaired a meeting to discuss GCC optimization opportunities for ARM. It was well attended, and I think we had some useful discussion. Spent quite some time preparing beforehand, and writing it up afterwards. Next step is to come up with some actual plans to implement something. I imagine we can discuss this at the sprint in Dallas next month. See https://wiki.linaro.org/AndrewStubbs/Sandbox/GCCoptimizations

* Upstream GCC

My upstream patch to fix ARM smlabb has been approved and committed to GCC 4.6 (mainline). Only another three patches need approval now!

Continued testing upstream GCC 4.6 with both cross and native builds. It appears to be in a buildable state now, with no extra patches required. I've updated the Linaro GCC 4.6 branch with the buildable state.

* Other

Updated my ESTA, and added my security details to the airline bookings.

------

Future availability

20th Dec .. 3rd Jan - Vacation/Holiday
4th Jan .. 8th Jan - Business as usual
9th Jan .. 14th Jan - Linaro Sprint, Dallas
15th Jan .. 21st Jan - CodeSourcery/Mentor Annual Meeting, Scottsdale
24th Jan onwards - normal service restored!


[ACTIVITY] 2010-12-17

by David Gilbert

Got SPEC2006 building on Silverbell (VExpress) and Canis1 (Orion). There are still some issues: the builds are still going (6 hours so far on a 1GHz A9 for a build and 'test' case), and the Silverbell one has hit an ICE on one of the tests that looks like 635409, and also looks like it needs some help getting Perl to work. The build on Canis has only just started, but hasn't got Fortran installed.

(The SPEC2006 tools build also failed in the Perl testsuite on sprintf.t and sprintf2.t, which seem to test integer overflow cases in sprintf % fields.)

Added a few of the kernel string/memory routines and bionic routines into my string/memory graphs and also ran the tests on the Orion board (similar to other A9 performance - no surprise).

Wrote up a draft of an email to libffi-dev describing the varargs state; as I was doing it, I realised that one of the ways didn't quite work and was more messy.

Using rdepends to find all packages using ffi; need to figure out if any actually care about varargs.

Dave


[ACTIVITY] Dec 13 - Dec 17

by Ulrich Weigand

== GCC ==

* Completed first successful bootstrap and regression test run of GCC mainline on my IGEPv2 board.
* Worked on implementing fix for #617384 (.debug_line is wrong with -fpic)
* Worked on backporting fix for #662324 (Pointer type information lost in 4.5 debuginfo)
* Analyzed root cause of PR target/46883 (GCC ICE with error: unrecognizable insn)

Best Regards

Ulrich Weigand


No meetings over the Christmas break

by Michael Hope

Hi there.

I've cancelled the weekly and standup calls for the next two weeks. The next scheduled call is the standup call on Wednesday the 5th of January. Please attend if you can as it's our last one before the sprint.

See you then!

-- Michael


Topics for the sprint

by Michael Hope

Hi there.

The sprint is just around the corner and it's a good time to think about how we can make best use of the week. I've put some topics up at:
https://wiki.linaro.org/Events/2011-01-LinaroSprint/ToolChainWG

Please feel free to add to it. Have a think about anything that's easier to do while everyone is in the same room - things like discussions, kicking off some work, a bit of pair programming on a problem, or anything that overlaps with another group or Ubuntu.

-- Michael


[ACTIVITY] report week 50

by Peter Maydell

RAG:
Red:
Amber:
Green:

Milestones:
                             | Planned    | Estimate   | Actual     |
finish virtio-system         | 2010-08-27 | postponed  |            |
get valgrind into linaro PPA | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches   | 2010-10-01 | 2010-10-22 | 2010-10-18 |

Progress:

* merge-correctness-fixes:
** Submitted patchset upstream to fix NaN propagation to follow ARM ARM rules rather than x87 semantics:
   http://patchwork.ozlabs.org/patch/75742/
   http://patchwork.ozlabs.org/patch/75743/
* maintain-beagle-models:
** Finished implementation of the OMAP NAND prefetch/postwrite engine including its DMA support. Patches submitted to the qemu-maemo upstream tree and merged by Juha:
   http://meego.gitorious.org/qemu-maemo/qemu/merge_requests/1
** Fixed the (cosmetic) bug https://bugs.launchpad.net/qemu-maemo/+bug/622408 where we were complaining about "Unknown CMD52" when Linux probed for the presence of SDIO cards. Fix merged into qemu-maemo:
   http://meego.gitorious.org/qemu-maemo/qemu/merge_requests/2
* qemu-continuous-integration:
** Discussion with Loic about setting up jobs on his Hudson instance for testing qemu against snapshots/hwpacks.
* packageselection-arm-n-more-stable-vm-solution-for-arm
** Discussion about Ubuntu moving to using a qemu-maemo based qemu for ARM purposes. The Ubuntu blueprint is
   https://blueprints.launchpad.net/ubuntu/+spec/packageselection-arm-n-more-s…
   We need to come to agreement about what parts of this are going to be done by variously Linaro toolchain, Linaro foundations and Ubuntu.
** I'm going through doing another rebase-and-package of the Linaro qemu, and finishing off writing up the notes on the process:
   https://wiki.linaro.org/WorkingGroups/ToolChain/QemuReleaseProcess

Meetings: toolchain, pdsw-tools, pdsw-tools xmas lunch :-)

Issues:

* a number of qemu patches in progress are logjammed behind the outstanding git pull request
* the dbgsym debug packages for linaro kernels seem to have vanished:
  https://bugs.launchpad.net/linaro-images/+bug/691192

Absences: (complete to end of 2010)
Fri 17 Dec - Tue 4 Jan inclusive.
2011: Dallas Linaro sprint 9-15 Jan. Holiday 22 Apr - 2 May.


[ACTIVITY] weekly status

by Ken Werner

Hi,

* I've spent some time testing the patches that allow the GCC trunk to bootstrap again on ARM and posted the results to gcc-testresults
* finally tested and posted the patch that optimizes the __sync_* builtins (#681138) on gcc-patches
* investigated the state of the crash utility on ARM (or rather its prerequisites like kexec)
  https://wiki.linaro.org/KenWerner/Sandbox/crash-utility
* I'm on holiday now :)

Regards
Ken


Re: RFC: -mimplicit-it and GCC upstream

by Dave Martin

Hi,

On Wed, Dec 15, 2010 at 1:44 AM, Michael Hope <michael.hope(a)linaro.org> wrote:
> On Wed, Dec 15, 2010 at 1:05 PM, Steve Langasek
> <steve.langasek(a)linaro.org> wrote:
>> Hi Michael,
>>
>> On Wed, Dec 15, 2010 at 09:29:38AM +1300, Michael Hope wrote:
>>> Hi Steve. I'd like to hand the rest of this over to you if that's OK.
>>
>> Yep, we can take it from here. To be clear, is this an additional change
>> above and beyond what Matthias reports is currently in Ubuntu gcc
>> (http://lists.linaro.org/pipermail/linaro-toolchain/2010-November/000441.html),
>> and if so, in what version of Linaro GCC is it going to become effective?
>> Do we have documentation of what the relevant failure modes caused by this
>> change *look* like, so that we can at least be triaging them appropriately
>> until there's some documentation on how to fix the resulting bugs?
>
> There will be many failures in many packages. The problem is when you
> use conditional suffixes on instructions: previously the compiler
> would insert an implicit instruction before that; now we have to be
> explicit.
>
> The failures are easy to diagnose and fix. The build will fail with a
> message from the assembler along the lines of 'xxx instruction outside
> an IT block'. The fix is to find the inline assembly code, insert the
> appropriate IT instruction, and re-build. The assembler will validate
> the IT instruction against the following conditional instructions so
> the change is quite safe.

Did someone manage to find out which versions of binutils can silently accept the IT instructions when assembling for ARM? This affects what advice we should give on how to avoid breaking upstream with our additions.

The safest approach is #ifdefs, but it will be better for maintenance if we can avoid this, since it will render the code very messy.

Cheers
---Dave


Blueprint changes

by Michael Hope

Hi there.

Some of the tr-* blueprints had work items in them and this was interfering with the tools that the PM guys use. I've created new engineering blueprints, pulled the work items across into them, and added the new engineering blueprint as a dependency of the old TR.

Sorry for the blueprint spam. In most cases the new blueprint has the same name and subject as the TR one, such as the TR:
https://blueprints.launchpad.net/linaro/+spec/tr-toolchain-4.5-in-distros

which is backed by the engineering blueprint:
https://blueprints.launchpad.net/gcc-linaro/+spec/4.5-in-distros

-- Michael


RFC: -mimplicit-it and GCC upstream

by Dave Martin

Hi Richard,

Recapping on this earlier conversation:

http://lists.linaro.org/pipermail/linaro-toolchain/2010-July/000030.html
http://lists.linaro.org/pipermail/linaro-toolchain/2010-July/000035.html

Is it worth another attempt to make a case to upstream for supporting passing -mimplicit-it=thumb by default to gas?

According to my understanding of this issue, my argument would go as follows:

* gcc currently estimates the size of asm blocks, rather than determining the size accurately.
* gcc cannot guarantee the right answer for asm block size when asm blocks contain directives etc.; however, use of directives in asm blocks is widespread.
* gcc cannot guarantee the right answer for asm block size in Thumb-2. gcc conservatively overestimates the size by assuming that each statement of the asm block expands to 4 bytes.
* All of Ubuntu lucid and maverick has been built with -mimplicit-it=thumb passed by default, with no known build or runtime failures arising from this (so size issues aside, we have confidence that the resulting code generation is sound).
* -mimplicit-it=thumb -mthumb makes the asm block size estimation unsafe: the asm block can exceed the estimated size even in the absence of directives, which may lead to fixup range errors during assembly.
* Following the principles already established for Thumb-2 in general, the estimation can be made safe (or, as safe as the established Thumb-2 behaviour) by raising the assumed maximum statement expansion size for asm blocks to 6 bytes, since -mimplicit-it will add at most a single (16-bit) IT instruction to each statement.
* The vast majority of all asm blocks are small (< 20 instructions, say), so the overall overestimate in sizes will generally be modest for any given compilation unit.
* -mimplicit-it is already _required_ by the Linux kernel and possibly other projects.

...so...

* With -mimplicit-it=thumb and a 6-byte asm block statement expansion size estimate, we have toolchain behaviour which is as reliable, and as correct, as it is in upstream at present.
* Layout of data in the compiler output will be more optimal in some cases, and less optimal in other cases, compared with the current Thumb-2 behaviour, due to differing asm block size estimates. The exact behaviour will depend on the distribution of conditional instructions within asm blocks.
* Taken over a whole compilation unit, the total code size overestimate (and therefore the impact on object layout) will normally be modest, due to the small typical size of asm blocks.
* Behaviour for -marm will not be impacted at all.

If gcc currently estimated asm block code size accurately, then I could understand upstream's objection; but as it stands it seems to me we wouldn't be making anything worse in practice with the proposed change; and there is no compatibility impact (other than positive impact).

Of course, I may have some wrong assumptions here, or there may be some background I'm not aware of...

Comments?

Cheers
---Dave
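
The size argument in this message comes down to simple arithmetic, which can be sketched as follows. This is an illustration of the reasoning only, not GCC's actual estimation code: the function names are hypothetical, and the statement counting (split on newlines and `;`) is a simplifying assumption. The 4-byte and 2-byte figures are the ones quoted in the message.

```python
# Per-statement upper bound assumed for Thumb-2, as described in the message.
THUMB2_MAX_STATEMENT_BYTES = 4
# One 16-bit IT instruction that -mimplicit-it may add in front of a statement.
IT_INSTRUCTION_BYTES = 2


def count_statements(asm_text):
    """Count asm statements (simplified: split on newlines and ';')."""
    pieces = asm_text.replace(";", "\n").split("\n")
    return sum(1 for piece in pieces if piece.strip())


def estimate_asm_block_bytes(asm_text, implicit_it=False):
    """Conservative size estimate for an asm block, in bytes.

    With implicit_it=True every statement is assumed to carry an extra
    IT instruction, giving the 6-byte bound proposed in the message.
    """
    per_statement = THUMB2_MAX_STATEMENT_BYTES
    if implicit_it:
        per_statement += IT_INSTRUCTION_BYTES
    return count_statements(asm_text) * per_statement


block = "cmp r0, #0\naddne r0, r0, #1; moveq r0, #2"
current = estimate_asm_block_bytes(block)                     # 3 * 4 bytes
proposed = estimate_asm_block_bytes(block, implicit_it=True)  # 3 * 6 bytes
```

Under these assumptions the proposed estimate is always 1.5 times the current one, so for a block of fewer than 20 statements the extra overestimate stays below 40 bytes.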


[ACTIVITY] December 6th-12th

by Julian Brown

== Linaro GCC ==

* Finish testing for big-endian/quad-word patch on mainline, and send upstream. Not yet reviewed by an ARM maintainer, but Joseph suggested tweaking DejaGnu's target-supports to better reflect the new capabilities of the vectorizer in big-endian mode. I've not looked into that yet.
* Started looking at improving element/structure load/store intrinsics. Made it so that the structs used for loads/stores are created in the backend so that the types can be used directly by the builtins, but discovered that the front-end/middle-end would not play along with that plan as they are. Thought about ways to fix that.
* Some time spent on other CodeSourcery stuff.


Linaro GCC 4.4 and 4.5 2010-12 released

by Michael Hope

The Linaro Toolchain Working Group is pleased to announce the release of both Linaro GCC 4.4 and Linaro GCC 4.5.

Linaro GCC 4.5 is the fifth release in the 4.5 series. Based off the latest GCC 4.5.1+svn167157, it includes many ARM-focused performance improvements and bug fixes.

Linaro GCC 4.4 is the fifth release in the 4.4 series. Based off the latest GCC 4.4.5, it is a maintenance release that fixes one problem found through use.

Interesting changes include:

* A new performance focused extension elimination pass
* Speed and size improvements when loading constants
* Performance improvements on compound conditionals
* A range of correctness improvements

The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.5-2010.12-0
and
https://launchpad.net/gcc-linaro/+milestone/4.4-2010.12-0

Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro

No changes have been committed to Linaro GDB 7.2 this month.

-- Michael


[ACTIVITY] 6th - 10th December

by Andrew Stubbs

* Linaro GCC 4.4/4.5

Merged the latest CS patches and Linaro merge requests into Linaro GCC (4.4 and 4.5). Ran regression tests. Yao's patch failed so I backed it out, and made the release tarballs. Uploaded the releases to Michael Hope for release.

lp:686381: luatex fails to build with gcc-4.5. Fired off a test build to reproduce the problem. Will come back to this next week.

* GCC 4.6/4.7

Posted my various queued patches up to gcc-patches(a)gcc.gnu.org for review.

Looked at the state of the GCC 4.6 upstream build. There are currently two problems:
1. libquadmath must be disabled in a cross-build for the bootstrap phases.
2. libstdc++ doesn't build. There is a patch for it on the mail list, but it's not applied yet.

Once gcc 4.6 builds cleanly, I shall update the Launchpad 4.6 branch, and declare that the baseline for our development. We'll then have somewhere to commit and track patches awaiting GCC stage 1 development.

* Other

Caught up with email following my holiday.

Yet again, my IGEPv2 board suffered a corrupt file system. I've now upgraded the kernel and configured it to use an NFS root. The board is now somewhat less mobile, but should work more reliably.

Continued organizing a brain-storming session for GCC optimization improvements.

Organised flights and hotel for both the Linaro Sprint and CodeSourcery annual meeting in January.


[ACTIVITY] Dec. 06 -- Dec. 12

by Zach Welch

== Last Week ==

* Spent time tracking down a couple of regressions that appeared in the new ltrace release-candidate tree. Submitted a bunch of patches to fix the issues that were discovered during that process; most have been applied.
* Finished writing fairly generic code for handling ARM-specific unwind tables, from lookup through decoding and dispatch. It uses a few definitions specific to libunwind, but those probably could be eliminated with more work.

== This Week ==

* Integrate new ARM-specific bits into libunwind framework.
* Rewrite part of a portability patch for ltrace and hope that those changes reflect the very last effort that will be required for that particular task.

--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743


Re: Neon registers in core files

by Dave Martin

On Fri, Dec 3, 2010 at 9:06 AM, Yao Qi <yao.qi(a)linaro.org> wrote:
> Hi, Kernel WG,
> Can recent kernel handle NEON registers in corefiles?
>
> Seems we've had plan for this in "Ensure full NEON debug support" in
> https://wiki.linaro.org/WorkingGroups/KernelConsolidation/Specs/BSPInvestig…
> Any progress on this piece of work? We want to handle NEON registers in
> corefiles from GDB, which required kernel dump them in corefile first.

Hmmm, actually that bullet may have ended up in the wrong place ... since it's not a BSP-specific feature.

Anyway, looking at the kernel code, it looks like the VFP/NEON state is not dumped into the core file. If it makes you feel better, the state of the obsolete FPE extension registers is dumped, if used :/

My guess is that it shouldn't be hard to dump the VFP/NEON state, but GDB and the kernel need to agree on the format. Rather than trying to hack the existing register dump format in a compatible way, I suggest it's simplest if the kernel creates an extra section in the dump containing something like:

    .long format_version  /* reserved for future expansion - must be 0 */
    .long FPSID
    .long FPSCR
    .long MVFR0  /* or 0 if not present in the hardware */
    .long MVFR1  /* or 0 if not present in the hardware */
    .long d0
    .long d1
    /* ... d2-d14 ... */
    .long d15

If 32 D-registers in the hardware [
    .long d16
    .long d17
    /* ... d18-d30 ... */
    .long d31
]

I believe we don't need any extra flags to indicate whether the MVFRx fields are valid, since 0 in these registers indicates the VFPv2/legacy behaviour anyway. Note that some VFPv2 implementations (such as ARM1176) do provide these registers, and where the hardware has them, the kernel can fill them in when doing the coredump.

We _should_ be prepared to ignore these fields (or interpret them differently) if a vendor-specific VFP subarchitecture is specified (by (FPSID & 0x4000) == 0x4000).

The number of D-registers can be deduced from the FPSID and MVFRx registers, so we don't need to record it explicitly. When MVFRx are not present, there are 16 D-registers. When MVFRx are present, and (MVFR0 & 0xF) >= 2, there are 32 (or more) D-registers.

This is just a sketch -- the ARM ARM is the authoritative reference on the meanings of these bitfields.

Any views on this?

Cheers
---Dave
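
The decoding rules proposed in this message can be sketched as below. This is a hedged illustration, not kernel or GDB code: the function names are hypothetical, the header is assumed to be five little-endian 32-bit words in the order listed in the message, and the `0x4000` mask is copied from the message as written (the ARM ARM remains the authority on the FPSID bitfields).

```python
import struct

# Mask taken verbatim from the message; verify against the ARM ARM.
FPSID_VENDOR_SUBARCH_MASK = 0x4000


def parse_vfp_header(section):
    """Unpack the five leading words of the proposed dump section.

    Assumes little-endian 32-bit fields in the order given in the
    message: format_version, FPSID, FPSCR, MVFR0, MVFR1.
    """
    version, fpsid, fpscr, mvfr0, mvfr1 = struct.unpack_from("<5I", section, 0)
    return {"format_version": version, "fpsid": fpsid, "fpscr": fpscr,
            "mvfr0": mvfr0, "mvfr1": mvfr1}


def is_vendor_subarchitecture(fpsid):
    """True when the message's test for a vendor-specific subarchitecture holds."""
    return (fpsid & FPSID_VENDOR_SUBARCH_MASK) == FPSID_VENDOR_SUBARCH_MASK


def d_register_count(mvfr0):
    """Number of D-registers implied by MVFR0 under the message's rule.

    MVFR0 is written as 0 when the hardware lacks it, which falls
    through to the 16-register legacy case.
    """
    if (mvfr0 & 0xF) >= 2:
        return 32
    return 16
```

A reader of the dump would first check `format_version` and `is_vendor_subarchitecture` before trusting the MVFRx fields, then use `d_register_count` to decide whether the optional d16-d31 block follows.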


[ACTIVITY] Dec 6 - Dec 10

by Ulrich Weigand

== GCC ==

* Tracked down root cause of GCC mainline bootstrap failure on ARM (PR 46040 - "__DTOR_LIST__ undeclared").

== Miscellaneous ==

* Set up IGEP v2 board (w/ local disk, network, ...) as native GCC / GDB build environment.
* Gave talk on Linaro at the "2010 Linux Community Event" at Siemens Munich (w/ Arnd Bergmann).

Best Regards

Ulrich Weigand


Invitation: TWG GCC Optimization @ Wed 2010-12-15 9am - 10am (linaro-toolchain@lists.linaro.org)

by Andrew Stubbs

You have been invited to the following event.

Title: TWG GCC Optimization

Discuss ideas for improving GCC optimization for ARM. Open to anybody who wants to contribute.

https://wiki.linaro.org/AndrewStubbs/Sandbox/GCCoptimizations

International: +44 1452 567 588
UK: 0844 493 3801
Brazil: 08008912092
China: 108007121533
India: 0008001006354
Taiwan: 00801126472
United States: 18666161738

Conference code 2634417169#

When: Wed 2010-12-15 9am – 10am London
Where: Conference call code 263 441 7169
Calendar: linaro-toolchain(a)lists.linaro.org
Who:
* andrew.stubbs(a)linaro.org - creator
* Ken Werner - optional
* stevenb.gcc(a)gmail.com - optional
* Michael Hope - optional
* David Gilbert - optional
* Peter Maydell - optional
* ulrich.weigand(a)de.ibm.com - optional
* Yao Qi - optional
* Julian Brown - optional
* paul(a)codesourcery.com - optional
* Ira Rosen - optional
* richard.earnshaw(a)arm.com - optional
* Chung-Lin Tang - optional
* mark(a)codesourcery.com - optional
* linaro-toolchain(a)lists.linaro.org - optional
* Richard Sandiford - optional
* Marcin Juszkiewicz - optional
* zwelch(a)codesourcery.com - optional

Your attendance is optional.

Event details: https://www.google.com/calendar/event?action=VIEW&eid=aXJlc2Z2bWZmZzQ5cGQ4b…


[ACTIVITY] Dec.6th -- Dec.12th

by Chung-Lin Tang

== GCC related ==

* PR44557, Thumb-1 ICE: found two small needed corrections in the ARM backend to fix this. Sent patch upstream.
* LP:641397/PR46888: bitfield insert optimization. Posted patch along with Andrew Stubbs' CSE patch upstream. The two-patch situation seemed to stir some discussion :) It seems both patches were deemed okay, though it would be better if there was some testcase that passed through unhandled by Andrew's CSE patch, but was processed by my combine fix later in the pass pipeline. Both patches queued at the GCC bugzilla page, to be handled in the next stage1.
* LP:687406/PR46865, -save-temps creating different code. Analyzed the problem, though slow to send a fix upstream. Had some discussion on the list about what the fix should look like.
* PR46667, section type conflicts. Tested and sent mail to gcc-patches. Jan Hubicka picked it up and pinged for an approval again. Hope this gets resolved soon.
* PR45416, ARM code regression. ARM considerations are mostly okay; the main issue remaining is how to solve the x86 regression. The current expand code does a full DImode shift just to obtain a single bit, which might be a point of improvement to solve this.
* LP:685534, ftbfs with gcc-linaro 4.5 on amd64. Found to be another erroneous inline asm case. Fixed and updated on LP.

== This week ==

* More upstream and Linaro GCC issues.
* Start dealing with January travel.
* Think more about larger (Linaro) GCC optimization projects.


[ACTIVITY] Dec 6th -- Dec 12th

by Yao Qi

== Linaro GDB ==

* LP:616000 Handle -fstack-protector prologue code.
  Understood how frames affect expression validation in GDB. Improved i386 prologue parsing to handle the 'and/add' sequence. Revised i386 prologue parsing for stack protector. The patch is not submitted yet since I still lack i386 prologue knowledge and am not very confident in that patch. Understood the prologue-value used in ARM prologue parsing and the relationship of symbol and frame in GDB. Made ARM prologue parsing understand stack protector code by identifying the code sequence. Improved the patch by supporting ARM mode and ARMv5T, in order to get this patch accepted upstream.
* LP:615972 gdb.base/gcore.exp failure.
  The failure is about "corefile restored general registers", which is not related to NEON register support in corefiles. It is caused by inconsistent register types and names between tdesc and arm. The cause of this failure is found, but the upstream reviewer doesn't agree with one of my fixes. As he suggested, arm-core.xml is modified to add "type=XXX", but I get some errors when regenerating arm-with-iwmmxt.c. Filed GDB PR 12308.

== GCC ==

* Register rename improvements (LP:633243). Finally got both the middle-end part and the ARM part approved. Committed to upstream mainline. Some benchmarks in EEMBC show 0.1%~0.2% code size reduction.

== This Week ==

* Ping my GDB patches.
* Fix GDB PR 12308, which blocks my fix for LP:615972.
* Backport my approved patches to Linaro GDB if any. Fix other GDB bugs.
* Pass one GCC patch in my queue to review, if I have extra time.

== Vacation ==

Taking vacation on Dec 30th and 31st. Travelling to ChengDu, and back on 3rd Jan (it is a public holiday in China). Back to work on 4th Jan.

--
Yao (齐尧)

14 years, 6 months


[ACTIVITY] Weekly status

by Richard Sandiford

== This week ==
* Away Monday and Tuesday.
* Very little the rest of the week due to other IBM commitments. I've just finished the main part of that work, so all being well, it should only need a bit of nannying next week. Most of the week should be Linaro.
* Started trying to reproduce #641126, but realised that I'd need to set myself up for general Ubuntu cross package building first. Started to look at what's involved.

== Next week ==
* Get stuck into #641126.
* More STT_GNU_IFUNC and vectors.

Richard

14 years, 6 months


[ACTIVITY] report week 49

by Peter Maydell

RAG:
Red:
Amber:
Green:

Milestones:
                             | Planned    | Estimate   | Actual     |
finish virtio-system         | 2010-08-27 | postponed  |            |
get valgrind into linaro PPA | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches   | 2010-10-01 | 2010-10-22 | 2010-10-18 |

Progress:
* merge-correctness-fixes:
** I have sent out an updated ARM fixes pull request, all of whose components have been Reviewed-by: Nathan Froyd; I expect this to be merged shortly.
** vqshl(reg) patch posted to list; I have an update which also addresses vqshl{,u}(imm) which I'll send out as v2 once the first part has been reviewed.
** reviewed and retransmitted Wolfgang's semihosting commandline patches since he is having trouble sending unmangled mail to the list
** went through the monster qemu-maemo commit "Lots of ARM TCG changes" http://meego.gitorious.org/qemu-maemo/qemu/commit/3f17d4e1cb identifying what fixes it includes
** started looking at a VRSQRTS patch. This uncovered a number of qemu issues: NaN propagation is wrong, flush-to-zero handling is only flushing output denormals, not input denormals, and we don't handle the Neon "standard FPSCR value" but always use the real FPSCR. I have some preliminary patches for at least some of this, but since they affect a number of the same bits of code that are touched by existing not-yet-committed patches I'm waiting for those to be committed first.
* verify-emulation:
** wrote a README for risu and made it public at http://git.linaro.org/gitweb?p=people/pmaydell/risu.git;a=summary
* maintain-beagle-models:
** the ubuntu maverick netbook image doesn't boot on qemu because it uses the OMAP NAND prefetch/DMA, which isn't modelled https://bugs.launchpad.net/qemu-maemo/+bug/645311 I've started on this and am perhaps halfway through (basic prefetch code implemented, but DMA and debugging still to go)
* other:
** took part in an OBS mini-sprint where we were walked through how the OBS buildsystem works and can be used to do test rebuilds of Meego with new versions of the toolchain.

Meetings: toolchain, pdsw-tools

Plans:
* finish omap NAND prefetch engine work
* make sure ARM changes get committed to qemu...

Absences: (complete to end of 2010)
Fri 17 Dec - Tue 4 Jan inclusive.
2011: Dallas Linaro sprint 9-15 Jan. Holiday 22 Apr - 2 May.
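
The flush-to-zero issue listed under merge-correctness-fixes in this report can be illustrated with a small C model. This is only a sketch, not qemu or softfloat code: the function names are made up for illustration, and returning a zero of the same sign as the denormal is a simplifying assumption rather than a statement about what the hardware does.

```c
#include <math.h>

/* Replace a denormal (subnormal) value with zero.
   Assumption of this sketch: the zero keeps the sign of the input. */
static float flush_denormal(float x)
{
    if (fpclassify(x) == FP_SUBNORMAL)
        return (x < 0.0f) ? -0.0f : 0.0f;
    return x;
}

/* The behaviour described as the bug: only the result is flushed. */
float mul_flush_output_only(float a, float b)
{
    return flush_denormal(a * b);
}

/* Flushing the inputs as well, before the operation is performed. */
float mul_flush_inputs_and_output(float a, float b)
{
    return flush_denormal(flush_denormal(a) * flush_denormal(b));
}
```

With a denormal input such as 2^-127 multiplied by 4, the output-only variant returns the normal value 2^-125, while the variant that also flushes its inputs returns zero, so the two models are distinguishable from userspace.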

14 years, 6 months


[ACTIVITY] weekly status

by Ken Werner

Hi,

* created custom kernel deb packages from the linaro-linux tree in order to test the various ftrace tracers and profilers available on ARM
* results at: https://wiki.linaro.org/KenWerner/Sandbox/ftrace
* started to look into crash (kexec, kdump) but wasn't able to generate a kernel dump yet

Regards
Ken

14 years, 6 months


Performance Test Results using gcc-linaro-4.5-2010.11-1

by Prashanth S

Dear All

Our team at Samsung collected some performance metrics for the following 3 GCC cross compilers:
1. Gentoo Compiler (part of the Chrome OS build environment)
2. GCC 4.4.1 (CodeSourcery)
3. Linaro (gcc-linaro-4.5-2010.11-1)

To build the Linaro toolchain we used Michael Hope's script, modified only in this line:
"GCCFLAGS = --with-mode=thumb --with-arch=armv7-a --with-float=softfp --with-fpu=neon --with-fpu=vfpv3-d16"

a. Using the above three toolchains we compiled the Chrome OS kernel and ran the CoreMark performance test (with the same optimisation flags, mentioned in the attachment).
b. The test environment is the same for all three.

My questions:
1. Are there any build options that I am missing while building the cross compiler?
2. Otherwise, is this performance degradation a known issue, and is the toolchain group working on it? (If so, whom should I contact?)

Any pointers from you would be of great help to me. If you need any further details, do ping me.

Regards
Prashanth S

14 years, 6 months


GCC Optimization Brain Storming Session

by Andrew Stubbs

Hi All,

As we discussed on Monday, I think it might be helpful to get a number of knowledgeable people together on a call to discuss GCC optimization opportunities.

So, I'd like to get some idea of who would like to attend, and we'll try to find a slot we can all make. I'm on vacation next week, so I expect it'll be in two or three weeks' time.

Before we get there, I'd like to have a list of ideas to discuss. Partly so that we don't forget anything, and partly so that people can have a think about them before the day.

I'm really looking for bigger picture stuff, rather than individual poor code generation bugs.

So here's a few to kick off:

* Costs tuning.
  - GCC 4.6 has a new costs model, but are we making full use of it?
  - What about optimizing for size?
  - Do the optimizers take any notice? [1]

* Instruction set coverage.
  - Are there any ARM/Thumb2 instructions that we are not taking advantage of? [2]
  - Do we test that we use the instructions we do have? [3]

* Constant pools
  - it might be a very handy space optimization to have small functions share one constant pool, but the way the passes work one function at a time makes this hard. (LP:625233)

* NEON
  - There's already a lot of work going on here, and I don't want it to hog all our time, but it might be worth touching on.

What else? I'm not the most experienced person with GCC internals, and I'm relatively new to the ARM specific parts of those, so somebody else must be able to come up with something far more exciting!

So, please, get brain-storming!

Andrew

[1] We discovered recently that combine is happy to take two insns and combine them into a pattern that matches a splitter that then explodes into three insns (partly due to being no longer able to generate pseudo-registers).

[2] For example, I just wrote a patch to add addw and subw support (not yet submitted).

[3] LP:643479 is an example of a case where we don't.

14 years, 6 months


[ACTIVITY] 2010-12-09

by David Gilbert

Mostly more work on libffi; swapping some ideas back and forth with Marcus Shawcroft, and it looks like we have a good way forward.

Got an armhf chroot going, libffi built. Got a testcase failing as expected. Trying to look at other processors' ABIs to understand why varargs works for anyone else.

Cut through one layer of red tape; can now do the next level of comparison in the string routine work.

Started looking at SPEC; hit problems with network stability on VExpress (turns out to be bug 673820)

long long weekend; short weeks=2; Back in on Tuesday.

Dave

14 years, 6 months


Silverbell

by David Gilbert

Hi, Those of you who use silverbell may be glad to know it's back up. Be a little careful: if you shovel large amounts of stuff over its network, the network tends to disappear. (Not sure if this is hardware or driver) Dave

14 years, 6 months


Hard float chroot

by David Gilbert

Hi,
  As mentioned on the standup, I just got an armhf chroot going, thanks to markos for pointing me at using multistrap.

I put the following in a armhfmultistrap.conf and did

  multistrap -f armhfmultistrap.conf

Once that's done, chroot in and then do

  dpkg --configure -a

it's pretty sparse in there, but it's enough to get going.

Dave

==============================================
[General]
arch=armhf
directory=/discs/more/armhf
cleanup=true
noauth=true
unpack=true
explicitsuite=false
aptsources=unstable unreleased
bootstrap=unstable unreleased

[unstable]
packages=
source=http://ftp.de.debian.org/debian-ports/
keyring=debian-archive-keyring
suite=unstable
omitdebsrc=true

[unreleased]
packages=
source=http://ftp.de.debian.org/debian-ports/
keyring=debian-archive-keyring
suite=unreleased
omitdebsrc=true

14 years, 6 months


risu instruction set test harness now publicly available

by Peter Maydell

Hi. As part of my work on qemu I've written a simplistic random instruction sequence generator and test harness. To quote the README: risu is a tool intended to assist in testing the implementation of models of the ARM architecture such as qemu and valgrind. In particular it restricts itself to considering the parts of the architecture visible from Linux userspace, so it can be used to test programs which only implement userspace, like valgrind and qemu's linux-user mode. I don't particularly expect this tool to be of much general interest outside people developing either qemu or valgrind or similar models, but I have in any case made it publicly available now: http://git.linaro.org/gitweb?p=people/pmaydell/risu.git;a=tree -- PMM

14 years, 6 months


RFC: Dynamic hwcaps

by Dave Martin

Hi all,

I'd be interested in people's views on the following idea-- feel free to ignore if it doesn't interest you.

For power-management purposes, it's useful to be able to turn off functional blocks on the SoC.

For on-SoC peripherals, this can be managed through the driver framework in the kernel, but for functional blocks of the CPU itself which are used by instruction set extensions, such as NEON or other media accelerators, it would be interesting if processes could adapt to these units appearing and disappearing at runtime. This would mean that user processes would need to select dynamically between different implementations of accelerated functionality at runtime.

This allows for more active power management of such functional blocks: if the CPU is not fully loaded, you can turn them off -- the kernel can spot when there is significant idle time and do this. If the CPU becomes fully loaded, applications which have soft-realtime constraints can notice this and switch to their accelerated code (which will cause the kernel to switch the functional unit(s) on). Or, the kernel can react to increasing CPU load by speculatively turning the unit on instead. This is analogous to the behaviour of other power governors in the system. Non-aware applications will still work seamlessly -- these may simply run accelerated code if the hardware supports it, causing the kernel to turn the affected functional block(s) on.

In order for this to work, some dynamic status information would need to be visible to each user process, and polled each time a function with a dynamically switchable choice of implementations gets called. You probably don't need to worry about race conditions either-- if the process accidentally tries to use a turned-off feature, you will take a fault which gives the kernel the chance to turn the feature back on. Generally, this should be a rare occurrence.

The dynamic feature status information should ideally be per-CPU global, though we could have a separate copy per thread, at the cost of more memory. It can't be system-global, since different CPUs may have a different set of functional blocks active at any one time -- for this reason, the information can't be stored in an existing mapping such as the vectors page. Conversely, existing mechanisms such as sysfs probably involve too much overhead to be polled every time you call copy_pixmap() or whatever.

Alternatively, each thread could register a userspace buffer (a single word is probably adequate) into which the CPU pokes the hardware status flags each time it returns to userspace, if the hardware status has changed or if the thread has been migrated.

Either of the above approaches could be prototyped as an mmap'able driver, though this may not be the best approach in the long run.

Does anyone have a view on whether this is a worthwhile idea, or what the best approach would be?

Cheers
---Dave
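
The userspace side of the per-thread status word could look roughly like the following C sketch. Everything in it is hypothetical: the flag HWSTAT_NEON, the variable hw_status (standing in for the word the kernel would poke on return to userspace), and both implementations of copy_pixmap() are invented for illustration, and no such kernel interface is assumed to exist.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit: set while the NEON unit is powered on. */
#define HWSTAT_NEON (1u << 0)

/* Stand-in for the registered per-thread status word. In the proposal
   the kernel would update it; here it is an ordinary variable. */
static volatile uint32_t hw_status;

/* Counters so a caller can observe which path was taken. */
static unsigned plain_calls, accel_calls;

/* Portable implementation that does not touch the optional unit. */
static void copy_pixmap_plain(uint8_t *dst, const uint8_t *src, size_t n)
{
    plain_calls++;
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Placeholder for an accelerated routine; memcpy stands in for it. */
static void copy_pixmap_accel(uint8_t *dst, const uint8_t *src, size_t n)
{
    accel_calls++;
    memcpy(dst, src, n);
}

/* Poll the status word on every call and pick an implementation.
   A stale read is tolerated under the proposal: touching a powered-off
   unit would fault and let the kernel switch it back on. */
void copy_pixmap(uint8_t *dst, const uint8_t *src, size_t n)
{
    if (hw_status & HWSTAT_NEON)
        copy_pixmap_accel(dst, src, n);
    else
        copy_pixmap_plain(dst, src, n);
}
```

The cost per call in this sketch is one load and one branch, which is the kind of overhead the comparison with sysfs is about.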

14 years, 6 months


[ACTIVITY] November 29th-December 5th

by Julian Brown

== Linaro GCC ==
* Worked on quad-word/big-endian fixes patch. Sent off a version on Tuesday which worked OK, but which made some awkward changes to the middle-end. Tried to re-think those parts, but without much luck: came to the conclusion that spending more time trying to fix element-ordering-dependent operations on quad-word vectors in big-endian mode was probably not worth the effort (since we plan to be changing things in that area anyway). Wrote a much-simplified patch which simply disables those patterns, and ported it to mainline.
* Then, spent some time trying to set up big-endian testing with a mainline build, since the lack of such an option is partly why we got into this mess to start with. My current plan (as well as testing the above patch) is to create an upstreamable patch to easily enable big-endian (Linux) multilibs, in the hope that it'll generally make big-endian testing easier. (Of course people will still need test harness configurations which will allow running big & little-endian code, which most won't have.)
* Also, pinged lp675347 (volatile bitfields vs. QT atomics), and did some extra checks suggested by DJ Delorie, which seemed to work out fine. Backported patch for lp629671 to Linaro 4.4 branch, and ran tests (also fine).
* Continued discussion of internal representations for fancy vector loads/stores in GIMPLE/RTL on linaro-toolchain.

14 years, 6 months


[ACTIVITY] Nov. 19 -- Dec. 05

by Zach Welch

== Last Week == * Continued implementing support for ARM unwind tables in libunwind. * Sent patches upstream to improve binutils's readelf, adding support for all remaining unwind table instructions (i.e. VFP/NEON and WMMX). When used on ARMv7a, provides meaningful output for previously 'unsupported' opcodes that get used in some libraries (e.g. glibc). == This Week == * Continue working on libunwind. -- Zach Welch CodeSourcery zwelch(a)codesourcery.com (650) 331-3385 x743

14 years, 6 months


[ACTIVITY] Nov 29 - Dec 3

by Ulrich Weigand

== GDB == * Posted updated implementation of #661253 (Improve backtrace by using ARM exception tables) to gdb-patches, which includes several changes requested by reviewers * Posted updated patch to further improve backtrace (in the absence of debug info) to gdb-patches * Commented on a couple of GDB LP bugs == Miscellaneous == * Started setting up IGEP v2 board Mit freundlichen Gruessen / Best Regards Ulrich Weigand -- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294

14 years, 6 months


qemu linux-user mode issues

by Peter Maydell

Since it came up in the toolchain meeting this morning, some links to issues people are having doing ARM scratchbox-style builds because of generic linux-user issues: http://bugs.meego.com/show_bug.cgi?id=10529 # linux-user's mmap implementation isn't very smart https://bugs.launchpad.net/qemu/+bug/668799 # qemu locking issues which can cause build failures (sometimes) -- PMM

14 years, 6 months


[ACTIVITY] week 48

by Marcin Juszkiewicz

https://blueprints.launchpad.net/ubuntu/+spec/other-linaro-n-cross-compilers:
- wrote patches for creating backports PPA
  - each component [1] generates versioned -source binary package (eglibc-2.12.1-source etc)
  - a-c-t-b [2] got "PPA" boolean variable in rules to have one source package for archive and for backports
- I built a-c-t-b with all components backports from natty in lucid pbuilder

Bugs:
- 684625 - libc6 is compiled for armv5 instead of armv7a
  - confirmed, wrote fix, will send for review and merge
- 683832 - gcc fails to cross compile Qt
  - confirmed in maverick for cross gcc 4.4/4.5
  - need to check with fixed (bug 684625) toolchain
- FTBFS of armel-cross-toolchain-base 1.53/natty
  - issue is lack of LTO plugin built in gcc/stage2
  - have first patches for it, need to test

1. component = eglibc, gcc-4.4/4.5, binutils, linux
2. a-c-t-b = armel-cross-toolchain-base

Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz

14 years, 6 months


[ACTIVITY] Nov.29 -- Dec.05

by Chung-Lin Tang

== GCC related ==
* PR44557, Thumb-1 ICE: originally thought a constraint fix would work; however, after simplifying the testcase, hit another ICE in postreload, due to a load of IP, which is not permitted in Thumb-1. Looking at some reload internals as part of fixing this.
* PR45416, ARM code generation regression. The first fix from last week hit an assert FAIL in the alias-oracle due to ARRAY_REFs not being handled there. Also found some further expand code quality regressions due to this change. Turned to a more conservative fix by adding the related TER substitution to expr.c:do_store_flag(), which produced more focused results. However, 32-bit x86 slightly regressed in the same flag storing code (did not use the 'testl' insn after the change). Still WIP.
* PR46667, submitted a section type conflict bug fix upstream, see http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00137.html , which supposedly fixes the upstream ARM-Linux C++ build. Jan Hubicka later gave another fix, so still in discussion.
* PR45886, this PR is a call for backporting the __ARM_PCS* preprocessor symbols to gcc-4_5-branch. Submitted a mail to ask for approval, no response.

== libffi VFP hard-float ==
* PR46508, libffi VFP assembly error. I missed this earlier due to using a compiler configured with --with-fpu=vfp. Submitted assembly fix to add the needed FPU directives. Committed to upstream trunk.

== This week ==
* Hope to wrap up the above in-progress PRs, as well as continue to look at other PRs of interest.
* LP #685534 popped up on Sunday, and manifests on upstream trunk too. Add this to queue.
* Think about GCC performance opportunities (Linaro)

14 years, 6 months


[ACTIVITY] Nov 29th -- Dec 5

by Yao Qi

== Linaro GCC ==
* Reproduced the regression of my ldm/stm backport on 4.5, which is caused by the other two merged patches in ifcvt.c. Fixed them. Proposed the merge request again. Learned how to sync/merge changes from one branch to another.
* Fixed VFP_D0_D7 handling in predicate vfp_register_operand. Approved and committed upstream.
* Tested the new regrename improvement patch on x86_64, and measured its effectiveness on ARM. Code size of bash-3.2 is reduced by 0.2% with option "-march=armv7a -mthumb -O2 -frename-registers". Eric B. is almost OK with this patch except for some wording in comments.

== Linaro GDB ==
As discussed at UDS, I'll move to GDB work for gdb correctness.
http://ex.seabright.co.nz/helpers/planner#tr-toolchain-gdb-correctness
This month, I'll focus on fixing GDB testsuite failures.
* Analyzed LP:615978, failures in gdb.base/annota3.exp. The signal is not delivered to the child while software single-stepping. The same as LP:649121.
* Fixed failure in gdb.xml/tdesc-regs.exp. LP:685494 It is caused by a target triplet matching error when the target is set to "armv7l-linux-gnueabi". Target triplet matching in test cases should be changed. Patch is being reviewed upstream.
* LP:616000 failures caused by -fstack-protector. Homework to understand frame-related code in GDB. Got a big picture of how some key frame data structures are used inside GDB. Compared the prologue with and without stack-protector and found some differences. Still no clue how to teach GDB to identify whether stack-protector is turned on or off.
* Fixed one failure in printcmds.exp. LP:685702 This test case on the 7.2 branch is a little out of date compared with GDB trunk. Backporting one patch from trunk to the 7.2 branch fixes this problem. The backport patch is being reviewed upstream.
* Neon registers in kernel dump file. LP:615972 Asked the Linaro kernel WG how to move forward on this. Discussion is still ongoing.

== This Week ==
* Report the rest of the GDB testsuite failures.
* Pick up some of them, and fix.
* Pass gcc patches in my queue one by one to gcc-patches for review.

-- Yao (齐尧)

14 years, 6 months


[ACTIVITY] report week 48

by Peter Maydell

RAG:
Red:
Amber:
Green:

Milestones:
                             | Planned    | Estimate   | Actual     |
finish virtio-system         | 2010-08-27 | postponed  |            |
get valgrind into linaro PPA | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches   | 2010-10-01 | 2010-10-22 | 2010-10-18 |

Progress:
* merge-correctness-fixes:
** Nathan Froyd (CodeSourcery) has reviewed a lot of my ARM patches. Most were OK, one or two needed tweaking. We seem to have come to agreement on how best to treat the API between qemu and the softfloat library, and I have a V2 patchset ready to mail as soon as Nathan has commented on the final patch.
** posted a patch to rename a very misleading _is_nan() function
** identified list of correctness patches in meego and samsung trees and issues noted within ARM
** qemu: posted patch to remove an unused function
** started looking at the first patch in the meego tree, which fixes VQSHL. I have already discovered a bug in this insn not covered by the meego patch...

Meetings: toolchain, PD update, ARM 20th birthday party

Plans:
- qemu consolidation

Issues:
* Locking in qemu is definitely insufficient, especially (but not exclusively) when running multi-threaded programs in linux-user mode. https://bugs.launchpad.net/qemu/+bug/668799 has an example problem and some discussion; I'm hoping some other qemu developers have an opinion, but the nicest approach IMHO would involve fairly invasive changes to how qemu implements interrupting a cpu which is executing TCG code. Not sure where this should sit in the priority list.

Things of note:
- there has been some discussion of broadening the "KVM Forum" conference to include other virtualisation related topics including Xen and also the TCG aspects of Qemu. Still all up in the air but possibly colocated with LinuxCon in Vancouver in August. See: http://www.linux-kvm.org/page/KVM_Forum_2011

Absences: (complete to end of 2010)
Fri 17 Dec - Tue 4 Jan inclusive.
2011: Dallas Linaro sprint 9-15 Jan. Holiday 22 Apr - 2 May.

14 years, 7 months


[ACTIVITY] Weekly status

by Richard Sandiford

== This week ==
* Looked at a generic bug in GAS's handling of ifuncs. Sent a patch upstream: http://sourceware.org/ml/binutils/2010-11/msg00495.html Alan quite reasonably wanted me to test on a variety of targets. For want of anything better, I wrote a script to test Alan's list of 118 targets. Tests went OK, patch committed upstream.
* Wrote more IFUNC tests. Found another problem (as yet unresolved).
* Looked at vector stuff, but nothing tangible yet.

(I also had to spend some time on other IBM things, sorry.)

== Next week ==
* Away Monday and Tuesday.
* More STT_GNU_IFUNC and vectors.

Richard

14 years, 7 months


[ACTIVITY] 2010-12-03

by David Gilbert

* Benchmarking of simple package builds with various string routine versions; not finding enough difference in the noise to draw any firm conclusions
* Looking at the string routine behaviour with perf to see where the time is going
  - getting hit by the Linaro kernels on silverbell missing Perf enablement in the config
  - Useful amount of time does seem to be spent outside the main 'fast aligned' chunks of code
  - pushing/popping registers does seem to be pretty expensive
* Started looking at libffi and hard float
  - Started writing a spec https://wiki.linaro.org/WorkingGroups/ToolChain/Specs/LibFFI-variadic
  - It's going to need an API change to libffi, although the change shouldn't break any existing code on existing platforms where they work.
* Helping with the image testing

Dave

14 years, 7 months


[ACTIVITY] weekly status

by Ken Werner

Hi,

* got llvm+clang working on ARM: https://wiki.linaro.org/KenWerner/Sandbox/HowToBuildToolchainComponents#llv…
* checked whether llvm inlines the __sync_* builtins on ARM or not: https://wiki.linaro.org/WorkingGroups/ToolChain/AtomicMemoryOperations#LLVM
* developed a patch for #681138 (tested with current gcc-linaro)
* spent some time bootstrapping the GCC trunk in order to test and post that patch on the ml but wasn't successful (finally ran into the issues discussed at #659713)
* did some verification work on #674090
* preparing to work on the "investigate current developer tools" item

Regards
Ken

14 years, 7 months


Using inline NEON code

by Michael Hope

Hi there. Currently you can't use NEON instructions in inline assembly if the compiler is set to a VFP-only -mfpu setting, such as Ubuntu's -mfpu=vfpv3-d16. Trying code like this:

int main()
{
  asm("veor d1, d2, d3");
  return 0;
}

gives an error message like:

test.s: Assembler messages:
test.s:29: Error: selected processor does not support Thumb mode `veor d1,d2,d3'

The problem is that -mfpu=vfpv3-d16 has two jobs: it tells the compiler what instructions to use, and also tells the assembler what instructions are valid. We might want the compiler to use the VFP for compatibility or power reasons, but still be able to use NEON instructions in inline assembler without passing extra flags.

Inserting ".fpu neon" at the start of the inline assembly fixes the problem. Is this valid? Are assembly files with multiple .fpu statements allowed? Passing '-Wa,-mfpu=neon' to GCC doesn't work, as gas seems to ignore the second -mfpu.

What's the best way to handle this? Some options are:
* Add '.fpu neon' directives to the start of any inline assembly
* Separate out the features, so you can specify the capabilities with one option and restrict the compiler to a subset with another. Something like '-mfpu=neon -mfpu-tune=vfpv3-d16'
* Relax the assembler so that any instructions are accepted. We'd lose some checking of GCC's output though.

-- Michael
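
The first option, the '.fpu neon' prefix, might look like the C sketch below. The function name neon_inline_demo is made up for illustration, and the `__arm__` guard is only there so the file still compiles on other hosts. Whether several .fpu directives in one assembly file are valid is exactly the open question in this message, so treat this as an experiment rather than a recommendation.

```c
/* Sketch only: prefix the inline assembly with ".fpu neon" so the
   assembler accepts the NEON instruction even though the compiler
   itself was invoked with -mfpu=vfpv3-d16. */
int neon_inline_demo(void)
{
#if defined(__arm__)
    __asm__ volatile(".fpu neon\n\t"
                     "veor d1, d2, d3"
                     : /* no outputs */
                     : /* no inputs */
                     : "d1" /* the instruction overwrites d1 */);
#endif
    return 0;
}
```

One thing to watch for with this approach: the directive is presumably not scoped to the asm statement, so it may also relax the assembler's checking of the compiler-generated code that follows it in the same file.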

14 years, 7 months


[ACTIVITY] Nov 29 - Dec 2

by Ira Rosen

- Continued looking into NEON special loads and stores.
- Benchmarks: concentrated on EEMBC Telecom:
  - autcor gets vectorized
  - viterbi, besides strided data accesses, needs to sink conditional stores to allow if-conversion and make the main loop vectorizable. Since the potential here is 4x, I think it's worthwhile to work on this.
  - conven, fbital also have control-flow issues, but much more complicated than viterbi
  - fft has a problem with loop count; I would like to investigate this a bit more
  - diffmeasure doesn't seem to have vectorization potential
- Fixed GCC PR 46663 on trunk, testing the fix for 4.3, 4.4, 4.5.
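
To show what sinking a conditional store means for if-conversion, here is a generic C sketch. It is not taken from the EEMBC viterbi source; the function names and the loop itself are invented for illustration.

```c
#include <stddef.h>

/* Before: the store happens only on some iterations, so the loop
   body contains control flow. */
void cond_store(int *out, const int *in, int limit, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (in[i] < limit)
            out[i] = in[i];
    }
}

/* After: the store is unconditional and the condition only selects
   the value to store, leaving a straight-line loop body. */
void sunk_store(int *out, const int *in, int limit, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int keep = out[i];
        out[i] = (in[i] < limit) ? in[i] : keep;
    }
}
```

The second form reads and writes out[i] on every iteration, so it is only equivalent when every out[i] is known to be accessible and writable and no other thread touches it, which is part of why a compiler cannot always apply this rewrite on its own.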

14 years, 7 months


[PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

by Julian Brown

Hi,

Here's a work-in-progress patch which fixes many execution failures seen in big-endian mode when -mvectorize-with-neon-quad is in effect (which is soon to be the default, if we stick to the current plan). But, it's pretty hairy, and I'm not at all convinced it's not working "for the wrong reason" in a couple of places.

I'm mainly posting to gauge opinions on what we should do in big-endian mode. This patch works with the assumption that quad-word vectors in big-endian mode are in "vldm" order (i.e. with constituent double-words in little-endian order: see previous discussions). But, that's pretty confusing, leads to less than optimal code, and is bound to cause more problems in the future. So I'm not sure how much effort to expend on making it work right, given that we might be throwing that vector ordering away in the future (at least in some cases: see below).

The "problem" patterns are as follows.

* Full-vector shifts: these don't work with big-endian vldm-order quad vectors. For now, I've disabled them, although they could potentially be implemented using vtbl (at some cost).

* Widening moves (unpacks) & widening multiplies: when widening from D-reg to Q-reg size, we must swap double-words in the result (I've done this with vext). This seems to work fine, but what "hi" and "lo" refer to is rather muddled (in my head!). Also they should be expanders instead of emitting multiple assembler insns.

* Narrowing moves: implemented by "open-coded" permute & vmovn (for 2x D-reg -> D-reg), or 2x vmovn and vrev64.32 for Q-regs (as suggested by Paul). These seem to work fine.

* Reduction operations: when reducing Q-reg values, GCC currently tries to extract the result from the "wrong half" of the reduced vector. The fix in the attached patch is rather dubious, but seems to work (I'd like to understand why better).

We can sort those bits out, but the question is, do we want to go that route? Vectors are used in three quite distinct ways by GCC:

1. By the vectorizer.

2. By the NEON intrinsics.

3. By the "generic vector" support.

For the first of these, I think we can get away with changing the vectorizer to use explicit "array" loads and stores (i.e. vldN/vstN), so that vector registers will hold elements in memory order -- so, all the contortions in the attached patch will be unnecessary. ABI issues are irrelevant, since vectors are "invisible" at the source code layer generally, including at ABI boundaries.

For the second, intrinsics, we should do exactly what the user requests: so, vectors are essentially treated as opaque objects. This isn't a problem as such, but might mean that instruction patterns written using "canonical" RTL for the vectorizer can't be shared with intrinsics when the order of elements matters. (I'm not sure how many patterns this would refer to at present; possibly none.)

The third case would continue to use "vldm" ordering, so if users inadvertently write code such as:

  res = vaddq_u32 (*foo, bar);

instead of writing an explicit vld* intrinsic (for the load of *foo), the result might be different from what they expect. It'd be nice to diagnose such code as erroneous, but that's another issue.

The important observation is that vectors from case 1 and from cases 2/3 never interact: it's quite safe for them to use different element orderings, without extensive changes to GCC infrastructure (i.e., multiple internal representations). I don't think I quite realised this previously.

So, anyway, back to the patch in question. The choices are, I think:

1. Apply as-is (after I've ironed out the wrinkles), and then remove the "ugly" bits at a later point when vectorizer "array load/store" support is implemented.

2. Apply a version which simply disables all the troublesome patterns until the same support appears.

Apologies if I'm retreading old ground ;-).

(The CANNOT_CHANGE_MODE_CLASS fragment is necessary to generate good code for the quad-word vec_pack_trunc_<mode> pattern. It would eventually be applied as a separate patch.)

Thoughts?

Julian

ChangeLog

    gcc/
    * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Allow changing mode
    of vector registers.
    * config/arm/neon.md (vec_shr_<mode>, vec_shl_<mode>): Disable in
    big-endian mode.
    (reduc_splus_<mode>, reduc_smin_<mode>, reduc_smax_<mode>)
    (reduc_umin_<mode>, reduc_umax_<mode>)
    (neon_vec_unpack<US>_lo_<mode>, neon_vec_unpack<US>_hi_<mode>)
    (neon_vec_<US>mult_lo_<mode>, neon_vec_<US>mult_hi_<mode>)
    (vec_pack_trunc_<mode>, neon_vec_pack_trunc_<mode>): Handle
    big-endian mode for quad-word vectors.

14 years, 7 months


Building a cross compiler

by Michael Hope

Hi there. I've had a few questions recently about how to build a cross compiler, so I took a stab at writing the steps down in a Makefile. See: https://code.launchpad.net/~michaelh1/+junk/cross-build

Hopefully it's easy to follow. It uses a binary sysroot and gives you vanilla binutils 2.20 and Linaro GCC 2010.11 in a good enough way that you can cross-compile for Maverick. The script is minimal and favours readability over flexibility.

Note that Marcin's cross compiler packages or the Emdebian toolchains are a better way to go, but if you want to see the steps involved check out the script.

Marcin or Matthias, would you mind reviewing it?

-- Michael

14 years, 7 months

[ACTIVITY] November 21-25

by Ira Rosen

Hi,

- the struggle with the board took a lot of time
- continued to investigate special loads/stores
- looked for benchmarks:
  EEMBC Consumer filters rgbcmy and rgbyiq should be vectorizable once vld3, vst3/4 are supported
  EEMBC Telecom viterbi is supposed to give 4x on NEON once vectorized (according to http://www.jp.arm.com/event/pdf/forum2007/t1-5.pdf slide 29). My old version of viterbi is not vectorizable because of if-conversion problems. I'd be really happy to check the new version (it is supposed to be slightly different).
  Looking into other EEMBC benchmarks.
  FFMPEG http://www.ffmpeg.org/ (got this from Rony Nandy from User Platforms). It contains hand-vectorized code for NEON. Investigating.

I am probably taking a day off on Sunday.

Ira

14 years, 7 months

microoptimising atomic memory ops

by Peter Maydell

This wiki page came up during the toolchain call:
https://wiki.linaro.org/Internal/People/KenWerner/AtomicMemoryOperations/

It gives the code generated for __sync_val_compare_and_swap as including a push {r4} / pop {r4} pair because it uses too many temporaries to fit them all in caller-saved registers. I think you can tweak it a bit to get rid of that:

# int __sync_val_compare_and_swap (int *mem, int old, int new);
# if the current value of *mem is old, then write new into *mem
# r0: mem, r1 old, r2 new
        mov     r3, r0          # move r0 into r3
        dmb     sy              # full memory barrier
.LSYT7:
        ldrex   r0, [r3]        # load (exclusive) from memory pointed to by r3 into r0
        cmp     r0, r1          # compare contents of r0 (mem) with r1 (old) -> updates the condition flag
        bne     .LSYB7          # branch to LSYB7 if mem != old
        # This strex trashes the r0 we just loaded, but since we didn't take
        # the branch we know that r0 == r1
        strex   r0, r2, [r3]    # store r2 (new) into memory pointed to by r3 (mem)
                                # r0 contains 0 if the store was successful, otherwise 1
        teq     r0, #0          # compares contents of r0 with zero -> updates the condition flag
        bne     .LSYT7          # branch to LSYT7 if r0 != 0 (if the store wasn't successful)
        # Move the value that was in memory into the right register to return it
        mov     r0, r1
        dmb     sy              # full memory barrier
.LSYB7:
        bx      lr              # return

I think you can do a similar trick with __sync_fetch_and_add (although you have to use a subtract to regenerate r0 from r1 and r2).

On the other hand I just looked at the gcc code that does this and it's not simply dumping canned sequences out to the assembler, so maybe it's not worth the effort just to drop a stack push/pop.

-- PMM
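
For readers who want the C-level view of what the sequence above implements, here is a minimal sketch using GCC's __sync builtins. The helper names (cas_returning_old, fetch_and_add_via_subtract) are made up for illustration; the second one only models the "subtract to regenerate" remark at the source level and says nothing about the code GCC actually emits.

```c
/* Returns the value *mem held before the operation; *mem becomes
   new_value only if it was equal to old. */
int cas_returning_old(int *mem, int old, int new_value)
{
    return __sync_val_compare_and_swap(mem, old, new_value);
}

/* Once the add has been stored, only the sum and the addend need to stay
   live: the value that was fetched can be recovered as sum - addend.
   Unsigned arithmetic is used so the subtraction is well defined on wrap. */
unsigned fetch_and_add_via_subtract(unsigned *mem, unsigned addend)
{
    unsigned sum = __sync_add_and_fetch(mem, addend);
    return sum - addend;
}
```

Calling cas_returning_old on a variable holding 5 with old = 5 and new_value = 7 returns 5 and leaves 7 behind; a second call with old = 5 returns 7 and changes nothing.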

14 years, 7 months

[ACTIVITY] November 22nd-28th

by Julian Brown

== Linaro GCC ==

* Finished testing patch for lp675347 (QT inline-asm atomics), and sent upstream for comments (no response yet). Suggested reverting a patch (which enabled -fstrict-volatile-bitfields by default on ARM) locally for Ubuntu in the bug log.

* Continued working on NEON quad-word vectors/big-endian patch. This turned out to be slightly fiddlier than I expected: I think I now have semantics which make sense, though my patch requires (a) slight middle-end changes, and (b) workarounds for unexpected combiner behaviour re: subregs & sign/zero-extend ops. I will send a new version of the patch to linaro-toolchain fairly soon for comments.

14 years, 7 months

A question about thumb2 cbnz/cbz implementation in thumb2.md

by Revital1 Eres

Hello,

I have a question about cbnz/cbz thumb-2 instruction implementation in thumb2.md file:

I have an example where we jump to a label which appears before the branch; for example:

L4
  ...
  cmp r3, 0
  bne .L4

It seems that cbnz instruction should be applied in this loop; replacing cmp+bne; however, cbnz fails to be applied as diff = ADDRESS (L4) - ADDRESS (bne .L4) is negative and according to thumb2_cbz in thumb2.md it should be 2<=diff<=128 (please see snippet below taken from thumb2_cbz).

So I want to double check if the current implementation of thumb2_cbnz in thumb2.md needs to be changed to enable it.

The following is from thumb2_cbnz in thumb2.md:

  [(set (attr "length")
        (if_then_else
          (and (ge (minus (match_dup 1) (pc)) (const_int 2))
               (le (minus (match_dup 1) (pc)) (const_int 128))
               (eq (symbol_ref ("which_alternative")) (const_int 0)))
          (const_int 2)
          (const_int 8)))]

Thanks,
Revital
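
The length test quoted above can be read as a small predicate. The following is only a sketch of that condition in C, with a hypothetical function name and plain integers standing in for the label and pc addresses; it is not code from GCC. Note also that the CBZ/CBNZ encoding carries an unsigned offset, so the instruction can only branch forwards, which is consistent with the lower bound in the pattern.

```c
/* Models the "length" attribute from the quoted pattern: the short 2-byte
   form is chosen only for alternative 0 and a forward distance of 2..128
   bytes; everything else gets the 8-byte fallback sequence. */
int thumb2_cbz_length_model(long label_addr, long pc_addr, int which_alternative)
{
    long diff = label_addr - pc_addr;

    if (diff >= 2 && diff <= 128 && which_alternative == 0)
        return 2;
    return 8;
}
```

A backward branch such as the loop in the example gives a negative diff, so the model returns 8, which matches the behaviour described in the question.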

14 years, 7 months

[ACTIVITY] Weekly status

by Richard Sandiford

== This week ==

* More ARM testing of binutils support for STT_GNU_IFUNC.

* Implemented the GLIBC support for STT_GNU_IFUNC. Simple ARM testcases seem to run correctly.

* Ran the GLIBC testsuite -- which includes some ifunc coverage -- but haven't analysed the results yet.

* Started looking at Thumb for STT_GNU_IFUNC. The problem is that BFD internally represents Thumb symbols with an even value and a special st_type (STT_ARM_TFUNC); this is also the old, pre-EABI external representation. We need something different for STT_GNU_IFUNC.

* Tried making BFD represent Thumb symbols as odd-value functions internally. I got it to "work", but I wasn't really happy with the results.

* Looked at alternatives, and in the end decided that it would be better to have extra internal-only information in Elf_Internal_Sym. This "works" too, and seems cleaner to me. Sent an RFC upstream:

    http://sourceware.org/ml/binutils/2010-11/msg00475.html

* Started writing some Thumb tests for STT_GNU_IFUNC.

* Investigated #618684. Turned out to be something that Bernd had already fixed upstream. Tested a backport.

== Next week ==

* More IFUNC tests (ARM and Thumb, EGLIBC and binutils).

Richard
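
As background to the "odd-value" remark: in the EABI external representation a Thumb function symbol has bit 0 of its value set, while the instruction itself lives at the even address. The sketch below shows that convention with hypothetical helper names; it is not BFD code and ignores everything else a real symbol carries.

```c
#include <stdint.h>

/* Bit 0 of an EABI function symbol value marks a Thumb entry point. */
int sym_value_is_thumb(uint32_t st_value)
{
    return (st_value & 1u) != 0;
}

/* The address of the first instruction, with the Thumb marker removed. */
uint32_t sym_value_code_address(uint32_t st_value)
{
    return st_value & ~UINT32_C(1);
}

/* Builds the external (odd-for-Thumb) value from an even code address. */
uint32_t sym_value_from_address(uint32_t code_address, int is_thumb)
{
    return is_thumb ? (code_address | 1u) : code_address;
}
```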

14 years, 7 months

[ACTIVITY] Nov.22 -- Nov.28

by Chung-Lin Tang

== Linaro and upstream GCC == * LP #674146, dpkg segfaults during debootstrap on natty armel: analyzed and found this should be a case of PR44768, backported mainline revision 161947 to fix this. * LP #641379, bitfields poorly optimized. Discussed some with Andrew Stubbs. * GCC bugzilla PR45416: Code generation regression on ARM. Been looking at this regression, that started from the expand from SSA changes since 4.5-experimental. The problem seems to be TER not properly being substituted during expand (compared to prior "convert to GENERIC then expand"). I now have a patch for this, which fixes the PR's testcase, but when testing current upstream trunk, hit an assert fail ICE on several testcases in the alias-oracle; it does however, test without regressions on a 4.5 based compiler. I am still looking at the upstream failures. == This week == * Continue with GCC issues and PRs. * Think about GCC performance opportunities (Linaro)

14 years, 7 months

[ACTIVITY] Nov. 22 -- Nov. 28

by Zach Welch

== Last Week == * Started writing libunwind support for ARM-specific unwinding, but realized that the native ARM toolchain may be causing problems. Spent time trying to isolate the issues, but haven't found the culprit yet. * Began to consider the possibility of developing an ARM-specific unwinding library which would integrate into libunwind (and be reusable elsewhere). Basically, the ARM.ex{idx,tbl} sections are unique to ARM. * Looked at other applications where unwinding functionality already exists to see how ARM unwinding is done (e.g. GDB). Could/should that functionality be replaced with calls to libunwind (or to the aforementioned ARM-specific helper library)? == This Week == * Continue to implement ARM-specific unwinding in libunwind. -- Zach Welch CodeSourcery zwelch(a)codesourcery.com (650) 331-3385 x743

14 years, 7 months

[ACTIVITY] 22nd -- 28th Nov.

by Yao Qi

== Linaro GCC ==

* LP:634738: Firstly, fix this in combiner. Try the other approach (without changes to arm.md) suggested by Andrew S, to fix arm_gen_constant in some cases to generate lsl/lsr rather than loading a constant. Some of the code in arm.c was written in 1998 and is hard to understand, with few comments. During this, found that some lsl/lsr pairs can be replaced by ubfx. Use gen_extzv_t2 when arm_arch_thumb2 is true to transform lsl + lsr to ubfx. Two patches are ready.

* LP:633243: Got build failures on FSF trunk for arm-none-linux-gnueabi. Test patch on FSF trunk 2010-10-21. No regression.

* LP:638935: predicate "vfp_register_operand" should return true for VFP_D0_D7_REGS registers. Fixed. Predicates {store,load}_multiple_operation assume mode is SImode, and size of data is 4. Fix them to accept multiple VFP operations. Write three new test cases for stm/fldm/fstm pattern. Test patch on FSF trunk 2010-10-21. No regression.

* SMS on thumb2. Discussed with Revital Eres back and forth on doloop pattern for thumb2. doloop pattern is not recognized so far on thumb2. Revital has a fix to thumb2_cbz pattern. After this fix, doloop pattern should be recognized.

* Ping ARM fix PR45701 in gcc-patches for the fifth time. Still no reply.

== This Week ==

* Look at regressions of ldm/stm backport on Linaro GCC 4.5.
* Internal review of patch to LP:638935
* Try SMS on thumb2 for EEMBC, if Revital's thumb2_cbz pattern fix is accepted upstream.

-- Yao (齐尧)
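
To make the lsl + lsr to ubfx remark concrete, here is a small C model of the arithmetic, assuming 32-bit registers and shift amounts with 0 <= a <= b <= 31. The function names are invented for the illustration and the mapping (lsb = b - a, width = 32 - b) is derived from the shift semantics, not taken from the patches mentioned above.

```c
#include <stdint.h>

/* lsl #a followed by lsr #b on a 32-bit value. */
uint32_t lsl_then_lsr(uint32_t x, unsigned a, unsigned b)
{
    return (x << a) >> b;
}

/* Unsigned bitfield extract: width bits starting at bit lsb. */
uint32_t ubfx_model(uint32_t x, unsigned lsb, unsigned width)
{
    uint32_t mask = (width < 32u) ? ((UINT32_C(1) << width) - 1u) : UINT32_MAX;
    return (x >> lsb) & mask;
}

/* The same result as lsl_then_lsr, expressed as one bitfield extract.
   Only valid for 0 <= a <= b <= 31. */
uint32_t lsl_lsr_as_ubfx(uint32_t x, unsigned a, unsigned b)
{
    return ubfx_model(x, b - a, 32u - b);
}
```

For example, with x = 0x12345678, a = 8 and b = 24, both forms extract the byte 0x34.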

14 years, 7 months

New data centre host

by Michael Hope

Hi there. Our Versatile Express has been installed in the data centre and is available for use. See: https://wiki.linaro.org/WorkingGroups/ToolChain/Hardware for the details. If you're a member of the Toolchain WG then you should already have an account. Dave is currently using this machine for benchmarking. Until we get more hardware, please use IRC or email to manage access. -- Michael

14 years, 7 months

[ACTIVITY] 22nd - 28th November

by Andrew Stubbs

Reviewed Yao's patch for AND optimization. Some back and forth on the best way to tackle this problem.

LP:663939 - thumb2 constant loading
 - backported my patches to GCC 4.5
 - awaiting review

LP:595479 - .eh_frame broken.
 - Discovered this problem had been fixed (with Thomas' patch) since August, and has also been fixed upstream, albeit with an alternative patch. Nothing to do here.

LP:641379 - bitfields poorly optimized.
 - analysed the problem. The code in cse.c that is supposed to fix this does not recognise the case.
 - created a patch and tested it for both GCC 4.6 and 4.5.
 - awaiting review

LP:674146 - dpkg segfault.
 - started looking at this, but Chung-Lin took it first.

While trying to reproduce lp:674146, I discovered that my IGEPv2 had a corrupted rootfs, again. I only fixed it last week, so I looked into it more deeply. It seems the SD card has developed at least one bad block. Reformatted, scanned and reinstalled the files from backup. I think the problem was caused by the daily apt package download (it was always those files that were corrupt), so I've disabled that. I've also disabled access-time-stamps. If it happens again I will have to consider using a different underlying filesystem format.

LP:643479 / CS Issue:8610 - Multiply and accumulate optimization
 - created patches for both issues.
 - both were machine description subtleties.
 - backported the patches to 4.5
 - the patches apply and work fine, but ...
 - found an extra problem with redundant moves
 - awaiting review

GCC 4.6
 - Created a new Launchpad series and branch to track GCC 4.6 development.
 - Set up the CS internal build config.
 - Tried to build the latest checkout and failed
   - glibc problem still unfixed - Jie has reported it now.
   - libquadmath build fails

Merged FSF GCC trunk (pre-4.5.2) into Linaro GCC 4.5 tree.

Merged the outstanding Launchpad merge requests into GCC 4.5. The testing showed regressions, so I backed out most of the merges and did them in smaller batches. Chung-Lin and Richard's patches passed the testing, so that leaves Yao's as the problem patch. I didn't get time to test this assertion this week.

----------------------------------

Next week: Vacation.

14 years, 7 months

Re: GCC Optimization Brain Storming Session

by Steven Bosscher

Andrew Stubbs wrote:
> * Instruction set coverage.
>   - Are there any ARM/Thumb2 instructions that we are not taking
>     advantage of? [2]
>   - Do we test that we use the instructions we do have? [3]

There is no general framework to test instruction set coverage. The only way to find out, really, is to create some test cases where you expect the compiler to produce a certain insn. Is there a list of all ARM/Thumb2 instructions and the ones implemented in the GCC ARM machine descriptions?

> * Constant pools
>   - it might be a very handy space optimization to have small
>     functions share one constant pool, but the way the passes work one
>     function at a time makes this hard. (LP:625233)

There are also passes working on the entire program, or a partition. Isn't it more a question of how to group and process functions that are candidates for sharing a constant pool with a neighbor? Are there algorithms for this kind of pool sharing in the academic or ARM-specific literature?

Other suggestions for the discussion:

* Better use of conditional execution.
  - No idea how much this really helps for ARM, but there are bug reports about missed opportunities from time to time, so...
  - How to model conditional execution before register allocation?
  - How to exploit opportunities better in GCC (ifcvt is inadequate and too late in the pipeline).
  - Also look at LLVM here, it appears to have a better cost model for if-conversion than GCC (taking into account a target-dependent branch misprediction penalty, for example).

* Basic block re-ordering for speed/size.
  - The existing basic block reordering pass in GCC implements only a reordering strategy for speed.
  - The pass does not run at all for functions optimized for size.

* Comparing ARM cost models and param settings to x86_64
  - Compare, for some set of functions/benchmarks, the results of estimate_num_insns, estimate_operator_cost, and estimate_move_cost, between ARM and x86_64. Rationalize or fix any significant differences. See whether heuristics based on these functions require tuning for ARM.
  - Go through params.def and see if there are further ARM tuning opportunities. There are more than 100 DEFPARAMs and many of them guide heuristics but have only been tuned on x86_64. (There is set_default_param_value, but most backends do not change the defaults.)

Hoping this is helpful,

Ciao!
Steven

14 years, 7 months

Silverbell dchroots

by Christian Robottom Reis

David G. requested a few packages installed on silverbell today (the quad-A9 VE porter machine we host in the datacenter). We got dchroots instead:

----- Forwarded message from LaMont Jones via RT <rt(a)admin.canonical.com> -----

Date: Fri, 26 Nov 2010 21:02:22 +0000
Subject: [rt.admin.canonical.com #42662] Simple package installs on silverbell

On Fri Nov 26 17:08:28 2010, kiko(a)canonical.com wrote:
> Hi there,
>
> Could we get installed on silverbell:
>
>     build-essential
>     debhelper
>     fakeroot
>
> And could we get deb-src's added to sources.list and do an apt-get
> update to allow us to apt-get source certain packages? This is for
> simple compilation benchmarks. Thanks!

Dchroot environments have been created on silverbell for both maverick and natty. Within the chroot, you can sudo apt-get install to install packages. apt-get update/dist-upgrade and installs that cause package removal will require a GSA to do them.

To build for maverick:

    dchroot -c maverick
    (and then build however you want...)

lamont

----- End forwarded message -----

--
Christian Robottom Reis | [+55] 16 9112 6430 | http://launchpad.net/~kiko
Linaro Engineering VP   | [ +1] 612 216 4935 | http://async.com.br/~kiko

14 years, 7 months

[ACTIVITY] weekly status

by Ken Werner

Hi,

* the ARM __sync_* glibc-ports patch was accepted upstream
* posted proposal for consolidating sync primitives but stdatomic seems to be the future
* used my small gcc testsuite patch to verify __sync_* support of the gcc-linaro
* created: https://wiki.linaro.org/WorkingGroups/ToolChain/AtomicMemoryOperations
* looked into GOMP support on ARM:
  - #pragma omp atomic results in proper asm code (dmb, ldrex, strex, dmb)
  - #pragma omp flush results in a DMB instruction
  - #pragma omp barrier results in a call to GOMP_barrier (I'm not sure if this is the desired behavior)
* started to look into #681138

Regards
Ken
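
For anyone wanting to reproduce the GOMP observations, a test input along these lines should do. This is a hypothetical example, not the testcase used above; build it with -fopenmp (plus -S to inspect the assembly). Without -fopenmp the pragmas are ignored and the loop simply runs serially.

```c
/* Increments a shared counter from a parallel loop. The atomic pragma is
   what makes the concurrent updates safe; on ARM it is the construct
   reported above as expanding to dmb / ldrex / strex / dmb. */
int count_with_omp_atomic(int iterations)
{
    int counter = 0;
    int i;

    #pragma omp parallel for
    for (i = 0; i < iterations; i++) {
        #pragma omp atomic
        counter++;
    }

    return counter;
}
```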

14 years, 7 months

[ACTIVITY] 2010-11-26

by David Gilbert

Hand crafted a simple strchr and comparing it with libc:

https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialStrchr

It's interesting that it's significantly faster than libc's on A9s, but on A8s it's slower for large sizes. I've not really looked at why yet; my implementation is just the absolute simplest thumb-2 version.

Did some ltrace profiling to see what typical strchr and strlen sizes were, and got a bit surprised at some of the typical behaviours (lots of cases where strchr is being used in loops to see if another string contains any one of a set of characters, a few cases of strchr being called with Null strings, and the corner case in the spec that allows you to call strchr with \0 as the character to search for).

Trying some other benchmarks (pybench spends very little time in libc; package builds of simple packages seem to have a more interesting mix of libc use).

Sorting out some of the red tape for contributing.

Dave
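
For reference, the behaviour being benchmarked, including the corner case of searching for \0, fits in a few lines of C. This is a byte-at-a-time sketch with a made-up name, not the hand-crafted thumb-2 routine discussed above.

```c
#include <stddef.h>

/* Returns a pointer to the first occurrence of c in s, or NULL.
   Searching for '\0' returns a pointer to the terminator, as the C
   standard requires of strchr. */
char *simple_strchr(const char *s, int c)
{
    const char ch = (char)c;

    for (;; s++) {
        if (*s == ch)
            return (char *)s;   /* also taken for the terminator when ch == '\0' */
        if (*s == '\0')
            return NULL;        /* reached the end without a match */
    }
}
```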

14 years, 7 months

Notes on mixing D16/D32 code

by Michael Hope

It's a bit of a newbie question, but I've been wondering if you can intermix hard float VFPv3-D16 code with VFPv3-D32 code. You can.

According to the ABI (AAPCS, VFP variant):
 * d0-d7 are used for floating point parameters, no matter if you are D16 or D32
 * d0-d7 are not preserved across function calls
 * d8-d15 must be preserved across function calls
 * d16-d31, where present, are not preserved across function calls

The scenarios are:

A D32 function calls a D16 function:
 * The first 8 double-precision parameters are passed in d0-d7
 * Any remaining are passed on the stack
 * The D16 function doesn't know about d16-d31 and doesn't use them. The caller has to treat them as clobbered by any call anyway, so nothing is lost.

A D16 function calls a D32 function:
 * The first 8 double-precision parameters are passed in d0-d7
 * Any remaining are passed on the stack
 * The D32 function may clobber d16-d31, but the D16 caller has nothing live in them, so that is fine.

A D32 function (A) calls a D16 function (B) which calls a D32 function (C):
 * Parameters are OK, as above
 * B doesn't use d16-d31
 * C may clobber d16-d31, but A already has to assume they do not survive the call to B

-- Michael

14 years, 7 months

[ACTIVITY] report week 47

by Peter Maydell

(short week: only three days)

RAG:
Red:
Amber:
Green: qemu: initial pull req sent; vfp-in-sighandlers patchset sent

Milestones:                   | Planned    | Estimate   | Actual     |
finish virtio-system          | 2010-08-27 | postponed  |            |
get valgrind into linaro PPA  | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update  | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches    | 2010-10-01 | 2010-10-22 | 2010-10-18 |

Progress:
* qemu: final polish on a patchset for saving/restoring VFP and iWMMXT registers across linux-user mode signal handlers; patch series sent to mailing list
* qemu: sent a pull request for a small set of ARM fixes (make SMC undef; fix PXHxx; fix saturating add/sub; fix VCVT)
* reviewed arm semihosting SYS_GET_CMDLINE patch v2
* I now have enough qemu patches in flight that I'm tracking them at https://wiki.linaro.org/PeterMaydell/QemuPatchStatus (simple manual list for now, hopefully will be sufficient)

Meetings: toolchain, pdsw-tools

Plans
- qemu consolidation

Absences: (complete to end of 2010)
Thu/Fri 25-26 Nov; Fri 17 Dec - Tue 4 Jan inclusive.
(Dallas Linaro sprint 9-15 Jan.)

14 years, 7 months

__sync barriers

by Richard Sandiford

For the record, the thing I half-remembered on the call was: http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00697.html and: http://gcc.gnu.org/ml/gcc-patches/2009-09/msg02112.html The problem is that all __sync operations besides __sync_lock_test_and_set and __sync_lock_release are defined to be full barriers. Using something like __sync_val_compare_and_swap for __arch_compare_and_exchange_val_*_acq and __arch_compare_and_exchange_val_*_rel may on some architectures be too heavyweight, since those macros only need acquire/after and release/before barriers. See in particular: http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00928.html from the first thread, where the feeling was that the future wasn't these __sync builtins, but the new C and C++ atomic memory support. Probably already known, sorry. I just wasn't sure that trying to convert everyone (not just ARM) to __sync_* was necessarily going to go down well. Richard
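
To illustrate the distinction being drawn, here is a sketch of a compare-and-exchange that asks only for acquire (or only for release) ordering, written with the C atomics interface as later standardised in C11's <stdatomic.h>. The function names are invented for the example; this is not the glibc macro implementation, and it assumes a compiler that provides <stdatomic.h>.

```c
#include <stdatomic.h>

/* Returns the value observed in *mem. On success that is `expected` and
   *mem now holds `desired`; on failure *mem is unchanged. Only acquire
   ordering is requested on success, unlike the full barrier implied by
   __sync_val_compare_and_swap. */
int cas_val_acquire(atomic_int *mem, int expected, int desired)
{
    atomic_compare_exchange_strong_explicit(mem, &expected, desired,
                                            memory_order_acquire,
                                            memory_order_relaxed);
    return expected;    /* updated to the observed value on failure */
}

/* Same operation with release ordering on success. */
int cas_val_release(atomic_int *mem, int expected, int desired)
{
    atomic_compare_exchange_strong_explicit(mem, &expected, desired,
                                            memory_order_release,
                                            memory_order_relaxed);
    return expected;
}
```

On targets where barriers are expensive, the weaker orderings give the compiler room to emit a lighter sequence than the full-barrier __sync form.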

14 years, 7 months

Status Report 11-22-2010

by Zach Welch

== Last Week == * Reached the point with understanding libunwind where I can begin writing patches for parsing unwind information out of .ARM.exidx and .ARM.extab ELF sections. == This Week == * Begin writing support for ARM-specific unwind information to libunwind. -- Zach Welch CodeSourcery zwelch(a)codesourcery.com (650) 331-3385 x743

14 years, 7 months

[ACTIVITY] November 15-21st

by Julian Brown

== Linaro GCC == * Continued looking at big-endian/quad-vector patch: attempted to figure out the proper semantics for vec_extract in big endian mode (about 1 day). Put on hold temporarily to work on lp675347, QT failing to build due to constraint failure in inline asm statements used for atomic operations: found the patch which introduced the failure, and suggested a workaround to the OP. Came up with a plausible-looking patch, and started testing it, after spending some time trying to figure out why ARM Linux mainline doesn't build at present. Patch sent upstream.

14 years, 7 months

Of instruction timings

by David Gilbert

Hi Richard,

As per the discussion at this morning's call: I've reread the TRM and I agree with you about the LSLS being the same speed as the TST (1 cycle). However, as we agreed, the uxtb does look like 2 cycles vs. 1 cycle for the AND.

On the space vs. perf theme, one thing that would be interesting to know is whether there are any icache/issue stage limitations; i.e. if I have a stream of 32-bit Thumb-2 instructions that are all listed as 1 cycle and are all in i-cache, can they be fetched and issued fast enough, or is there a performance advantage to short instructions?

Dave
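
The instruction pairs compared above compute the same thing, which is why only size and timing matter when choosing between them. The C model below shows the equivalences; it assumes the LSLS form tests a single bit by shifting it into the sign position, and the function names are made up for the illustration.

```c
#include <stdint.h>

/* TST-style: mask out bit n (0..31) and check for non-zero. */
int bit_set_via_tst(uint32_t x, unsigned n)
{
    return (x & (UINT32_C(1) << n)) != 0;
}

/* LSLS-style: shift bit n up to bit 31 and look at the sign (N flag). */
int bit_set_via_lsls(uint32_t x, unsigned n)
{
    uint32_t shifted = x << (31u - n);
    return (shifted >> 31) != 0;
}

/* AND with 0xff and UXTB both zero-extend the low byte. */
uint32_t low_byte_via_and(uint32_t x)
{
    return x & 0xffu;
}

uint32_t low_byte_via_uxtb(uint32_t x)
{
    return (uint32_t)(uint8_t)x;
}
```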

14 years, 7 months

[ACTIVITY] 15th - 19th November

by Andrew Stubbs

LP:663939 - Thumb2 constants
 * Continued testing, found a few bugs. Tidied a few bits up.
 * Wrote some new testcases to go with the patch.

LP:618684 - ICE
 * Began looking at this one. So far I can't reproduce it. I have a debuggable native toolchain building, but it's been delayed by hardware issues.

In the course of testing I discovered that the ARM FSF config wasn't testing the right thing, so began work on a new, more appropriate FSF build/test config for Linaro work.

Also found that the SD card rootfs in my IGEPv2 board was corrupted. I've restored it from backup, and now it's working once more.

14 years, 7 months

[ACTIVITY] Nov.15 -- Nov.21

by Chung-Lin Tang

== Linaro and upstream GCC ==

* Linaro launchpad issues:
  - LP #672833, x86-64 varargs regression: after testing, pushed bzr branch for merging.
  - LP #634738, inefficient low bit extraction: some discussion with Yao.
  - LP #618684, ICE when building ziproxy: looked into it and quickly found it is no longer reproducible on Linaro 4.5 trunk.

* Worked on some GCC bugzilla PRs:
  - PR44557, ICE in Thumb-1 secondary reload: this should be fixed by a change of the scratch operand constraint of "reload_inhi" from "r" to "l". Interesting to note that this was from the merged-arm-thumb-backend-branch merge, from about 10 years ago.
  - PR46508: libffi fails to build on VFP asm instructions, seems to need a '.fpu vfp' directive. Probably missed earlier because my toolchain was configured with --with-fpu=vfp.
  - PR45416: 4.6 code generation regression on ARM, after expand from SSA changes. Looking at this currently.

== This week ==

* Look at Linaro issues with higher priority.
* Continue working on GCC PRs.

14 years, 7 months

[ACTIVITY] (Yao Qi) Nov 15th -- 21st

by Yao Qi

== Linaro GCC ==

* Merge ldm/stm patch to Linaro 4.5 tree. Found two regressions in pass ce3 at the last minute before proposing the merge request. Revert one of ldm/stm patches about ifcvt. Complete testcase in branch.

* Try Richard E.'s "TST to LSLS transformation" patch on cortex-a9 with FFMPEG. No speed improvements.

* Various Linaro GCC bug fixing.

** LP:634738 Follow the fix to GCC PR40697, and create a new patch, which emits extzv or shift rather than loading constants in some cases. Tested on FSF GCC trunk, and no regression. However, found a regression by eye in pr44999.c, in which ubfx (4 bytes) is generated rather than uxth (2 bytes). uxth is produced by combiner from ashift and lshiftrt. While reading arm.c, found that constant handling in thumb2 should be improved to some extent.

** LP:633243 Re-implement regrename improvement, as Eric B. suggested in gcc-patches. Spent some time on understanding the API in GCC related to hard-reg. Tested on x86_64-linux. No regression.

** LP:638935 Update my tree to FSF trunk, and find RTL seq for fldm/fstm peephole disappears due to fix to PR45722. Extend arm-ldmstm.ml to support vfp. Peephole and RTL patterns for vfp are done. Will revise arm.c:{load,store}_multiple_sequence to accept vfp data. Fix a bug in ldm/stm peephole when starting offset is negative.

== This week ==

* LP:634738: Figure out how uxth is produced by combiner.
* LP:633243: Test it on ARM.
* LP:638935: Revise {load,store}_multiple_sequence to accept vfp data.

-- Yao (齐尧)

14 years, 7 months

GCC SVN vs. BZR/LP

by Andrew Stubbs

Re my recent email "Upstream GCC feature freeze", I think we're agreed that we need to create a branch that tracks GCC 4.6 development, but has our own performance improvements included.

The question is where to host it?

Option 1: Launchpad/bzr

Pros:
 * We need no permission to do it
 * The branch will naturally evolve into our 4.6 release series in time.
 * The 3-way merge works well (if slowly)
 * We can include patches that we have no intention of posting upstream ever
 * Our patch tracker will Just Work.
 * Merge requests will be available.

Cons:
 * Bzr ;)
 * It's hidden away from the view of most GCC developers

Option 2: GCC SVN branch

Pros:
 * We can work in the open, submitting patches via gcc-patches, as usual
 * The final merge to GCC trunk (come stage 1) will be eased, a little

Cons:
 * We can't really apply anything we want just for ourselves
   * we may end up maintaining an LP branch shadowing the svn branch
 * When we do want to do 4.6 in LP, we'll have to backport all our patches from 4.7, and this may no longer be straightforward.
 * Write permissions not clear.
   * Although I think you can just go ahead and do it?

OK, so I'm sure I've missed some big ones. Please discuss! ;)

I think the big question here is, when will we start wanting to make (unstable/experimental) Linaro GCC 4.6 releases? If we want to do it early, then we'll have no choice but to have an LP branch to release from.

Andrew

14 years, 7 months

[ACTIVITY] week 46

by Marcin Juszkiewicz

Like everyone from Toolchain WG I will share my activities in the last week:

1. cross compilers for archive
 - discussed with doko about dropping update-alternatives use
 - wrote gcc-defaults-armel-cross 1.4 which does proper symlinks for cross compilers
 - wrote gcc-4.5-armel-cross 1.41 which removes update-alternatives support
 - wrote gcc-4.4-armel-cross 1.37 which removes update-alternatives support
 - wrote armel-cross-toolchain-base 1.53 which has all updates which I had
 - sent all of them to Steve for review

Status of changes:
 - default version of armel cross compiler will be 4.5 like it is in Natty
 - both 4.4 and 4.5 will be provided as it is for native
 - any traces of update-alternatives use should be removed

Needs to be done:
 - adding conflicts on older cross compilers to gcc-defaults-armel-cross

Order of upload to archive:
 - armel-cross-toolchain-base
 - gcc-4.5-armel-cross
 - gcc-4.4-armel-cross
 - gcc-defaults-armel-cross

2. Checked whether a few old bugs still apply:
 - Bugs #646729, #637454, #671455 are done with armel-cross-toolchain-base 1.52 (landed in maverick-proposed)

Regards,
--
JID: hrw(a)jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz

14 years, 7 months

[ACTIVITY] 2010-11-19

by David Gilbert

Short week.

Finally got an external hard drive for my beagle - makes it sanely possible to natively build things.

Got eglibc cross built (thanks to Wookey for pointing me in the right direction with the magic incantation of dpkg-buildpackage -aarmel --target=binary) and easily rebuilding. I have a version with the neon version of my memset built into it - it doesn't seem to make a noticeable difference to my ghostscript benchmark though.

Pandas aren't likely to turn up until mid December; arranging to borrow an A9 is turning out to be difficult, but it looks like we should be able to get access to the one in the London datacentre - although it has a disc problem at the moment. I did manage to get a colleague to try my tests on his own Toshiba AC-10 (Tegra-2 - no Neon); the graphs had approximately the same shape as my previous Panda tests. Memchr looked pretty good on there.

Also trying to look at the sign off I need for various libc access.

Dave

14 years, 7 months

[ACTIVITY] weekly status

by Ken Werner

I mainly worked on the atomic memory operations blueprint/item: * posted an updated patch for #643171 on the libc-ports ml after running the glibc testsuite natively on the vexpress * continued to learn about the ARM instructions involved :) * started to write some gcc testcases that scan the asm output of the __sync builtins (mainly to detect differences between the gcc versions - not sure how useful those tests would be for upstream as the sequences may easily change) Ken

14 years, 7 months

[ACTIVITY] report week 46

by Peter Maydell

RAG:
Red:
Amber:
Green:

Milestones:                   | Planned    | Estimate   | Actual     |
finish virtio-system          | 2010-08-27 | postponed  |            |
get valgrind into linaro PPA  | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update  | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches    | 2010-10-01 | 2010-10-22 | 2010-10-18 |

Progress:
* Most of this week spent at the Meego conference in Dublin. This seemed to be a rather apps-developer centric conf, with not much of interest on the low-level side. There were a few useful talks/conversations, though.
* Intel were giving away Atom-based netbooks to all attendees; that's a lot of developers who are going to be testing and optimising their apps for Atom devices rather than ARM...
* qemu: looked at https://bugs.launchpad.net/bugs/668799 ; we don't seem to be taking the right lock before we manipulate the graph of translation blocks. I have a fix which stops the reported segfault, but the code has a number of "XXX not thread safe" and "FIXME: not SMP safe" comments and generally doesn't seem to have a coherent locking design :-(
* qemu: sent some minor patches upstream:
  + enable iwmmxt coprocessors in user mode
  + remove some unused functions from target-arm and target-sparc
  + fix a failure to build bug in a makefile
* qemu: some review of a patch to fix semihosting SYS_GET_CMDLINE

Plans
- qemu consolidation
- post-toolchain-review, sort out some milestones for this report

Absences: (complete to end of 2010)
Thu/Fri 25-26 Nov; Fri 17 Dec - Tue 4 Jan inclusive.
(Dallas Linaro sprint 9-15 Jan.)

14 years, 7 months

[ACTIVITY] Weekly status

by Richard Sandiford

== This week == Started looking at STT_GNU_IFUNC support in BFD. There were a couple of janitorial changes I needed to make in order to prepare elf32-arm.c for the main patch. I tested those separately and submitted them upstream: http://sourceware.org/ml/binutils/2010-11/msg00330.html http://sourceware.org/ml/binutils/2010-11/msg00331.html I've now finished a prototype implementation of the STT_GNU_IFUNC support itself. It wasn't as mechanical as I'd originally assumed, which was nice. Tests that I've run by hand seem to be doing the right thing. I've now started writing tests for the testsuite (meaning: I've completed 1 test so far). == Next week == * Add more tests, including Thumb coverage. * Start on the libc changes. Richard

14 years, 7 months


gcc seg fault on kernel build

by Rob Herring

Doing an allmodconfig build on the kernel, I get the following:

  CC      arch/arm/kernel/asm-offsets.s
In file included from /home/rob/proj/git/linux-2.6-dt/include/linux/kernel.h:12,
                 from /home/rob/proj/git/linux-2.6-dt/include/linux/sched.h:54,
                 from /home/rob/proj/git/linux-2.6-dt/arch/arm/kernel/asm-offsets.c:13:
/usr/lib/gcc/arm-linux-gnueabi/4.4.5/include/stdarg.h:40: internal compiler error: Segmentation fault

It occurs on Maverick 4.4, 4.5 and CodeSourcery 2009Q1 cross toolchains. It's confirmed by CodeSourcery here:

http://www.codesourcery.com/archives/arm-gnu/msg03719.html

What's the status on this issue? I didn't see anything in Linaro gcc bugs that looks related.

Rob

14 years, 7 months


[ACTIVITY] November 14-18

by Ira Rosen

Hi, This week I continued looking into vld/vst support in GCC. I also fixed GCC PR 46312 - testsuite failures on ARM. Ira

14 years, 7 months


STT_GNU_IFUNC and R_ARM_IRELATIVE

by Richard Sandiford

The STT_GNU_IFUNC blueprint:

https://wiki.linaro.org/WorkingGroups/ToolChain/Specs/Binutils-STT_GNU_IFUNC

says "the ARM EABI will be updated to support STT_GNU_IFUNC's requirements". I suppose the most obvious thing that needs to be defined is the relocation number for R_ARM_IRELATIVE. What's the best way of handling that? The main options seem to be:

1. Reserve a relocation number with ARM first (129?).
2. Go ahead and implement it without having the EABI updated. See whether the results are good before deciding whether to bless it in the EABI.
3. Since STT_GNU_IFUNC is GNU-specific, treat R_ARM_IRELATIVE as GNU-specific too, and pinch one of the R_ARM_PRIVATE relocs.

I'm pretty sure (3)'s not the way to go, but I was aiming for completeness. :-)

Richard

14 years, 7 months


Fwd: Fw: GCC SVN vs. BZR/LP

by Ira Rosen

Hi,

On 17 November 2010 05:35, Michael Hope <michael.hope(a)linaro.org> wrote:
> 1. How easy is it to frequently merge in SVN? It used to be terrible
> as you had to manually track the merges. These days can you do a 'svn
> merge trunk' and have it just work?

I asked Mike Meissner to answer this question. Mike is very experienced in GCC and GCC SVN branch management. I am attaching his reply.

Ira

I sent this recently to ppc64-toolchain(a)linux.ibm.com on how to use svnmerge to manage branches:

This script (also ~meissner/meissner/bin.sh/svnmerge) is what I use to update svn directories, such as ibm-gcc-4_5-branch. I think I originally got it from Ben E. and it may be in the contrib directory.

Typically the way I start a branch, such as my normal power7-meissner branch, I do the following:

    $ export TRUNK="svn+ssh://@gcc.gnu.org/svn/gcc/trunk"
    $ export BNAME="power7-meissner"
    $ export BRANCH="svn+ssh://@gcc.gnu.org/svn/gcc/branches/ibm/$BNAME"
    $ export SRC="$HOME/fsf-src"
    $ svn delete -m"delete old branch" $BRANCH
    $ svn copy -m"Clone new branch" $TRUNK $BRANCH
    $ cd $SRC
    $ svn co $BRANCH
    $ cd $BNAME
    $ svnmerge init
    $ svn update    # this is sometimes needed
    $ svn commit -m'Create svnmerge init info'
    $ export REV="xxxx"    # substitute subversion id for xxxx
    $ echo "power7-meissner branch, based on $REV." > gcc/REVISION
    $ touch gcc/ChangeLog.power7
    $ <edit gcc/ChangeLog.power7 to create initial contents>
    $ svn add gcc/ChangeLog.power7 gcc/REVISION
    $ svn commit -m'Add REVISION to branch'

In particular, creating gcc/REVISION allows you to tell what subversion revision the source is based against. You can find the information via:

    $ svn propget svnmerge-integrated

but it is a lot easier if you have a compiler tree to do gcc -v. After you do a propget, you will need to do a svn update.

In this case, I use gcc/ChangeLog.power7 to hold the ChangeLog entries local to the branch. That way I can see a summary of the changes, but not pollute the normal ChangeLog files.

To do merges, you need to make sure that all local changes are checked into the branch. Then do:

    $ cd $SRC/$BNAME
    $ svnmerge merge
    $ <edit gcc/REVISION and ChangeLog.power7 to indicate merge>
    $ <test merged files, if satisfied, check them in>
    $ export REV="xxxx"    # substitute subversion id for xxxx
    $ svn update    # just in case
    $ svn commit -m"Update to subversion id $REV"

Now, to create a patch file, make sure the files are checked in, then do:

    $ cd $SRC/$BNAME
    $ export PATCHFILE="$HOME/patches/mypatch.patch01"
    $ <make ChangeLog entries in $PATCHFILE>
    $ svn diff --old $TRUNK --new . -r $REV >> $PATCHFILE
    $ <delete ChangeLog.power7, REVISION, property changes from $PATCHFILE>
    $ submit patch

To see if there are changes to be merged in:

    $ svnmerge avail

For example on the ibm-gcc-4_5-branch, the following changes were available to be merged in: 164657-166510 when I originally wrote this message on the 9th of November, and Peter has subsequently updated the merge.

I put the following in ~/.subversion/config to provide my own diff command:

    ### Set diff-cmd to the absolute path of your 'diff' program.
    ### This will override the compile-time default, which is to use
    ### Subversion's internal diff implementation.
    diff-cmd = /home/meissner/bin.sh/svndiff

Every so often, I find svnmerge misses something, for example when deleting directories. It is helpful to do a diff from the mainline every so often to make sure you are not missing newly created files, still keeping older files, or just missing a change.

I'll include svndiff for the smarter svndiff command and mrm-changelog.el that looks for the ChangeLog.<name> files I use in different branches. Feel free to contact me to clarify some stuff.

--
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner(a)linux.vnet.ibm.com
fax +1 (978) 399-6899

(See attached file: svnmerge)(See attached file: svndiff)(See attached file: mrm-changelog.el)

14 years, 7 months


Public plan review recording

by Michael Hope

Hi there,

There's a recording of this morning's public plan review available on the wiki at:

https://wiki.linaro.org/Releases/1105/PublicPlanReview

Also included is a copy of the slides and supporting documents. Might be interesting for those who missed it.

-- Michael

14 years, 7 months


Thumb-2 performance discussion

by Michael Hope

A heads up. I'd like to have a brainstorming session on potential Thumb-2 performance improvements in GCC. Think about what you'd like in such a session, and what preparation should be done, and we can discuss the discussion (heh) on Monday.

-- Michael

14 years, 7 months


Abstract submissions for QEMU Users Forum (March 18th)

by Christian Robottom Reis

Hi there, I noticed that there's a QEMU users forum at: http://adt.cs.upb.de/quf/ and that the abstract submission phase is still open, and closes November 28th. It would be great to see some participation there and help identify other key people interested in using and improving QEMU. -- Christian Robottom Reis | [+55] 16 9112 6430 | http://launchpad.net/~kiko Linaro Engineering VP | [ +1] 612 216 4935 | http://async.com.br/~kiko

14 years, 7 months


[ACTIVITY] Digest for 2010-11-16

by Michael Hope

Zach Welch --

== Last Week ==
* Continued working on libunwind support. Trying to figure out why my signal frame detection doesn't work as expected.
* Kept pace with the ltrace tree, testing recent patches on ARM.

== This Week ==
* Continue to work on libunwind signal frame detection.

Julian Brown --

== Linaro GCC ==
* Looked at issues #663198 (double-precision register expected), which was already fixed on the linaro branch but had been reported against a version just prior to that, and #667490, which involved a possible problem with the NEON "load 0.0" patch. Experimented for a while with the latter, but could not find anything wrong. Followed up upstream, and requested a stand-alone test case.
* Worked on a proper solution to the VMOVN-in-big-endian-mode problem, discovering in the process that several other quadword-register operations were similarly broken. WIP patch sent to linaro-toolchain for discussion, but it needs a little more work before it can be applied.

Peter Maydell --

Progress:
* qemu: more cleanup of signal handler VFP patchset; I think I just need to add iwmmx support and it's good
* qemu: VCVT: found yet another bug, did final patchset cleanup: submitted to upstream list [8 patch series]
* qemu: submitted a trivial patch to fix a problem where __get_user/_put_user macros had an unnecessary local var which could clash with a var being used by the macro user
* set up a tree on git.linaro.org which we can use for a branch to make pull requests for ARM qemu fixes
* did a rough estimate of time to do an Eagle qemu model (6 months + testing/bug fixing time)

Issues:
* lost some time to a problem where Linux VMs stopped being able to talk to the LDAP server; however I have a workaround and IT are investigating

Meetings:
* toolchain, toolchain standup, pdsw-tools, PD doughnuts

Plans
- attend Meego conference in Dublin (Nov 15-18 inc travel) http://conference2010.meego.com/
- start on qemu consolidation by upstreaming various ARMv7 correctness fixes

Andrew Stubbs --

== GCC 4.5 ==
* Continued working on LP:663939.
* I still have not worked out how best to fix the constant propagation problem that has been thwarting my optimization patch, however I think I understand it better now.
* I have started on adding replicated pattern support to the constant splitting. Initial results were good, but I discovered that I had to rearrange the code somewhat to get the cost estimation and negative/inverted constant support working correctly. So far, I have it successfully using 16-bit replication pattern constants for set/add/subtract operations. Other operations appear broken at the moment, but it's almost certainly just a few tweaks required.
* TODO: Add support for 32-bit replicated pattern constants. Adjust some of the other two-instruction constant generation techniques to let them fall through to this new code, where it would be beneficial.
* Pushed the latest set of GCC patches into Linaro GCC 4.5.

Chung-Lin Tang --

== Linaro GCC ==
* Linaro #672833: one batch of my backports of Bernd's postreload patches exposed some varargs regressions for x86-64 and was reverted by Michael. Tested the compiler and found it was fixed on mainline rev.162384. Backporting this revision plus the postreload patches fixed the regressions; x86-64 bootstrap also verified okay. There is however another PR45027 fix that was needed on trunk, but it needs a bit more clarification on whether it is needed on a 4.5 compiler.
* Linaro #641397, CS issue #6753: bitfield optimization. Patch tested without regressions, posted for CS internal review, should soon push for Linaro merge.
* Started looking further at some GCC DF, IRA internals.

== This week ==
* Look at more Linaro issues.
* Maybe start looking at some GCC bugzilla PRs.
* There is a local ARM technical event in Hsinchu on Thursday, might go and look around.

Yao Qi --

== Linaro GCC ==
* Mainline patch backport to Linaro 4.5.
** Patch "Fix an if statement in arm_rtx_costs_1". Verified on Linaro 4.5. 0.1% smaller on size, and 0.2% faster on speed. Merged to Linaro 4.5 by Andrew S.
** Tried Nathan F's ifcvt-cond-move patch on cortex-a8 with -O2/O3. No improvements on speed/size for EEMBC.
** Bernd's ldm/stm patch. Analyzed the reason for the regression on Linaro 4.5. Found something that looked wrong in the IRA rtl dump, and spent some time on understanding the IRA rtl dump log. Thanks to Chung-Lin, I realized that the IRA dump is correct, and looked back at the ARM RTL patterns for ldm/stm. Compared the RTL patterns in 4.5 and 4.6 and found some differences. Regenerated ldmstm.md for Linaro 4.5 after updating arm-ldmstm.ml a little bit. Regressions go away! No speed improvement, but code is smaller by 0.2% in EEMBC. Still prefer to merge to Linaro 4.5. ocaml is an interesting language, but not easy to learn and read in vim.
* Some discussion on Linaro development process.
* My regrename improvement patch (re. LP:633243). Communicated with Eric Botcazou back and forth, but the current patch is still too target-dependent for him, as a Middle-End maintainer. Still needs some improvements.
* Build FSF GCC trunk. The CLooG requirement in configure is wrong; reverted configure to the previous version, and then passed the version checking during gcc configure.
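As an illustration of the constant-splitting idea in Andrew Stubbs's report, here is a minimal C sketch. It assumes that "replicated pattern" refers to the Thumb-2 modified-immediate forms 0x00XY00XY, 0xXY00XY00 and 0xXYXYXYXY; the function names are hypothetical and this is not the code from the patch, which is not shown in the report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if v has one of the Thumb-2 replicated-byte
   immediate forms 0x00XY00XY, 0xXY00XY00 or 0xXYXYXYXY.  */
bool is_replicated_imm(uint32_t v)
{
    uint32_t lo = v & 0xffffu;
    uint32_t hi = v >> 16;
    if (lo != hi)
        return false;               /* the two halfwords must match */
    uint32_t b0 = lo & 0xffu;       /* low byte of each halfword */
    uint32_t b1 = lo >> 8;          /* high byte of each halfword */
    return b1 == 0 || b0 == 0 || b0 == b1;
}

/* Hypothetical helper: split a constant whose two halfwords are equal
   (e.g. 0x12341234) into two replicated immediates whose sum (and bitwise
   OR) is the original value, so that a set followed by an add could build
   it.  Returns false if the halfwords differ.  */
bool split_replicated16(uint32_t v, uint32_t *first, uint32_t *second)
{
    if ((v & 0xffffu) != (v >> 16))
        return false;
    *first = v & 0xff00ff00u;       /* 0xXY00XY00 form */
    *second = v & 0x00ff00ffu;      /* 0x00XY00XY form */
    return true;
}
```

For example, 0x12341234 is not itself a replicated immediate, but under these assumptions it splits into 0x12001200 and 0x00340034, both of which are.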

14 years, 7 months


Activity reports

by Michael Hope

Hi there. Could everyone in the toolchain working group start sending their activity reports to this list please? Put [ACTIVITY] at the start of the subject line so that they can be filtered. Ta, -- Michael

14 years, 7 months


Status reports

by Michael Hope

Hi there. Attached are the status reports from the Toolchain WG members for last week.

-- Michael

Ken Werner --

Hi Michael,
* got access to the internal wiki/calendar/email :)
* continued to set up the borrowed vexpress board
* upgraded to the Linaro 10.11 release
* encountered various issues until I found that the /etc/hosts is empty (#674090)
* learned that the SD card issue is a known problem (#632798)
* the network interface sometimes dies if stressed (Matt was able to reproduce this)
* the disabled CONFIG_SWAP is being tracked as #672656
* sometimes the entire system hangs (when under heavy load?)
* David noticed that /proc/cpuinfo lacks neon support (but his string benchmark/testcase ran fine)
* wondering why the kernel reports only about 800 BogoMIPS while it's around 2k on the panda board
* started to work on the atomic memory operations item
* identified the relevant GCC patches
* still looking for a good way to verify the GCC support
* posted a patch on the glibc-ports ml with regard to #643171

David A Gilbert --

I managed to get to try Ken Werner's Versatile Express board with an A9MP tile; the shape of the graphs matches that from the Panda, but the raw performance is down by a factor of about 3 - I'm guessing it's clocked lower for some reason. It confirms however that the Neon behaviour I was seeing with memset is not Panda/OMAP4 specific; no one has replied to my post to linaro-toolchain. It's a difficult situation in that my fastest memset on Beagle is with Neon, and my fastest on v9 is without Neon - what would you select on?

I've just finished writing memchr tests and my first crack at a faster version; I realised I could use the same trick that I had used for strlen and it works nicely - it seems to be about 50% faster than the libc version; I've not tested against any other versions yet.

Paul McKenney hasn't replied yet about the OSSC stuff, but apparently he's out travelling and back next week; so I'll catch him then.

I tried preloading my faster memset into ghostscript, but found it was blatantly ignoring it - I think the memset is being called from somewhere inside libc; I managed to get xdeb to cross build me a libc but haven't yet got my changes into it.

My order for a USB hard drive for my beagle seems to have been delayed by the supplier; I'm pushing this but it's starting to be a bit of a pain.

Richard Sandiford --

== Last Week ==
* Pinged my GAS fix for Thumb PLT branches to locally-defined symbols. Committed it to binutils trunk and 2.21 branch after approval. This fixes the libgcc.so build failure that I was seeing with GOLD.
* Worked on a patch to fix GOLD's handling of non-function references to weak undefined symbols. This ended up touching every backend (i386, x86_64, ARM, Power and SPARC) and was quite invasive, so it took a while in the end. Committed to binutils trunk after approval.
* Ran more tests, both with -marm and -mthumb. I'm getting identical GCC test results (including gfortran and objc) for GOLD and BFD ld, so I think we're at the stage where GOLD is a viable replacement for the BFD linker.

== Next Week ==
* I'll start looking at the IFUNC support.
* I'll take another look at launchpad bug 665598.

Peter Maydell --

Progress:
* qemu: more cleanup of signal handler VFP patchset; I think I just need to add iwmmx support and it's good
* qemu: VCVT: found yet another bug, did final patchset cleanup: submitted to upstream list [8 patch series]
* qemu: submitted a trivial patch to fix a problem where __get_user/_put_user macros had an unnecessary local var which could clash with a var being used by the macro user
* set up a tree on git.linaro.org which we can use for a branch to make pull requests for ARM qemu fixes
* did a rough estimate of time to do an Eagle qemu model (6 months + testing/bug fixing time)

Issues:
* lost some time to a problem where Linux VMs stopped being able to talk to the LDAP server; however I have a workaround and IT are investigating

Meetings:
* toolchain, toolchain standup, pdsw-tools, PD doughnuts

Plans
- attend Meego conference in Dublin (Nov 15-18 inc travel) http://conference2010.meego.com/
- start on qemu consolidation by upstreaming various ARMv7 correctness fixes

Ira Rosen --

Here is this week's report:
1. BeagleBoard installed, now "playing" with it
2. Continued to work on auto-detection of vector size
3. Looked into mixed vector sizes
4. Learning about vld and vst instructions

It looks like I won't be able to participate in Wed calls, since I am alone with the kids on Wednesday evenings.

14 years, 7 months


Assembler bug blocking Thumb-2 kernel builds

by Dave Martin

Hi all,

I've hit a probable assembler bug trying to build a Thumb-2 kernel. Trying to assemble the attached file, I get:

arch/arm/kernel/relocate_kernel.S: Assembler messages:
arch/arm/kernel/relocate_kernel.S:10: Error: invalid offset, value too big (0xFFFFFFFFFFFFFFFC)
arch/arm/kernel/relocate_kernel.S:11: Error: invalid offset, value too big (0xFFFFFFFFFFFFFFFC)
arch/arm/kernel/relocate_kernel.S:58: Error: invalid offset, value too big (0xFFFFFFFFFFFFFFFC)
arch/arm/kernel/relocate_kernel.S:59: Error: invalid offset, value too big (0xFFFFFFFFFFFFFFFC)

The code appears correct and reasonable, except that there should be a .align directive before the data words at the end of the file (but adding this doesn't fix the error).

When assembling in ARM mode (i.e., without -mthumb), or after deleting the .globl lines associated with the affected target symbols, the problem goes away.

I believe this may already be tracked by CodeSourcery as issue #8775 (?)

Has anyone hit this issue before? Is it fixed upstream? Any help much appreciated.

Cheers
---Dave
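A side note on reading the error value: 0xFFFFFFFFFFFFFFFC is the 64-bit two's complement bit pattern of -4, so one possible reading (an assumption, since the attached file and the assembler internals are not shown here) is that the offset was computed as -4 and then printed as an unsigned 64-bit number. A minimal C sketch, with a hypothetical helper name, reproduces the formatting:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a signed offset as an unsigned 64-bit hex
   value, which is how a small negative number ends up looking "too big".  */
void format_offset(int64_t offset, char *out, size_t out_len)
{
    snprintf(out, out_len, "0x%016llX", (unsigned long long)offset);
}
```

Calling format_offset(-4, buf, sizeof buf) yields the same string as in the messages above.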

14 years, 7 months


A9 Neon confusion

by David Gilbert

Hi,

I've been looking at some basic libc routine optimisation and have a curious problem with memset and wondered if anyone can offer some insights.

Some graphs and links to code are on https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialMemset

I've written a simple memset in both Neon and non-Neon variants and tested them on a Beagle (C4) and a Panda board. I'm finding that the Neon version is faster than the non-Neon version (a bit) on the Beagle but a LOT slower on the Panda, and I'd like to understand why it's slower than the non-Neon version there - I'm guessing it's some form of cache interaction.

The graphs on that page are all generated by timing a loop that repeatedly memsets the same area of memory; the X axis is the size of the memset. Prior to the test loop the area is read into cache (I came to the conclusion the A8 didn't write allocate?). There are two variants of the graphs - absolute in MB/s on Y, and a relative set (below the absolute) that are relative to the performance of the libc routines. (The ones below those pairs are just older versions.)

If you look at the top left graph on that page you can see that on the Beagle (left) my Neon routine beats my Thumb routine a bit (both beating libc). If you look on the top right you see the Panda performance with my Thumb code being the fastest and generally following libc, but the Neon code (red line) topping out at about 2.5GB/s, which is substantially below the peak of the libc and ARM code.

The core loop of the Neon code (see the bzr link for the full thing) is:

    4:
        subs    r4,r4,#32
        vst2.8  {d0,d1,d2,d3}, [ r3:256 ]!
        bne     4b

while the core of the non-Neon version is:

    4:
        subs    r4,r4,#16
        stmia   r3!,{r1,r5,r6,r7}
        bne     4b

I've also tried vst1 and vstm in the Neon loop and it still won't match the non-Neon version.

All suggestions welcome, plus I'd appreciate it if anyone can suggest which particular limit it's hitting - does anyone have figures for the theoretical bus and L1 and L2 write bandwidths for a Panda (and Beagle)?

Thanks in advance,

Dave
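For readers who want to reproduce the kind of measurement described above, here is a minimal C sketch of such a harness: it reads the buffer once to bring it into cache, then times repeated memsets of the same area and reports MB/s. It is not the code behind the graphs; the function name, the use of clock(), and the choice of 10^6 bytes per MB are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Function-pointer type so that different memset variants can be timed.  */
typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical harness: returns throughput in MB/s (10^6 bytes per second),
   or 0.0 if size is 0, allocation fails, or the run was too short for
   clock() to resolve.  */
double memset_mbps(memset_fn fn, size_t size, unsigned long reps)
{
    if (size == 0)
        return 0.0;
    unsigned char *buf = malloc(size);
    if (buf == NULL)
        return 0.0;

    memset(buf, 1, size);                 /* give the buffer defined contents */
    volatile unsigned char sink = 0;
    for (size_t i = 0; i < size; i++)     /* read the area into cache first */
        sink ^= buf[i];

    clock_t start = clock();
    for (unsigned long r = 0; r < reps; r++)
        fn(buf, (int)(r & 0xff), size);   /* repeatedly set the same area */
    clock_t end = clock();

    sink ^= buf[size - 1];                /* use the result after timing */
    free(buf);

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return (double)size * (double)reps / secs / 1e6;
}
```

A call such as memset_mbps(memset, 4096, 1000000) times the libc routine; a hand-written variant with the same signature can be passed in its place. An optimising compiler may still simplify the timed loop, so the numbers from a sketch like this should be checked against the generated code.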

14 years, 7 months


Draft of next weeks public review

by Michael Hope

Hi there. I've uploaded a draft of the slides and notes for next week's public review at:

http://bazaar.launchpad.net/~linaro-toolchain-wg/+junk/publicreview1105/fil…

'Toolchain Public Review 11.05.odp' is a set of slides I'll talk to. The first 15-20 minutes will go through these to describe our focus and goals and how they tie together the blueprints and priorities.

The rest of the session will go through the current blueprints and priorities. See:

Toolchain Blueprints (short).pdf

for the summary version and:

Toolchain Blueprints (long).pdf

for the long version. The long version is interesting if you can't find a particular tool or technology. It may be small enough to be called out as a single work item.

These are only a draft, but I realised I haven't shared the plans with the rest of the group very well and Monday's meeting won't be the best.

I'm on holiday tomorrow but feel free to send me any comments,

-- Michael

14 years, 7 months


Reviewing blueprints for the TSC

by Michael Hope

Hi there. I've been going through the blueprints in preparation for next week's TSC review. The top level topics are good, and I'd like to have the rest of the engineering blueprints checked over and updated to match what we talked about at the summit.

Ira, could you please create blueprints for the areas you plan to look into? Anything that will take longer than a month should have a blueprint. It's worth having a catch-all blueprint for anything left over. Please add these as a dependency to:

https://blueprints.launchpad.net/linaro/+spec/tr-toolchain-neon-performance

Zach, could you check:

https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/ltrace-support
https://blueprints.launchpad.net/linaro/+spec/tr-toolchain-openocd

Peter, there's a whole range of QEMU ones that could do with a pass over.

For more about the review, see:

https://wiki.linaro.org/Releases/1105/PublicPlanReview

For a list of all of the toolchain blueprints, see:

http://ex.seabright.co.nz/helpers/blueprints#toolchain

-- Michael

14 years, 7 months


Mixed vector sizes

by Ira Rosen

Hi,

I started to look into mixed vector sizes (in the same loop). My main reason for this was to allow widening and narrowing instructions, which have different vector sizes for src and dest, to work properly. My example was widen_mult (int = short * short); I thought its implementation was not optimal. But now that I have a working GCC mainline for ARM, I see that it works just fine.

    short ub[], uc[];
    int c[];

    for (i = 0; i < n; i++)
      c[i] = ub[i] * uc[i];

is compiled as:

    .L11:
        add     r1, r1, #1
        vldmia  r4!, {d18-d19}
        cmp     r5, r1
        vldmia  ip!, {d16-d17}
        vmull.s16 q10, d18, d16
        vstr    d20, [r3, #-32]
        vstr    d21, [r3, #-24]
        vmull.s16 q8, d19, d17
        vstr    d16, [r3, #-16]
        vstr    d17, [r3, #-8]
        add     r3, r3, #32
        bhi     .L11

which looks good to me, at least from the vmull point of view.

Does anyone have an example where mixed vector size instructions are not used properly?

Another reason for mixed sizes could be cases where only part of the loop can be vectorized with the wider vectors. I don't know how common this is.

Are there any other reasons to implement mixed vector sizes? I understand that this can be a useful feature, I am just not sure it's the most important one.

Thanks,
Ira
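For reference, the widening multiply in the loop above can be written as a small self-contained C function. This is only a scalar sketch with a hypothetical function name and fixed-width types chosen for illustration; it shows why the source and destination vectors differ in size: each pair of 16-bit inputs produces a 32-bit result, so eight shorts loaded into one 128-bit register give eight ints that need two 128-bit registers, which matches the two vmull.s16 instructions per iteration in the listing.

```c
#include <stdint.h>

/* Scalar form of the widening multiply (int = short * short).  The casts
   make the widening explicit: the product is computed in 32 bits, so it
   cannot wrap the way a 16-bit multiply would.  */
void widen_mult(int32_t *c, const int16_t *a, const int16_t *b, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = (int32_t)a[i] * (int32_t)b[i];
}
```

For example, 300 * 300 gives 90000, which fits in the 32-bit result but would not fit in a 16-bit one.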

14 years, 7 months


Backport criteria

by Michael Hope

I've been going through the ChangeLog for the release and am having trouble justifying some of the changes brought in. In particular:

* -fstrict-volatile-bitfields, which is more appropriate for bare metal/kernel code
* Cortex-M4 support
* C locale support in libstdc++-v3

The march/mcpu clean up is OK but marginal.

Our focus is time-based performance on the Cortex-A series, with an implied preference for applications over kernel/bare metal. This is a very narrow view, but every non-performance line of code we bring in can also bring in a bug.

Any thoughts? For those who are looking at using our toolchain, is earlier access to other toolchain improvements interesting?

-- Michael

14 years, 7 months


Upstream GCC feature freeze

by Andrew Stubbs

Hi all,

As you may or may not know, upstream GCC has now entered 'stage 3' of its development cycle. This will last until spring. This means that they are only accepting bug fixes and documentation improvements. New features and any performance improvements must wait until GCC 4.6 branches, prior to release, and GCC 4.7 development opens.

During this process, our usual preferred work flow (upstream first) will not work, so we'll have to do something else.

Here's my proposal:

* Create a new Launchpad branch for GCC 4.6.
* Synchronize this branch with upstream regularly
  * once per week, perhaps.
* Try to get upstream approval for all new patches in the usual way
  * on the understanding that they won't be applied until stage 1
  * bug fixes are unaffected and may commit as usual.
* Commit all pending patches to our own 4.6 branch
  * and backport them to our 4.5 branch, of course.
* Usual "no test regressions" policy applies to our own patches
  * but beware regressions from merges from upstream.
  * we may want to track the clean 4.6 test results for comparison

This is little different to what we do with the 4.5 release branch now.

Thoughts?

Andrew

14 years, 7 months


Linaro GCC 4.5 2010-11 released

by Michael Hope

The Linaro Toolchain Working Group is pleased to announce the latest release of Linaro GCC 4.5.

Linaro GCC 4.5 is the fourth release in the 4.5 series. Based off the latest GCC 4.5.1+svn164911, it includes many ARM-focused performance improvements and bug fixes.

Interesting changes include:
* Various NEON related fixes
* Performance improvements
* A clean up of some of the testsuite test cases
* An updated version of the __sync multicore primitives
* Improvements in data packing when optimising for size
* C locale support in libstdc++-v3

This release adds the new option -fstrict-volatile-bitfields and enables it by default on ARM. See doc/invoke.texi for more information.

The source tarball is available from:
https://launchpad.net/gcc-linaro/+milestone/4.5-2010.11-0

Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro

Note that there were no changes to the 4.4 series.

-- Michael

14 years, 7 months


Linaro GDB 7.2 2010.11-0 released

by Michael Hope

The Linaro Toolchain Working Group is pleased to announce the release of Linaro GDB 7.2. Linaro GDB 7.2 2010.11-0 is the second release in the 7.2 series. Based off the latest GDB 7.2, it includes a number of ARM-focused bug fixes and enhancements. This release concentrates on the GDB test suite and tidies up a number of failures. The source tarball is available at: https://launchpad.net/gdb-linaro/+milestone/7.2-2010.11-0 More information on Linaro GDB is available at: https://launchpad.net/gdb-linaro -- Michael

14 years, 7 months


Auto-detection of vector size for NEON

by Ira Rosen

Hi,

It looks like it's enough to implement targetm.vectorize.autovectorize_vector_sizes for NEON in order to enable initial auto-detection of vector size. With the attached patch and the -mvectorize-with-neon-quad flag, the vectorizer first tries to vectorize for 128 bit, and if this fails, it tries to vectorize for 64 bit. For example, in the attached testcase the number of iterations is too small for 128 bit (the first 2 iterations have to be peeled in order to align the array accesses), but is sufficient for 64 bit (the accesses are aligned here).

I'd appreciate your comments on the patch, and I also have a few questions:

1. Why is the default vector size 64?
2. Where should NEON vectorization tests go? I found NEON tests with intrinsics at gcc.target/arm, is that the right place?
3. According to gcc.dg/vect/vect.exp the only flag that is used for NEON (in addition to target independent flags) is -ffast-math. Is that enough?

Thanks,
Ira

ChangeLog:

* config/arm/arm.c (arm_autovectorize_vector_sizes): New function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Define.

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c (revision 166032)
+++ config/arm/arm.c (working copy)
@@ -246,6 +246,7 @@ static bool arm_builtin_support_vector_misalignmen
 const_tree type,
 int misalignment,
 bool is_packed);
+static unsigned int arm_autovectorize_vector_sizes (void);
 /* Table of machine attributes. */
@@ -391,6 +392,9 @@ static const struct default_options arm_option_opt
 #define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
+  arm_autovectorize_vector_sizes
 #undef TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG arm_reorg
@@ -23223,6 +23227,12 @@ arm_expand_sync (enum machine_mode mode,
 }
 }
+static unsigned int
+arm_autovectorize_vector_sizes (void)
+{
+  return TARGET_NEON_VECTORIZE_QUAD ? 16 | 8 : 0;
+}
+
 static bool
 arm_vector_alignment_reachable (const_tree type, bool is_packed)
 {

test:

#define N 5

unsigned int ub[N+2] = {1,1,6,39,12,18,14};
unsigned int uc[N+2] = {2,3,4,11,6,7,1};

void main1 ()
{
  int i;
  unsigned int udiff = 2;
  unsigned int umax = 10;

  for (i = 0; i < N; i++)
    {
      /* Summation. */
      udiff += (ub[i+2] - uc[i]);

      /* Maximum. */
      umax = umax < uc[i+2] ? uc[i+2] : umax;
    }
}
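The size selection described in this message (try 128 bit first, fall back to 64 bit) can be modelled with a short C sketch. This is a simplified model, not the vectorizer's code: the function names are hypothetical, only a single access is considered for alignment, and "enough iterations" is taken to mean at least one full vector iteration after peeling. The bitmask of sizes in bytes mirrors the value returned by the hook in the patch (16 | 8).

```c
/* Hypothetical model: scalar iterations to peel so that an access starting
   first_elem elements into an aligned array lines up with a vector of vf
   elements.  */
static unsigned peel_count(unsigned first_elem, unsigned vf)
{
    return (vf - first_elem % vf) % vf;
}

/* Hypothetical model: walk the sizes in sizes_mask (bytes, one bit per
   size) from widest to narrowest and return the first one that leaves at
   least one full vector iteration after peeling, or 0 if none does.
   elem_bytes must be non-zero.  */
static unsigned pick_vector_size(unsigned sizes_mask, unsigned elem_bytes,
                                 unsigned first_elem, unsigned niters)
{
    for (unsigned size = 1u << 31; size >= elem_bytes && size != 0; size >>= 1) {
        if (!(sizes_mask & size))
            continue;                       /* size not offered by the target */
        unsigned vf = size / elem_bytes;    /* elements per vector */
        unsigned peel = peel_count(first_elem, vf);
        if (niters >= peel + vf)
            return size;
    }
    return 0;
}
```

With the testcase's numbers (4-byte elements, an access starting at element 2, 5 iterations), the 16-byte attempt needs 2 peeled iterations plus 4 for one vector and so fails, while the 8-byte attempt needs no peeling and succeeds, so pick_vector_size(16 | 8, 4, 2, 5) returns 8 in this model.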

14 years, 7 months

