linaro-toolchain

linaro-toolchain@lists.linaro.org

7 participants
5664 discussions

Bad code generation due to shrink-wrap optimisation

by Michael Hope

LP: #731665 is a silent bad code generation bug that affects, at least, functions which are empty except for inline assembly:

https://bugs.launchpad.net/ubuntu/+source/gcc-4.5/+bug/731665

It was introduced in the shrink-wrap patch and is due to using an uninitialised variable.

Andrew, can you please address this urgently, either in Linaro or CSL?

-- Michael

14 years, 9 months

[ACTIVITY] 2011-03-10

by David Gilbert

== hard-float ==
* Updated the libffi variadic patch and sent the updated version to the ffi mailing list.

== String routines ==
* Got a big endian build environment going
* Patched up memchr and strlen for big endian; it turned out to be a very small change in the end. Tested it on qemu-armeb: it didn't work on an older version but did on a newer one, so I'll assume the newer one is correct.
* Fixed a couple of build issues in the cortex strings test harness

== Other ==
* Kicked off a SPEC2006 train run on canis using the 2011.03 compilers

I'm on holiday tomorrow (Friday) and Monday.

Dave
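
The note above about memchr and strlen needing only a small big endian change can be illustrated with a minimal sketch. It is not the cortex-strings code; it assumes a word-at-a-time strlen in which the only endian-dependent step is mapping the zero byte found in a word back to its position in memory. The names `load_word`, `first_zero_index` and `sketch_strlen` are hypothetical, and the buffer is assumed to be zero padded to a multiple of four bytes so that whole-word reads stay in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if and only if at least one byte of w is zero. */
static uint32_t has_zero_byte(uint32_t w)
{
    return (w - 0x01010101u) & ~w & 0x80808080u;
}

/* Build the value a 32-bit load would produce on a little or big
   endian machine, so the sketch behaves the same on any host. */
static uint32_t load_word(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Memory-order index (0..3) of the first zero byte in w.
   This is the only step that depends on endianness. */
static size_t first_zero_index(uint32_t w, int big_endian)
{
    for (unsigned i = 0; i < 4; i++) {
        unsigned shift = big_endian ? 24u - 8u * i : 8u * i;
        if (((w >> shift) & 0xffu) == 0)
            return i;
    }
    assert(0 && "word has no zero byte");
    return 4;
}

/* Hypothetical word-at-a-time strlen. buf must be zero padded to a
   multiple of 4 bytes so that every word read is in bounds. */
size_t sketch_strlen(const unsigned char *buf, int big_endian)
{
    size_t n = 0;
    for (;;) {
        uint32_t w = load_word(buf + n, big_endian);
        if (has_zero_byte(w))
            return n + first_zero_index(w, big_endian);
        n += 4;
    }
}
```

The zero-byte test itself is the same for both byte orders; only the shift used to walk the word in memory order changes, which is consistent with the change being small.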

14 years, 9 months

[ACTIVITY] March 6-10

by Revital1 Eres

Hello,

* Sent the patch to support targets whose doloop part is not decoupled from the rest of the loop's instructions (as is the case for ARM) to gcc-patches: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00350.html
* Continued looking into DENbench benchmarks.

Thanks,
Revital

14 years, 9 months

[ACTIVITY] March 6-10

by Ira Rosen

Hi,

* Continued working on cost model tuning. I don't see much difference running EEMBC DENbench with and without vectorization enabled (and, therefore, also with and without the cost model). I also have to say that the results are not stable: I sometimes get a 10% difference just running the same executable twice in a row.
* The only benchmark where I see a consistent degradation (5%) with vectorization is DENbench aes, both with GCC trunk and gcc-linaro 4.5. I found one of the responsible loops; if it is not vectorized, I see only a 1.8% degradation. The problem there is that the loop bound is unknown at compile time, so the vectorizer attempts to vectorize the loop using runtime guards to verify that there are enough iterations to vectorize. The actual number of iterations is 4, so the scalar version of the loop is chosen at run time, but I guess the guards cause the degradation. I'll continue looking into this next week.
* Prepared the conditional-store-sink patch (one of the patches that helps to vectorize Telecom Viterbi) for submission to gcc-patches.

Ira
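
The runtime-guard situation described in the aes item can be sketched in C. This is only an illustration of loop versioning under stated assumptions, not GCC's generated code: the threshold `MIN_VECTOR_ITERS`, the vectorization factor `VF`, and the function `xor_versioned` are hypothetical, and the inner fixed-length loop merely stands in for a vector instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: elements per "vector" and the minimum trip
   count at which the vector version is considered worthwhile. */
enum { VF = 4, MIN_VECTOR_ITERS = 8 };

typedef enum { PATH_SCALAR, PATH_VECTOR } loop_path;

/* XOR src into dst, choosing a loop version at run time because
   n is not known at compile time. Returns the version taken. */
loop_path xor_versioned(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != 0 && src != 0);

    /* Runtime guard: evaluated on every call, even when n is tiny. */
    if (n >= MIN_VECTOR_ITERS) {
        size_t i = 0;
        for (; i + VF <= n; i += VF)          /* "vector" body */
            for (size_t k = 0; k < VF; k++)
                dst[i + k] ^= src[i + k];
        for (; i < n; i++)                    /* scalar epilogue */
            dst[i] ^= src[i];
        return PATH_VECTOR;
    }

    for (size_t i = 0; i < n; i++)            /* scalar version */
        dst[i] ^= src[i];
    return PATH_SCALAR;
}
```

With a trip count of 4, as in the loop mentioned above, a call falls through to the scalar version, so the guard is pure overhead relative to an unversioned scalar loop.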

14 years, 9 months

Linaro QEMU 2011.03-1 released

by Peter Maydell

The Linaro Toolchain Working Group is pleased to announce the release of Linaro QEMU 2011.03-1.

Linaro QEMU 2011.03-1 is the second release of qemu-linaro. Based off upstream (trunk) qemu, it includes a number of ARM-focused bug fixes and enhancements.

This release includes a model of the ARM Versatile Express platform. This is still experimental but may be of use to people who want a model supporting up to 1GB of RAM with graphics and networking. Instructions for getting started with it are on the wiki: https://wiki.linaro.org/PeterMaydell/QemuVersatileExpress

Other interesting changes include:
- The OMAP emulation bug which was causing hangs if Linux tried to enable a swapfile is fixed
- The OMAP UART model has been improved; this fixes the problem where kernels using the new omap-hsuart serial drivers stopped serial output halfway through boot.
- As usual, various minor correctness fixes and other upstream changes

Known issues:
- The beagle and beaglexm models do not support USB, so there is no keyboard, mouse or networking (#708703)

The only change over the short-lived 2011.03-0 is that the last-minute bug #731093 has been fixed (versatilepb models would crash on startup).

The source tarball is available at: https://launchpad.net/qemu-linaro/+milestone/2011.03-1

Binary builds of this qemu-linaro release are being prepared and will be available shortly for users of Ubuntu. When ready, Natty packages of qemu-linaro 2011.03-1 will be in the Ubuntu archive. Packages for users of Ubuntu 10.04 LTS and Ubuntu 10.10 will be in the linaro-maintainers tools ppa: https://launchpad.net/~linaro-maintainers/+archive/tools/

More information on Linaro QEMU is available at: https://launchpad.net/qemu-linaro

14 years, 9 months

Getting linaro toolchain binaries

by Dave Martin

Hi all,

I've had comments that getting hold of binaries for the linaro toolchain can be tricky for people unfamiliar with the linaro tools.

One reason is that we don't release binaries as such -- but a visitor browsing in through http://www.linaro.org/downloads/ won't discover this, and may waste a lot of time trying to understand launchpad etc. before coming to the conclusion that binaries either aren't available or are not easily findable.

On the other hand, the cross toolchain packages are likely to be of interest to such visitors, but aren't obviously advertised -- maybe I'm looking in the wrong place, but if so then new visitors to the linaro pages are likely to look in the wrong place too.

Would it make sense to explain the situation more prominently so that visitors know what to expect? Something along the lines of "if you use distro x revision y, these cross-compiler packages are available" and "if you need the tools for some other environment, you need to download the source and build it for yourself".

Cheers
---Dave

14 years, 9 months

Linaro GDB 7.2 2011-03 released

by Michael Hope

The Linaro Toolchain Working Group is pleased to announce the release of Linaro GDB 7.2.

Linaro GDB 7.2 2011.03-0 is the fourth release in the 7.2 series. Based off the latest GDB 7.2, it includes ARM-focused bug fixes and enhancements.

Interesting changes include:
* Hardware watchpoint support
* Backtracing while in the Linux kernel trampoline frame

Hardware watchpoints use the support built into ARM devices to watch for changes in values in memory with little performance impact. A 2.6.37 or later kernel is required.

The source tarball is available at: https://launchpad.net/gdb-linaro/+milestone/7.2-2011.03-0

More information on Linaro GDB is available at: https://launchpad.net/gdb-linaro

-- Michael

14 years, 9 months

[ACTIVITY] 28th February - 5th March

by Andrew Stubbs

Committed Kazu's VFP testcases patch upstream.

Merged the latest from upstream GCC 4.6.

Merged all the outstanding launchpad merge requests against both GCC 4.5 and 4.6.

Spun the 4.5-2011.03-0 and 4.6-2011.03-0 releases. Passed the tarballs to Michael H for final testing.

Brought the patch tracker up to date w.r.t. new merges.

Posted one of Dan's patches upstream for review.

Decided to drop Julian's A8 alignment patch completely. I had previously discovered it provided no measurable benefit on A8, and now I've found the same for A9 (Pandaboard). There's no real improvement for any combination of -falign-* options in EEMBC.

Bernd's "Discourage NEON on A8" patch also doesn't show any value in the benchmark results, but I think I've forward ported it wrong, because it should at least change the binary size, and it doesn't. I need to look into this further.

I also decided I don't know enough about ARMv7, so I spent some time reading a few chapters from the ARM A.R.M.

----

Upstream patches requiring review:

* Thumb2 constants: http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* ARM EABI half-precision functions: http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak: http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
* NEON scheduling patch: http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
* RVCT Interoperability patch: http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg00059.html

14 years, 9 months

[ACTIVITY] Feb.28 -- Mar.06

by Chung-Lin Tang

Last week:
* Launchpad #711819 / PR47719: ARM minipool ICE. Followed up on discussion with Bernd and Ramana. Later posted discussion results on gcc-patches, where Richard Earnshaw took it over with a final fix.
* Coremark ARMv7/v6 regressions: mostly pinpointed the exact cases where RTL simplification fails to optimize away ZERO_EXTEND expressions. Still working on how to enhance it.
* TW Public Holiday on Feb.28 (Mon), was off for one day.

This week:
* Try to turn Coremark regression investigation into code form.
* Other GCC issues.

14 years, 9 months

Representing interleaving and lane load/stores at the tree level

by Richard Sandiford

I've been spending this week playing around with various representations of the v{ld,st}{1,2,3,4}{,_lane} operations. I agree with Ira that the best representation would be to use built-in functions.

One concern in the original discussion was that the optimisers might move the original MEM_REFs away from the call. I don't think that's a problem though. For loads, we can simply treat the whole of the accessed memory as an array, and pass the array by value. If we do that, then the call would just look like:

    __builtin_load_lanes (MEM_REF[(elem[N] *)ADDR])

(where, despite the C notation, the MEM_REF accesses the whole of elem[N]). It is of course possible in principle for the tree optimisers to replace this MEM_REF with another, equivalent, one, but that's OK semantically. It isn't possible for the optimisers to replace it with something like an SSA name, because arrays can't be stored in gimple registers.

__builtin_load_lanes would then be used like this:

    combined_vectors = __builtin_load_lanes (...);
    vector1 = ...extract first vector from combined_vectors...
    vector2 = ...extract second vector from combined_vectors...
    ....

So combined_vectors only exists for load and extract operations. The question then is: what type should it have? (At this point I'm just talking about types, not modes.) The main possibilities seemed to be:

1. an integer type

   Pros
   * Gimple registers can store integers.

   Cons
   * As Julian points out, GCC doesn't really support integer types that are wider than 2 HOST_WIDE_INTs. It would be good to remove that restriction, but it might be a lot of work, and it isn't something we'd want to take on as part of this project.
   * We're not really using the type as an integer.
   * The combination of the integer type and the __builtin_load_lanes array argument wouldn't be enough to determine the correct load operation. __builtin_load_lanes would need something like a vector count (N => vldN) argument as well.

2. a combined vector type

   Pros
   * Gimple registers can store vectors.

   Cons
   * For vld3, this would mean creating vector types with non-power-of-two vectors. GCC doesn't support those yet, and you get ICEs as soon as you try to use them. (Remember that this is all about types, not modes.) It _might_ be interesting to implement this support, but as above, it would be a lot of work. It also raises some semantic questions, such as: what is the alignment of the new vectors? Which leads to...
   * The alignment of the type would be strange. E.g. suppose we're loading N*2 uint32_ts into N vectors of 2 elements each. The types and alignments would be:

         N=2 uint32x4_t, alignment 16
         N=3 uint32x6_t, alignment 8 (if we follow the convention for modes)
         N=4 uint32x8_t, alignment 32

     We don't need alignments greater than 8 in our intended use; 16 and 32 are overkill.
   * We're not really using the type as a single vector, but as a collection of vectors.
   * The combination of the vector type and the __builtin_load_lanes array argument wouldn't be enough to determine the correct load operation. __builtin_load_lanes would need something like a vector count (N => vldN) argument as well.

3. an array of vectors type

   Pros
   * No support for new GCC features (large integers or non-power-of-two vectors) is needed.
   * The alignment of the type would be taken from the alignment of the individual vectors, which is correct.
   * It accurately reflects how the loaded value is going to be used.
   * The type uniquely identifies the correct load operation, without need for additional arguments. (This is minor.)

   Cons
   * Gimple registers can't store array values.

So I think the only disadvantage of using an array of vectors is that the result can never be a gimple register. But that isn't much of a disadvantage really; the things we care about are the individual vectors, which can of course be treated as gimple registers. I think our tracking of memory values is good enough for combined_vectors to be treated as such (even though, with the back-end changes we talked about earlier, they will actually be stored in RTL registers).

So how about the following functions? (Forgive the pascally syntax.)

    __builtin_load_lanes (REF : array N*M of X)
      returns array N of vector M of X
      maps to vldN
      in practice, the result would be used in assignments of the form:
        vectorX = ARRAY_REF <result, X>

    __builtin_store_lanes (VECTORS : array N of vector M of X)
      returns array N*M of X
      maps to vstN
      in practice, the argument would be populated by assignments of the form:
        vectorX = ARRAY_REF <result, X>

    __builtin_load_lane (REF : array N of X,
                         VECTORS : array N of vector M of X,
                         LANE : integer)
      returns array N of vector M of X
      maps to vldN_lane

    __builtin_store_lane (VECTORS : array N of vector M of X,
                          LANE : integer)
      returns array N of X
      maps to vstN_lane

Note that each operation can be expanded independently. The expansion doesn't rely on preceding or following statements.

I've hacked up the prototype below as a proof of concept. It includes changes to the C parser to allow these functions to be created in the original source code. This is throw-away code though; it would never be submitted.

I've also included a simple test case and the output I get from it. The output looks pretty good; there's not even the stray VMOV that I saw with the intrinsics earlier in the week.

(Note that if you'd like to try this yourself, you'll need the patch I posted on Monday as well.)

What do you think? Obviously this discussion needs to move to gcc@ at some point, but I wanted to make sure this was vaguely sane first.

Richard
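
The "array N of vector M of X" shape proposed above can be modelled in plain C to show the data movement involved. This is a scalar sketch under stated assumptions, not the prototype patch and not a GCC built-in: `model_load_lanes` and `model_store_lanes` are hypothetical names, N=3 and M=2 are chosen to mirror a vld3/vst3 of two-element uint32_t vectors, and the arrays are wrapped in structs only because C cannot return an array by value. The element mapping used (vector j receives memory elements j, j+N, j+2N, ...) is the usual de-interleaving behaviour of vldN.

```c
#include <assert.h>
#include <stdint.h>

enum { N = 3, M = 2 };  /* N vectors of M elements, as for vld3 */

typedef struct { uint32_t lane[M]; } vec_t;   /* stands in for uint32x2_t */
typedef struct { vec_t v[N]; } vec_array_t;   /* array N of vector M of X */

/* Model of the proposed load: takes the whole N*M element array and
   returns N vectors, with vector j holding elements j, j+N, j+2N, ... */
vec_array_t model_load_lanes(const uint32_t ref[N * M])
{
    vec_array_t out;
    assert(ref != 0);
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            out.v[j].lane[i] = ref[i * N + j];
    return out;
}

/* Model of the proposed store: the inverse mapping, interleaving the
   N vectors back into N*M consecutive elements. */
void model_store_lanes(uint32_t ref[N * M], const vec_array_t *in)
{
    assert(ref != 0 && in != 0);
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            ref[i * N + j] = in->v[j].lane[i];
}
```

In this model the individual `v[j]` members play the role of the vectors extracted with ARRAY_REF, while the enclosing `vec_array_t` corresponds to combined_vectors, which exists only between the load and the extractions.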

14 years, 9 months
