- linaro-toolchain - lists.linaro.org

by Ira Rosen

Hi, I started to look into mixed vector sizes (in the same loop). My main reason for this was to allow widening and narrowing instructions, that have different vector sizes for src and dest, to work properly. My example was widen_mult (int = short * short), I thought its implementation was not optimal. But now that I have a working GCC mainline for ARM, I see that it works just fine. short ub[], uc[]; int c[]; for (i = 0; i < n; i++) c[i] = ub[i] * ua[i]; is compiled as: .L11: add r1, r1, #1 vldmia r4!, {d18-d19} cmp r5, r1 vldmia ip!, {d16-d17} vmull.s16 q10, d18, d16 vstr d20, [r3, #-32] vstr d21, [r3, #-24] vmull.s16 q8, d19, d17 vstr d16, [r3, #-16] vstr d17, [r3, #-8] add r3, r3, #32 bhi .L11 which looks good to me at least from the vmull point of view. Does anyone have an example when mixed vector size instructions are not used properly? Another reason for mixed sizes could be cases where only part of the loop can be vectorized with the wider vectors. I don't know how common this is. Are there any other reasons to implement mixed vector sizes? I understand that this can be a useful feature, I am just not sure it's the most important one. Thanks, Ira

15 years, 6 months

1
0
0 0

Backport criteria

by Michael Hope

I've been going through the ChangeLog for the release and am having trouble justifying some of the changes brought in. In particular: * -fstrict-volatile-bitfields, which is more appropriate for bare metal/kernel code * Cortex-M4 support * C locale support in libstdc++-v3 The march/mcpu clean up is OK but marginal. Our focus is time based performance on the Cortex-A series with an implied applications over kernel/bare metal. This is a very narrow view, but every non-performance line of code we bring in can also bring in a bug. Any thoughts? For those who are looking at using our toolchain, is earlier access to other toolchain improvements interesting? -- Michael

15 years, 6 months

4
3
0 0

Upstream GCC feature freeze

by Andrew Stubbs

Hi all, As you may or may not know, upstream GCC has now entered 'stage 3' of it's development cycle. This will last until spring. This means that they are only accepting bug fixes and documentation improvements. New features and any performance improvements must wait until GCC 4.6 branches, prior to release, and GCC 4.7 development opens. During this process, our usual preferred work flow (upstream first) will not work, so we'll have to do something else. Here's my proposal: * Create a new Launchpad branch for GCC 4.6. * Synchronize this branch with upstream regularly * once per week, perhaps. * Try to get upstream approval for all new patches in the usual way * on the understanding that they won't be applied until stage 1 * bug fixes are unaffected and may commit as usual. * Commit all pending patches to our own 4.6 branch * and backport them to our 4.5, branch, of course. * Usual "no test regressions" policy applies to our own patches * but beware regressions from merges from upstream. * we may want to track the clean 4.6 test results for comparison This is little different to what we do with the 4.5 release branch now. Thoughts? Andrew

15 years, 6 months

8
7
0 0

Linaro GCC 4.5 2010-11 released

by Michael Hope

The Linaro Toolchain Working Group is pleased to announce the latest release of Linaro GCC 4.5. Linaro GCC 4.5 is the fourth release in the 4.5 series. Based off the latest GCC 4.5.1+svn164911, it includes many ARM-focused performance improvements and bug fixes. Interesting changes include: * Various NEON related fixes * Performance improvements * A clean up of some of the testsuite test cases * An updated version of the __sync multicore primitives * Improvements in data packing when optimising for size * C locale support in libstdc++-v3 This release adds the new option -fstrict-volatile-bitfields and enables it by default on ARM. See doc/invoke.texi for more information. The source tarball is available from: https://launchpad.net/gcc-linaro/+milestone/4.5-2010.11-0 Downloads are available from the Linaro GCC page on Launchpad: https://launchpad.net/gcc-linaro Note that there were no changes to the 4.4 series. -- Michael

15 years, 6 months

1
0
0 0

Linaro GDB 7.2 2010.11-0 released

by Michael Hope

The Linaro Toolchain Working Group is pleased to announce the release of Linaro GDB 7.2. Linaro GDB 7.2 2010.11-0 is the second release in the 7.2 series. Based off the latest GDB 7.2, it includes a number of ARM-focused bug fixes and enhancements. This release concentrates on the GDB test suite and tidies up a number of failures. The source tarball is available at: https://launchpad.net/gdb-linaro/+milestone/7.2-2010.11-0 More information on Linaro GDB is available at: https://launchpad.net/gdb-linaro -- Michael

15 years, 6 months

1
0
0 0

Auto-detection of vector size for NEON

by Ira Rosen

Hi, It looks like it's enough to implement targetm.vectorize. autovectorize_vector_sizes for NEON in order to enable initial auto-detection of vector size. With the attached patch and -mvectorize-with-neon-quad flag, the vectorizer first tries to vectorize for 128 bit, and if this fails, it tries to vectorize for 64 bit. For example, in the attached testcase number of iterations is too small for 128 bit (first 2 iterations have to be peeled in order to align the array accesses), but is sufficient for 64 bit (the accesses are aligned here). I'd appreciate your comments on the patch, and I also have a few questions: 1. Why the default vector size is 64? 2. Where is the place of NEON vectorization tests? I found NEON tests with intrinsics at gcc.target/arm, is that the right place? 3. According to gcc.dg/vect/vect.exp the only flag that is used for NEON (in addition to target independent flags) is -ffast-math. Is that enough? Thanks, Ira ChangeLog: * config/arm/arm.c (arm_autovectorize_vector_sizes): New function. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Define. Index: config/arm/arm.c =================================================================== --- config/arm/arm.c (revision 166032) +++ config/arm/arm.c (working copy) @@ -246,6 +246,7 @@ static bool arm_builtin_support_vector_misalignmen const_tree type, int misalignment, bool is_packed); +static unsigned int arm_autovectorize_vector_sizes (void); /* Table of machine attributes. */ @@ -391,6 +392,9 @@ static const struct default_options arm_option_opt #define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode +#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES +#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \ + arm_autovectorize_vector_sizes #undef TARGET_MACHINE_DEPENDENT_REORG #define TARGET_MACHINE_DEPENDENT_REORG arm_reorg @@ -23223,6 +23227,12 @@ arm_expand_sync (enum machine_mode mode, } } +static unsigned int +arm_autovectorize_vector_sizes (void) +{ + return TARGET_NEON_VECTORIZE_QUAD ? 16 | 8 : 0; +} + static bool arm_vector_alignment_reachable (const_tree type, bool is_packed) { test: #define N 5 unsigned int ub[N+2] = {1,1,6,39,12,18,14}; unsigned int uc[N+2] = {2,3,4,11,6,7,1}; void main1 () { int i; unsigned int udiff = 2; unsigned int umax = 10; for (i = 0; i < N; i++) { /* Summation. */ udiff += (ub[i+2] - uc[i]); /* Maximum. */ umax = umax < uc[i+2] ? uc[i+2] : umax; } }

15 years, 6 months

3
5
0 0

Weekly meeting at 0900 UTC today

by Michael Hope

Hi there. Just a reminder that today's call is the first at the new time of 0900 UTC which is 9am in the UK, 10am in Germany, 11am in Israel, and 5pm in China. I've updated the meetings page at: https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings with the new details. -- Michael

15 years, 6 months

1
0
0 0

Help on merging two patches

by Yao Qi

Hi, I am backporint some patches from FSF mainline, which may improve Linaro 4.5 gcc on thumb2 speed. The first one is done by Richard E. "Improve optimization to transform TST into LSLS" http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02518.html After it applied to Linaro 4.5 tree, EEMBC speed number downgrades, while code size is reduced to some extent. The code difference is like this, 6801 ldr r1, [r0, #0] f831 3013 ldrh.w r3, [r1, r3, lsl #1] -f413 6f00 tst.w r3, #2048 ; 0x800 -f43f af41 beq.w cc <t_run_test+0xcc> +0518 lsls r0, r3, #20 +f57f af44 bpl.w cc <t_run_test+0xcc> 4610 mov r0, r2 After reading cortex-a8 TRM, I can't find exact timing cycles of lsls. Under Chung-Lin's help, we feel that lsls should be slower than tst, but don't have any evidence to prove. If any people is familiar with arm microarch, help is welcome. If our assumption is correct, we may can change this patch to an optimization specific to size only. The second patch is Bernd's "Fix an if statement in arm_rtx_costs_1" http://gcc.gnu.org/ml/gcc-patches/2010-07/msg02096.html After this patch applied, EEMBC benchmark number is not changed. Shall we merge this patch to linaro 4.5 tree? I am inclined to merge it, but if you have concerns on this patch, let us discuss here. -- Yao Qi CodeSourcery yao(a)codesourcery.com (650) 331-3385 x739

15 years, 6 months

3
4
0 0

Problems with BeagleBoard installation

by Ira Rosen

Hi, I am trying to install my new BeagleBoard xM. I started with https://wiki.linaro.org/Releases/DailyBuilds, but it failed to write to the SD card giving some errors about QEMU (I can try that again if the error messages are important). Dave Martin also gave me this link https://wiki.linaro.org/Source/ImageInstallation, but it doesn't exist anymore. So, I tried these instructions: https://wiki.linaro.org/Source/VexpressImageInstallation with http://snapshots.linaro.org/10.11-daily/linaro-headless/20101107/0/images/t… http://jameswestby.net:8080/view/Hardware%20Packs/job/linaro-omap3%20hwpack… --dev beagle). The installation of the SD card went ok, but the board doesn't work. During boot I see the printings as in the attached file and it doesn't go any further. I hope someone can help me with that, because otherwise I am at dead end... Thanks, Ira

15 years, 6 months

1
0
0 0

Changing meetings

by Michael Hope

Hi there. I plan to change the Toolchain WG meetings due to daylight savings and to better cover the US. The Monday meeting will be at 0900 UTC which is 9 am in the UK, 10 am in central Europe, and 5 pm in Beijing. The standup calls will be merged into one at 1800 UTC on Wednesday which is 6 pm in the UK, 7 pm in central Europe, and 1 pm on the US East Coast. I don't expect China to call in as it's a quite unreasonable time. I'll update the invites and wiki page to reflect this. We'll start the new times next week, so Monday the 8th will be the first meeting at the new time. -- Michael

15 years, 6 months

3
2
0 0

Links

by Michael Hope

This is more for the people in the sprint... http://ex.seabright.co.nz/helpers/blueprints#toolchain http://ex.seabright.co.nz/helpers/planner -- Michael

15 years, 6 months

1
0
0 0

Report of thumb2 tuning investigation

by Yao Qi

The gaol and plan of investigation has been described in [1] In the plan, this task is divided into three parts, 1) patch backport, 2) regression fix, and 3) exploration and study other ARM compilers. This report follow the same manner. 1. Patch backport. 8 patches are listed in [1]. Backport them to Linaro 4.5 tree will improve speed performance. Action/Recommendation: Backport them if speed improves. These patches are ones that I think they *should* improve speed, but "performance surprise" is not impossible. 2. Regression fix. So far (until r99399), Linaro GCC 4.5 is slower than FSF GCC 4.5.0 on some EEMBC benchmarks. Performance regression is introduced by four commits, r99324,r99330,r99369,r99380, see details in [2]. Action/Recommendation: Figure out why speed regression is introduced, and try to fix it. One cent here is that how to avoid speed regression. I do believe that sometimes regression is unavoidable, but it is better if can track them, and keep them manageable. 3. Exploration and study other ARM compilers. In this part, I don't find any possible thumb-2 specific improvements. However, loop optimization and instruction scheduling should be improved on ARM. (This statement may be true to all ports, or even all compilers) Some tickets are opened for this part, LP:660644 Missed optimization opportunities LP:662692 Inner loop in autcor00 can be optimized better LP:656957 LP:645267 Improve code generation on switch statement LP:663793 Tune Swing Modulo Scheduling or Selective Scheduling for ARM LP:656373 Try -fsched-pressure for ARM I have to admit that instruction scheduling is quite hard, but if we can do something here, that will be great. I've put it in "performance-insdie-gcc" session on UDS. Let us talk about it a little there next week. During this investigation, I also find LTO or "whole-program optimization" is useful to some EEMBC benchmarks. (I didn't run LTO/WPO at all, but I got this when read source of benchmarks) [1] Plan of CS304: Thumb2 tuning investigation. http://lists.linaro.org/pipermail/linaro-toolchain/2010-October/000300.html [2] https://wiki.linaro.org/YaoQi/Sandbox/Thumb2SpeedOptimization -- Yao Qi CodeSourcery yao(a)codesourcery.com (650) 331-3385 x739

15 years, 6 months

1
0
0 0

finding in 4.5.2 20101003 (Linaro) [release 4.5-2010.10-0]

by leonid.moiseichuk＠nokia.com

Hello, Sorry I cannot find a bugzilla. One issue detected during compilation of ffmpeg, it looks like gcc -DHAVE_AV_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -I. -I"/home/ldm/misc/tc/suite/cases/ffmpeg/ffmpeg" -O3 -D_ISOC99_SOURCE -D_POSIX_C_SOURCE=200112 -std=c99 -fomit-frame-pointer -Wdeclaration-after-statement -Wall -Wno-switch -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wno-pointer-sign -Wcast-qual -Wwrite-strings -Wtype-limits -Wundef -fno-math-errno -fno-signed-zeros -c -o libavcodec/dsputil.o libavcodec/dsputil.c /tmp/ccK6jdSP.s: Assembler messages: /tmp/ccK6jdSP.s:34764: Error: VFP/Neon double precision register expected -- `vmovl.s16 q3,s12' /tmp/ccK6jdSP.s:34781: Error: VFP/Neon double precision register expected -- `vmovl.s16 q2,s8' /tmp/ccK6jdSP.s:34798: Error: VFP/Neon double precision register expected -- `vmovl.s16 q1,s4' /tmp/ccK6jdSP.s:34800: Error: VFP/Neon double precision register expected -- `vmovl.s16 q0,s0' /tmp/ccK6jdSP.s:34820: Error: VFP/Neon double precision register expected -- `vmovl.s16 q8,s8' The sources, assembler and test case are attached. With best wishes, Leonid

15 years, 6 months

3
4
0 0

2010-10-18 Toolchain WG minutes

by Michael Hope

Minutes from the 2010-10-18 call are here: https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-10-18 I'm afraid I lost the recording of this call so the minutes are incomplete Minutes from the 2010-10-20 standup call are here: https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-10-18 The calls focused on the upcoming summit. -- Michael

15 years, 6 months

1
0
0 0

Linaro@UDS sessions

by Michael Hope

Hi there. I've updated the list of potential Summit sessions based on yesterdays call. Could people please check the Sessions table on https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-10-18 and flesh out the agenda for sessions that have your name against them. The agenda should be five to ten discussion points, preferably of things that are not well understood and could use input from the group. There's a good discussion on what to expect at a Summit here: http://oubiwann.blogspot.com/2010/10/q-the-ubuntu-developer-summits.html You can check the already-approved sessions here: http://summit.ubuntu.com/uds-n/ Feel free to join in to any other sessions you might find interesting. There will be quite a few people with diverse backgrounds there, including ~80 people from Linaro, ~400 from Ubuntu, ~200 from the community, and ~200 remote. The overlap between Toolchain and Ubuntu interests might not be great, so I'll make sure a common work room is available for idle time. -- Michael

15 years, 6 months

1
0
0 0

Flavoured toolchains anyone?

by Marcin Juszkiewicz

Some of Linaro developers works with ARM devices older then ARMv7-a architecture. Other people experiments with hard-float ABI. Each of them has to rebuild toolchain for own use and that means playing with components to have them build properly. But it is no more - I made some patches and armel-cross-toolchain-base since 1.53 version + newer source packages for gcc-4.[45]-armel-cross have support for "debian/flavour" file which allows to set some flags related to toolchain build. So far supported things are: - ARM architecture - float ABI - FPU mode - Thumb mode This feature is not merged into regular Ubuntu packages yet as this is work in progress which needs to be cleaned first. http://people.linaro.org/~hrw/armel-cross-toolchain/ has all source packages needed. Regards, -- JID: hrw(a)jabber.org Website: http://marcin.juszkiewicz.com.pl/ LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz

15 years, 6 months

2
1
0 0

Flavoured toolchains anyone?

by Marcin Juszkiewicz

Some of Linaro developers works with ARM devices older then ARMv7-a architecture. Other people experiments with hard-float ABI. Each of them has to rebuild toolchain for own use and that means playing with components to have them build properly. But it is no more - I made some patches and armel-cross-toolchain-base since 1.53 version + newer source packages for gcc-4.[45]-armel-cross have support for "debian/flavour" file which allows to set some flags related to toolchain build. So far supported things are: - ARM architecture - float ABI - FPU mode - Thumb mode This feature is not merged into regular Ubuntu packages yet as this is work in progress which needs to be cleaned first. http://people.linaro.org/~hrw/armel-cross-toolchain/ has all source packages needed. Regards, -- JID: hrw(a)jabber.org Website: http://marcin.juszkiewicz.com.pl/ LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz

15 years, 6 months

1
0
0 0

NEON vectorization: use of specialized load/store instructions

by Julian Brown

I meant to send this to the "external" Linaro toolchain mailing list, not the internal CS one. Apologies to those who receive it twice! In a follow-up message, Joseph Myers pointed out a post he'd written previously on the same subject: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html In further followups (at the risk of misrepresenting Joseph & Paul Brook's opinions!), there seemed to be general agreement that a scheme something like that outlined below, with "permuting" loads/stores and some way of handling multiple in-register layouts for vectors seems like it will be a necessary addition to the vectorizer, going forward. Julian Begin forwarded message: Date: Thu, 7 Oct 2010 16:45:17 +0100 From: Julian Brown <julian(a)codesourcery.com> To: Ira Rosen <IRAR(a)il.ibm.com> Cc: Tejas Belagod <Tejas.Belagod(a)arm.com>, Linaro List <gnu-linaro-tools(a)codesourcery.com> Subject: [gnu-linaro-tools] NEON vectorization: use of specialized load/store instructions Hi, We're having some system issues, so I thought I'd take the chance to write down some things I've been thinking about re: utilising the NEON load/store instructions more effectively. I've also attempted to summarize the problems with big-endian mode. All unverified as of yet, so please take with a pinch of salt :-). Comments appreciated. It's been a while since I last thought about some of this stuff... Cheers, Julian Use of specialized load instructions ==================================== To provide good support for NEON's element and structure load/store instructions, GCC lacks support for a couple of key features: 1. A good way of representing a set of two, three or four vector registers (either D- or Q-sized), possibly with non-unit stride. 2. A generalised mapping between memory locations and lane numbers. To start with point 1: currently the element and structure load/store instructions are only supported via intrinsics. These are specified to load and store as if going via an array embedded in a union, i.e.: typedef struct int8x8x2_t { int8x8_t val[2]; } int8x8x2_t; __extension__ static __inline int8x8x2_t __attribute__ ((__always_inline__)) vld2_s8 (const int8_t * __a) { union { int8x8x2_t __i; __builtin_neon_ti __o; } __rv; __rv.__o = __builtin_neon_vld2v8qi ((const __builtin_neon_qi *) __a); return __rv.__i; } Even for a trivial test program, e.g.: #include <arm_neon.h> int foo (int8_t *x) { int8x8x2_t result = vld2_s8 (x); return vget_lane_s8 (result.val[0], 1); } We will generate code like so: sub sp, sp, #32 vld2.8 {d16-d17}, [r0] mov r3, sp vstmia sp, {d16-d17} add ip, sp, #16 ldmia r3, {r0, r1, r2, r3} stmia ip, {r0, r1, r2, r3} fldd d16, [sp, #16] vmov.s8 r0, d16[1] add sp, sp, #32 bx lr I.e., rather than being used directly, the registers loaded by vld2 will always be spilled to the stack then reloaded. This obviously reduces the usefulness of these intrinsics by a large factor. With some planning, it'd be good to find a powerful enough solution to this problem so that the same representation for multiple registers can be used by the autovectorizer as well as the intrinsic-handling code. (One difficulty is that the "foo.val[X]" interface should still be available to user code. There's probably no need for "val" to literally be an array, though other representations would require front-end changes). Assuming it's hard for the register allocator to deal with highly-constrained situations like requiring four consecutive registers, one (ugly) possibility might be to run a pass before register allocation, looking for "big" multi-register vectors and pre-allocating them to hard registers. Even using a fixed allocation of a single set of registers (e.g. make it so that all multi-reg loads/stores larger than a Q register must use d0-d7, or whatever) would probably give better code than what we produce at present, in most cases. Now, point 2. To start with, an aside: AIUI, there is currently an assumption in the vectoriser code that increasing element numbers in vector registers correspond to increasing addresses when those registers are loaded from and stored to memory (as if the vector was a short array, or alternatively as if a union of the vector register and an array of element-types had the same numberings for lanes and array indices corresponding to the same elements). Unfortunately that is only true for NEON in little-endian mode: in big-endian mode, the story is more complicated, for reasons I will try to explain. To remain compliant with the soft-float variant of the ARM EABI, we must pass vector register arguments in ARM registers (or the stack), not vector registers. This means that we must be very careful with the ordering of elements for values passed to functions. Consider the trivial function: int __attribute__((noinline)) qux (int16x8_t x) { x = vaddq_s16 (x, x); return vgetq_lane_s16 (x, 1); } This is compiled by GCC to the following (slightly unimpressively): vmov d18, r1, r0 @ v8hi vmov d19, r3, r2 vmov d20, r1, r0 @ v8hi vmov d21, r3, r2 vadd.i16 q8, q9, q10 vmov.s16 r0, d16[1] bx lr Which may then be called like, e.g.: ldmia sp, {r0-r3} blx qux So: notice that we're careful that when vector values are transferred from NEON registers to core registers, the same result will be transferred to/from memory when we use ldm/stm (core registers) or vldm/vstm (vector registers) -- i.e. we might use "vldm rX, {d18-d19}", storing d18 and d19 in consecutive increasing addresses, or "ldmia rX, {r0-r3}", again with consecutive registers in increasing memory locations, and we get the same outcome. The fact that we can use the multiple-register loads/stores is also important for spilling/reloading between vector and core registers, which inevitably happens occasionally. Notice also that when we call the above function like so: typedef union { int16x8_t quadvec; int16_t half[8]; } u; int foo (int8_t *x) { u bar; int i; for (i = 0; i < 8; i++) bar.half[i] = i; qux (bar.quadvec); } The value returned from "qux" is NOT 2 (1+1), as it would be if we were accessing the value at index 1 in the superimposed array in the union "u". The vgetq_lane_s16 call still interprets the array as if it had been loaded in little-endian element order. But we don't get the result we would have if the vector had been interpreted in purely big-endian order either (i.e. 12, 6+6)! In fact from the perspective of the element numbering used by vgetq_lane_s16, the vector elements we see for each of the (equal) operands of the "vadd" instruction in the qux function are: equiv. core register lane number (at function entry) value ----------- -------------------- ----- [0] high part of r1 3 [1] low part of r1 2 [2] high part of r0 1 [3] low part of r0 0 [4] high part of r3 7 [5] low part of r3 6 [6] high part of r2 5 [7] low part of r2 4 So the value returned will be 2+2, 4. Now, coming back to the vectorizer. Current practice means that increasing element numbers should correspond to increasing memory locations: i.e., that "array ordering" is in effect, just as in the call to vgetq_lane_s16 in the above example. This leads to an anomaly: it means that when the vectorizer asks for a particular element, it will generally get a different one. Most of the time we get away with this, since the vectorizer mostly deals with "opaque" vectors which are operated on element-wise: i.e. we only deal with data at the granularity of whole vectors, so it doesn't matter which order the elements are in. The ARM implementations of reduction operations fortuitously calculate the results across all elements simultaneously, so when one of those elements is extracted, we still get the right answer. One notable exception to this though is the movmisalign<mode> patterns: these are implemented using the vld1 and vst1 instructions, which load elements in "array" order (increasing elements from increasing memory locations), even in big-endian mode. Since vectors loaded using those instructions are "incompatible" with the above scheme, such misaligned accesses are simply disabled in big-endian mode. Of course, generally, sticking with the current non-solution in big-endian mode is not sustainable (and is probably already broken in various cases). So it might be worth thinking about whether supporting big-endian mode properly, as well as handling the more complex load and store element/structure instructions, can be done using some generalised solution. I'm thinking (without having much idea about how feasible such an idea is) of something along the lines of a function (in the mathematical sense) attached to each vector value manipulated by the vectorizer, to map that value's element numberings to and from memory offsets. So then the quad-word vector of 16-bit elements discussed above would look like, in big-endian mode: foo, {6, 4, 2, 0, 14, 12, 10, 8} Whereas in little-endian mode (or in big-endian mode, for vectors loaded using vld1), it would look like: foo, {0, 2, 4, 6, 8, 10, 12, 14} And then, perhaps more interestingly, a vector loaded using e.g. a "multiple 3-element structures" load, vld3.16 {d1, d2, d3}, [rN] Might look like (in either endianness, assuming we can represent a vector of such size in our hypothetical scheme): foo, {0, 6, 12, 18, 2, 8, 14, 20, 4, 10, 16, 22} Though it's not clear that such a scheme would be powerful enough to represent the whole range of element/structure loads/stores available (you'd probably need to be able to specify skipped or don't-care elements to do that, at least).

15 years, 6 months

3
4
0 0

Plan of CS304: Thumb2 tuning investigation

by Yao Qi

First of all, the goal of this work is about investigation on speed improvement on linaro gcc 4.5. Finally, the output/result of this work is to list all possible recommendations/actions to improve speed on linaro 4.5. Comments to this plan are welcome. So far, we can improve speed in three ways, 1. Backport patches from FSF GCC 4.6. Note that we don't want to backport the whole 4.6. 2. Benchmark with FSF GCC 4.5.0. Fix performance regressions if there are on linaro gcc 4.5. Output is the reason of performance regression, or even further, give recommendations on how to fix it. 3. Study the code generated by other ARM compilers, and give recommendations on how to improve GCC to do better job. I'll describe these three ways in details in the following sections, - Backport patches from FSF GCC 4.6 I went through gcc-patches archive, and select several patches that are helpful to code improvements. 1 ifcvt optimization. Target independent. http://gcc.gnu.org/ml/gcc-patches/2010-04/msg00832.html 2 redundant register move for sign extending. Thumb2. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43137 3. PR 45335 Use ldrd and strd to access two consecutive words. Not yet approved. http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00059.html 4. Fix an if statement in arm_rtx_costs_1. http://gcc.gnu.org/ml/gcc-patches/2010-07/msg02096.html 5. Reduce code duplication for Thumb2 move patterns http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00624.html 6. ARM ldm/stm peepholes http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00512.html 7. PR44999 Replace "and r0, r0, #255" with uxtb in thumb2 http://gcc.gnu.org/ml/gcc-patches/2010-07/msg01700.html 8. Improve optimization to transform TST into LSLS http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02518.html 9. Fix bswap patterns for ARM / Thumb and Thumb2. http://gcc.gnu.org/ml/gcc-patches/2010-01/msg01238.html - Fix speed regression I found speed regression on EEMBC on linaro 4.5, compared with FSF GCC 4.5.0, and I'll investigate why speed regression happens on these cases. Here is a table below about speed regression compared between FSF GCC 4.5.0 and Linaro GCC 4.5 (revno:99398) O2 O3 puwmod01, -5.5 -3.5 bitmnp01, -7.9 -0.7 routelookup, -6.4 -8.2 conven00data_1, -7.2 -5.8 conven00data_2, -8.1 -7.3 conven00data_3, -6.6 -5.5 viterb00data_1, -1.7 +5.9 viterb00data_2, -4.3 +2.6 viterb00data_3, -2.3 +1.8 viterb00data_4, -5.3 -0.3 - Study the code generated by other ARM compilers. In this part, I'll study the binary generated by other ARM compilers, and try to teach GCC smart enough to do the same thing. This piece of work is quite open, and hard to estimate how much output we could get. -- Yao Qi CodeSourcery yao(a)codesourcery.com (650) 331-3385 x739

15 years, 6 months

4
6
0 0

Gcc test suite failures

by Nicolas Pitre

People here might want to have a look at this bug: http://gcc.gnu.org/bugzilla/show_bg.cgi?id=45979 Note that the heap randomization feature added to the kernel was part of a Linaro security blueprint. Nicolas

15 years, 7 months

3
2
0 0

Linaro Toolchain 2010.10 release

by Michael Hope

The Linaro Toolchain Working Group is pleased to announce the 2010.10 consolidation release including Linaro GCC 4.4, Linaro GCC 4.5, and the first version of Linaro GDB 7.2. Linaro GDB 7.2 2010.10-0 is the first release in the 7.2 series. Based off the latest GDB 7.2, it includes a number of ARM-focused bug fixes and enhancements. Interesting changes include: * Backtraces in Thumb-2 code are significantly improved * Much better prologue and epilogue parsing * Improved software watchpoint support * Many test suite tidy-ups Linaro GCC 4.5 is the third release in the 4.5 series. Based off the latest GCC 4.5.1+svn, it includes many ARM-focused performance improvements and bug fixes. Linaro GCC 4.4 is the fourth release in the 4.4 series. Based off the latest GCC 4.4.5, it fixes many of the issues found during building Ubuntu over the last few months. Interesting changes include: * Linaro GCC 4.4 is now based off FSF GCC 4.4.5 * Cortex A8 and Cortex A9 scheduler NEON improvements * Better code generation for constant addresses with inline assembly * Better code for copying small constant strings * Various correctness improvements Downloads are available from the Linaro GCC and GDB pages on Launchpad: https://launchpad.net/gcc-linaro https://launchpad.net/gdb-linaro -- Michael

15 years, 7 months

1
0
0 0

DCC device driver and gdb server

by Øyvind Harboe

Hi all, I was wondering someone knows about a ARM DCC (debug communications channel) device driver. The idea is to run gdbserver on /dev/dcc such that application debugging does not hog a serial/ethernet port. I'd modify OpenOCD to forward the DCC onto a TCP/IP port to connect GDB to the gdbserver. -- Øyvind Harboe US toll free 1-866-980-3434 / International +47 51 63 25 00 http://www.zylin.com/zy1000.html ARM7 ARM9 ARM11 XScale Cortex JTAG debugger and flash programmer

15 years, 7 months

3
5
0 0

New Linaro Toolchain meeting covering American timezones

by Michael Hope

(cc'ed to linaro-toolchain, bcc'ed to others who may be interested) I'm considering adding a new Linaro Toolchain meeting to cover people in the North/South American timezones. We've got quite a few people in that area who are interested in the toolchain but can't make the current 0900 UTC calls. How about a weekly half-hour call on Wednesdays at 1800 UTC? Once daylight savings drops out on the 7th of November, this would be 1000 Sacramento/PST, 1200 Houston/CST, and a reasonable evening time for those in Europe who wish to join. This will be a technical call and can cover topics such as status updates, release plans, reported problems, and any input from toolchain users. Please send me an email if you are interested, -- Michael

15 years, 7 months

3
3
0 0

System directories when cross-compiling

by Michael Hope

Hi Marcin. Would you consider passing --enable-poison-system-directories to the cross compiler configure? This makes the '-Wpoison-system-directories' option available which warns you if the cross compiler picks up a library or header file from /usr instead of the cross-build environment. I'm talking with someone who's looking at using the Linaro compiler and had a strange error due to picking up the host crtn.o. Having this warning would of tracked down the problem faster. -- Michael

15 years, 7 months

4
4
0 0

the linaro toolchain and older arm versions

by John Rigby

I believe that the libgcc.a in our toolchain contains Thumb-2 code. I verified this by doing objdump on libgcc.a and I see combinations of 16 and 32 bit instructions. So does that mean that the toolchain is only usable for ARM versions that support Thumb-2? Thanks, John

15 years, 7 months

13
21
0 0

CS MinGW patches

by Andrew Stubbs

As discussed in the meeting yesterday, CodeSourcery has a few MinGW patches that I had not merged into Linaro GCC. I have now investigated these patches, and I'm fairly happy that most are not necessary for Linaro. They're mainly about interworking with Cygwin. The one exception is this one: http://gcc.gnu.org/ml/gcc-patches/2010-04/msg01214.html (and even that is primarily a GDB issue). Andrew

15 years, 7 months

3
3
0 0

2010-10-04 Toolchain WG minutes

by Michael Hope

...are available here: https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-10-04 A recording of the call is available here: http://tc.seabright.co.nz/toolchainwg/ Interesting topics include a discussion on the copyright and license issues on string routines, the upcoming 2010.10 release, the lifecycle of Linaro GCC 4.4, potential Windows builds, and the blueprints for next cycle. -- Michael

15 years, 7 months

5
8
0 0

ltrace + thumb2

by Zach Welch

I made a patch for ltrace that adds support for Thumb-2. There's not much to it, but it allows me to trace applications built for Cortex-A8. Without it, users will experience this bug: https://bugs.launchpad.net/ubuntu/+source/ltrace/+bug/639796 Unfortunately, it appears that the upstream tree is not well-maintained. I posted it to the mailing list for the project, but others' patches have been ignored for many months. However, my post precipitated another contributor to offer to maintain the package. I also posted this patch as the proposed solution for the above LP bug, which should allow Linaro to benefit from the work without worrying about upstream. In fact, a new version of the package appears to have been released that includes my patch (0.5.3-2ubuntu6). Please give this updated package a whirl and let me know if there is more work to be done. Thoughts? Unless I hear feedback from others, I will assume that this tool now works for Cortex-A[89] and move on to other tasks. -- Zach Welch CodeSourcery zwelch(a)codesourcery.com (650) 331-3385 x743

15 years, 7 months

4
9
0 0

People wanting Cortex-A9 boards

by Michael Hope

(this is for current Toolchain WG members. Sorry if I got anyone else's hopes up) We'll soon be coming into some decent dual-core Cortex-A9 boards that have 1 GB of RAM and a good set of USB ports. I've asked for four of them with hard drives to go into the data centre for general use. Would anyone also like one for their desk? Note that you're generally better off using a data centre board as it's one less thing to maintain. -- Michael

15 years, 7 months

1
0
0 0

Armel cross compilers for lucid

by Marcin Juszkiewicz

Hi I finally built armel cross compiler packages for Ubuntu 10.04 'Lucid' LTS. They are available in unsigned APT repository: deb http://people.canonical.com/~hrw/ubuntu-lucid-armel-cross-compilers/ ./ They are built from Maverick packages: - binutils-source - eglibc-source - gcc-4.4-source - gcc-4.5-source - linux-source-2.6.35 - armel-cross-toolchain-base - gcc-4.4-armel-cross - gcc-4.5-armel-cross So they do not give exactly same versions as compilers used in 10.04 - please remember about it while doing cross builds. Regards, -- JID: hrw(a)jabber.org Website: http://marcin.juszkiewicz.com.pl/ LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz

15 years, 7 months

3
3
0 0

ffmpeg regression

by Michael Hope

Hi Julian. For your reference, the performance of ffmpeg on a certain h.264 video decode has regressed since 4.4.0. See: http://ex.seabright.co.nz/helpers/benchcompare for the relative numbers. We're significantly better than 4.4.0 in the two other test videos, and better than 4.5.1 in all. The video itself is here: http://ex.seabright.co.nz/misc/cbuild/live/cbuild/files/ffmpeg/03-src13_hrc… ffmpeg is configured with -mfpu=vfp3 with the hand-written assembler backend turned off. The vfp3 is an accident - I forgot to explicitly set it to neon. You can see how to ffmpeg was configured here: http://ex.seabright.co.nz/misc/cbuild/live/cbuild/lib/ffmpegbench.mk The particular version is available here: http://ex.seabright.co.nz/misc/cbuild/live/cbuild/files/ffmpeg-0.6.tar.xz If you have time, please have a look at it. If not file a ticket on Launchpad and we'll look at it as part of the speed improvements. -- Michael

15 years, 7 months

2
2
0 0

Host strip corrupts cross-built binaries

by Loïc Minier

Hi folks apparently some tool calls "strip" instead of "$triplet-strip" when cross-building; this is something we shall fix, but it is apparently corrupting the binaries in some cases: https://bugs.launchpad.net/ubuntu/+source/binutils/+bug/615765 It seems the ELF architecture isn't set properly, or so I'm told. Which component is to blame here? Are we looking at a binutils or a gcc bug for not being able to set or read enough data that the architecture mismatch isn't detected? What could we do about it? Thanks! -- Loïc Minier

15 years, 7 months

5
10
0 0

2010-09-27 Toolchain WG Minutes

by Michael Hope

The minutes from the (short) 2010-09-27 Toolchain WG meeting are available here: https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2010-09-27 A recording of the call is available here: http://tc.seabright.co.nz/toolchainwg/2010/09/2010-09-27-weekly-call/ Fairly quiet this week due to being in the middle of things. Discussions include the Toolcahin WG Versatile Express board status, upstream patch status, how to divide future work up with ARM, LTO, and benchmarks. -- Michael

15 years, 7 months

1
0
0 0

Linaro Valgrind snapshot 20100927 released

by Peter Maydell

The Linaro Toolchain Working Group is pleased to announce the availability of a "developer preview" of Valgrind which includes the support for ARM and Thumb which has recently been added by the Valgrind developers. Our aim with this preview release is to advertise Valgrind's improved ARM support and encourage people to try it out and find bugs before the official 3.6.0 release. Please report bugs via upstream's BTS: http://valgrind.org/support/bug_reports.html or you can ask on linaro-toolchain(a)lists.linaro.org if you have any problems. This release is a snapshot of upstream subversion; it should generally work but you may encounter bugs, especially if you run it on hand-optimised assembly that uses obscure instructions. New (upstream) features in this snapshot include: * Greatly improved support for ARM * Support for the Thumb instruction set * Support for NEON and VFPv3 instructions Known issues: * callgrind has difficulty identifying ARM function call and return so may not produce useful results Downloads are available from the Linaro Overlay PPA: https://launchpad.net/~linaro-maintainers/+archive/overlay ...so if you're running Linaro on an ARM system you should be able to just install it with 'apt-get install valgrind'. -- Peter Maydell

15 years, 7 months

1
0
0 0

Rough OpenOCD Plan

by Zach Welch

To All Ye Linaro Toolchain Folk, (and OpenOCD developers too) After a week of reading specifications and code, I am ready to start doing some serious hacking on OpenOCD. The following outlines my present plans and expectations, with the caveat that time can change everything. Last week, I started testing my BeagleBoard with OpenOCD, so I have begun trying to validate and improve the Cortex-A8 support. Indeed, I have already committed a minor patch that fixed a bug in the trunk caused by new command syntax required to distinguish physical memory addresses from virtual ones. That bug had been preventing the BeagleBoard support from working for several months, so this seems to show that nobody has been using (or even testing) the latest code with that board. It seems that much of the debug architecture can be shared between these two cores, so features added and bugs fixed for A8 should help me implement A9 faster. Indeed, A9 support may be more a matter of refactoring the existing code than developing new code. In this respect, the lists of tasks for A8 and A9 may end up proceeding in parallel. Cortex-A8: 1) Add missing topology detection for determining location of AHB-AP (for system memory access), APB-AP (for DAP and other CoreSight components), and register address range for accessing the DAP. 2) Fix Halt After Reset functionality (using vector catch magic). 3) Expose missing VFP3/NEON registers (only when present). 4) Fix various memory and resource leaks. Cortex-A9: 1) Basic bring-up to successful attachment with debugger. 2) Develop board scripts for common evaluation boards. 3) Work on advanced features: - download and run algorithms out of memory, - breakpoints/watchpoints, - tracing and performance monitoring, 4) Ensure SMP support works out-of-the-box. Finally, it would be good to produce a new release when all of these changes have made it into the tree. Due to various factors, the project has not achieved a regular release schedule, but these features would help to justify the effort from the community. P.S. I have cc'd the openocd-development list in the hope of generating useful feedback, but it requires subscribing to post (last I checked). Sorry for the bad netiquette. -- Zach Welch CodeSourcery zwelch(a)codesourcery.com (650) 331-3385 x743

15 years, 7 months

2
3
0 0

Linaro GDB release process

by Ulrich Weigand

Hello, I've now checked the Linaro branding changes in to the gdb-linaro Bazaar repository. I've created a Wiki page describing the Linaro GDB release process based on that repository: http://wiki.linaro.org/WorkingGroups/ToolChain/GDBReleaseProcess (modeled after Andrew's GCCReleaseProcess page) Review and comments are welcome! Mit freundlichen Gruessen / Best Regards Ulrich Weigand -- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294

15 years, 7 months

4
6
0 0

NEON vectorization improvements - preliminary notes

by Julian Brown

Hi, In case this is useful in its current (unfinished!) form: here are some notes I made whilst looking at a couple of the items listed for CS308 here: https://wiki.linaro.org/Internal/Contractors/CodeSourcery Namely: * automatic vector size selection (it's currently selected by command line switch) * also consider ARMv6 SIMD vectors (see CS309) * mixed size vectors (using to most appropriate size in each case) * ensure that all gcc vectorizer pattern names are implemented in the machine description (those that can be). I've not even started on looking at: * loops with more than two basic blocks (caused by if statements (anything else?)) * use of specialized load instructions * Conversly, perhaps identify NEON capabilities not covered by GCC patterns, and add them to gcc (e.g. vld2/vld3/vld4 insns) * any other missed opportunities (identify common idioms and teach the compiler to deal with them) I'm not likely to have time to restart work on the vectorization study for at least a couple of days, because of other CodeSourcery work. But perhaps the attached will still be useful in the meantime. Do you (Ira) have access to the ARM ISA docs detailing the NEON instructions? Cheers, Julian

15 years, 7 months

3
4
0 0

arm thumb veneer question

by John Rigby

While trying out the u-boot-next branch I found a problem. First some explanation. On most platforms, u-boot is linked to the address it will first start running. For example when using NOR flash U-Boot will be linked to an address in flash. Very early in the boot process, U-Boot copies itself to the top and ram and jumps there. This relocation has worked for years on powerpc and other arches. The -next tree adds this for arm and it almost works. The part that does not work is that some veneer routines do not get fixed up. Here is an example. A routine called i2c_init calls __aeabi_idiv. Here is the disassembly: ... 288: e59f0148 ldr r0, [pc, #328] ; 3d8 <i2c_init+0x1a4> 28c: e1a01083 lsl r1, r3, #1 290: ebfffffe bl 0 <__aeabi_idiv> 294: e2507006 subs r7, r0, #6 298: 4a000001 bmi 2a4 <i2c_init+0x70> Later after this .o is linked with everything else and libgcc that morphs to: 8000b384: e59f0148 ldr r0, [pc, #328] ; 8000b4d4 <_end+0xfff97c98> 8000b388: e1a01083 lsl r1, r3, #1 8000b38c: eb00aa43 bl 80035ca0 <____aeabi_idiv_veneer> 8000b390: e2507006 subs r7, r0, #6 8000b394: 4a000001 bmi 8000b3a0 <i2c_init+0x70> and the veneer version is at the end of text with other veneers: 80035ca0 <____aeabi_idiv_veneer>: 80035ca0: e51ff004 ldr pc, [pc, #-4] ; 80035ca4 <_end+0xfffc2468> 80035ca4: 80035999 .word 0x80035999 80035ca8 <____aeabi_llsl_veneer>: 80035ca8: e51ff004 ldr pc, [pc, #-4] ; 80035cac <_end+0xfffc2470> 80035cac: 80035c7d .word 0x80035c7d 80035cb0 <____aeabi_lasr_veneer>: 80035cb0: e51ff004 ldr pc, [pc, #-4] ; 80035cb4 <_end+0xfffc2478> 80035cb4: 80035c61 .word 0x80035c61 80035cb8 <____aeabi_llsr_veneer>: 80035cb8: e51ff004 ldr pc, [pc, #-4] ; 80035cbc <_end+0xfffc2480> 80035cbc: 80035c49 .word 0x80035c49 80035cc0 <____aeabi_uidivmod_veneer>: 80035cc0: e51ff004 ldr pc, [pc, #-4] ; 80035cc4 <_end+0xfffc2488> 80035cc4: 8003597d .word 0x8003597d 80035cc8 <____aeabi_uidiv_veneer>: 80035cc8: e51ff004 ldr pc, [pc, #-4] ; 80035ccc <_end+0xfffc2490> 80035ccc: 80035721 .word 0x80035721 80035cd0 <____aeabi_idivmod_veneer>: 80035cd0: e51ff004 ldr pc, [pc, #-4] ; 80035cd4 <_end+0xfffc2498> 80035cd4: 80035c2d .word 0x80035c2d then if we look at 80035998 we see some thumb code. 80035998 <__aeabi_idiv>: 80035998: 2900 cmp r1, #0 8003599a: f000 813e beq.w 80035c1a <.divsi3_nodiv0+0x27c> When u-boot copies itself to ram it relocates the jump tables it knows about and could relocate the addresses in the veneer routines if it knew about them. There are at least three possible ways to fix these: 1) u-boot has its own private libgcc and if I use it the problem goes away. 2) is there an option for the toolchain to use an arm libgcc instead of thumb? 3) is there a way to find the veneers at runtime and fix them up? All input welcome. Thanks, John

15 years, 7 months

5
15
0 0

Branding for Linaro GDB package

by Ulrich Weigand

Hello Michael, I'm looking into "branding" changes needed for a Linaro GDB release. So far I've made the following changes: - Set default PKGVERSION to "Linaro GDB" instead of "GDB" - Set default BUGURL to "http://bugs.launchpad.net/gdb-linaro/" instead of "http://www.gnu.org/software/gdb/bugs/" - Set version number according to Linaro version scheme - Update release script to generate tarballs/directories named "gdb-linaro-$VERSION" instead of "gdb-$VERSION". As a result, the default GDB startup output now reads: GNU gdb (Linaro GDB) 7.2-2010.10-0 Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i686-pc-linux-gnu". For bug reporting instructions, please see: <http://bugs.launchpad.net/gdb-linaro/>. Do you agree that this is the way we should go? Have I overlooked anything? Unless there are objections, I'm planning to check these changes in later this week. As a related question, the generated files in a standard GDB 7.2 release seem to have been built on a relatively old system (RHEL 4 ?), which is visible through the versions of tools like bison, flex, texinfo, and gettext used to build those files. When building our Linaro GDB release tarballs, should we: - just use the tools as installed on a recent build system (say, Ubuntu Lucid), or - attempt to rebuild the release with the exact same set of tools used for the GDB 7.2 release? The second option has the advantage of reducing the amount of changes, e.g. visible in a full diff of the release tarballs. However, it has the disadvantage that reconstructing those exact set of tools (including Red Hat patches, it seems) is somewhat difficult, and can in addition lead to somewhat outdated results ... Mit freundlichen Gruessen / Best Regards Ulrich Weigand -- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294

15 years, 7 months

4
6
0 0

OpenOCD

by Zach Welch

Hi all, I was recently hired by CodeSourcery and have been assigned to Linaro for the purpose of improving OpenOCD. Specifically, I will be adding new support for Cortex-A9 SMP, though I may also make a few improvements to its handling of Cortex-A8 in the process. If you have experience using OpenOCD in these contexts, let me know if you have any specific requests for features or fixes, and I will try to fold them into my plans. After this cross-posted introduction, I believe that most of my correspondence will appear on the Toolchain mailing list, but I wanted to make sure that everyone knows that they can find me there. Cheers, -- Zach Welch CodeSourcery zwelch(a)codesourcery.com (650) 331-3385 x743

15 years, 7 months

5
5
0 0

Linaro GCC 4.4 and 4.5 2010.09 released

by Michael Hope

The Linaro Toolchain Working Group is pleased to announce the release of both Linaro GCC 4.4 and Linaro GCC 4.5. Linaro GCC 4.4 is the third release in the 4.4 series. Based off the latest GCC 4.4.4, it pulls in the pre-4.4.5 changes made by the FSF over the last six months. Linaro GCC 4.5 is the second release in the 4.5 series. Based off the latest GCC 4.5.1, it finishes the merge of many ARM-focused performance improvements and bug fixes. Interesting changes include: * Improved performance on the Cortex-A9 * Backports of a range of performance improvements from mainline * New inline versions of the GCC builtin sync primitives Downloads are available from the Linaro GCC page on Launchpad: https://launchpad.net/gcc-linaro Also available is an early release of optimised string routines for the Cortex-A series, including a mix of NEON and Thumb-2 versions of memcpy(), memset(), strcpy(), strcmp(), and strlen(). For more information see: https://launchpad.net/cortex-strings Pre-build packages are available in the Linaro Toolchain PPA at: https://launchpad.net/~linaro-toolchain-dev/+archive/ppa -- Michael

15 years, 7 months

5
11
0 0

Thumb2 code size improvements

by Yao Qi

Hi, We are looking for some possible improvements and optimizations on thumb2 code size. Currently, I am running some benchmarks with compilation flag "-Os -march=armv7-a -mthumb", and hope to find some thing interesting that we can improve. Beside that, do you have some ideas on this topic? or do you have some observations on thumb2 code that we may probably improve the size? Any thoughts on this are appreciated. Yao

15 years, 7 months

11
30
0 0

State of cross compiler packages

by Marcin Juszkiewicz

I think that it is easier to describe situation in email then on irc. Currently there are 4 packages related to cross compilation support: - armel-cross-toolchain-base (a-c-t-base in short) - gcc-4.4-armel-cross - gcc-4.5-armel-cross - gcc-defaults-armel-cross Each of them got into archive but they need to be updated to get installable packages. Status of each package: 1. a-c-t-base is at 1.47 in archive and was built from gcc-4.5-source 4.5.1-6ubuntu1 version. This package is used to bootstrap armel cross toolchain and generates: - binutils-arm-linux-gnueabi (from binutils-source) - libc6(-dev,-dbg)-armel-cross (from eglibc-source) - linux-libc-dev-armel-cross (from linux-source-2.6.35) - gcc-4.5-arm-linux-gnueabi-base, libgcc1(-dbg)-armel-cross (from gcc-4.5-source) libgcc1* packages have /usr/share/doc/ directories as symlinks to /usr/share/doc/gcc-4.5-arm-linux-gnueabi-base/ I have a version which does not provide gcc-4.5-arm-linux-gnueabi-base package, libgcc(-dbg)-armel-cross depends on gcc-4.5-base and have /usr/share/doc/ directories pointing into gcc-4.5-base one. Need to fix this symlink by providing those files in libgcc1 package instead. 2. gcc-4.4-armel-cross is at 1.36 in archive and was built with gcc-4.4-source 4.4.4-14ubuntu4 version. This package provides compilers, libstc++6-4.4-(dev,dbg,pic)-armel-cross, libmudflap0-4.4-dev-armel-cross and gcc-4.4-arm-linux-gnueabi-base packages. I have 1.38 version ready to upload which fixes #637454 #640298 bugs. 3. gcc-4.5-armel-cross is at 1.35 in archive and was built with gcc-4.5-source 4.5.1-7ubuntu1 version. This package provides compilers and runtime libraries. But it does not provide libgcc1(-dbg)-armel-cross and gcc-4.5-arm-linux-gnueabi-base because they are in a-c-t-base source package. All resulting packages have /usr/share/doc/ directories pointing into gcc-4.5-arm-linux-gnueabi-base one which is policy violation. I have 1.37 version ready to upload which fixes #637454 #640298 bugs and provides gcc-4.5-arm-linux-gnueabi-base package so policy violation is removed. 4. gcc-defaults-armel-cross is at 1.3 in archive and does not require any changes. Main problem is that packages generated from gcc-4.5-source are split into two packages: armel-cross-toolchain-base (libgcc1(-dbg)-armel-cross) and gcc-4.5-armel-cross (all the rest). This was required to allow to bootstrap cross compiler but gives problems when one is built with other version of gcc-4.5-source then other - resulting packages are not installable (we have it now in archive). It is also a thing which Matthias does not like and I understand it. For now my only solution is to build both with one version of gcc-4.5-source. What are your opinions? http://marcin.juszkiewicz.com.pl/download/ubuntu/ is download link for mentioned versions. Regards, -- JID: hrw(a)jabber.org Website: http://marcin.juszkiewicz.com.pl/ LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz

15 years, 7 months

2
2
0 0

Re: GCC PRE patch; apply or not?

by Steven Bosscher

xf. http://lists.linaro.org/pipermail/linaro-toolchain/2010-August/000069.html > It is not upstreamable due to copyright issues, but we have a policy > that we can keep such patches, if we wish. I wrote this patch. If I am the copyright issue, then there is no issue. I have a copyright assignment for all my GCC work to the FSF. That assignment also covers the patch in the e-mail stored at http://gcc.gnu.org/ml/gcc-patches/2008-12/msg00199.html. I consider copyright to all my patches assigned to the FSF if I have submitted the patches to gcc-patches(a)gcc.gnu.org, or attached them to a Problem Report in GCC bugzilla, or both. The only reason why this patch for GIMPLE PRE is not in the FSF GCC already, is that I just never cared enough to pursue it. GCC is just a hobby for me, and experimenting with ideas is fun. Doing all the required testing for inclusion in the FSF GCC is not fun and it costs time that I usually can't find. I am just too busy with other things to clear off this and other pending patches/ideas from my TODO list :-) If you wish to submit this patch for the FSF GCC, please feel free to do so. In fact, I'd encourage you to do so. Likewise for my patch for e.g. http://gcc.gnu.org/PR20070, and for the GIMPLE hoisting pass. Ciao! Steven

15 years, 7 months

1
0
0 0

binutils testsuite Thumb-2 coverage

by Chung-Lin Tang

Hi, about the status of binutils testsuite Thumb coverage (CS204 in the workplan), I have filed two Launchpad bugs: #640263: Testsuite coverage: Thumb-2 VFP/NEON encodings https://bugs.launchpad.net/binutils-linaro/+bug/640263 #640272: Testsuite coverage: Thumb relocations https://bugs.launchpad.net/binutils-linaro/+bug/640272 To summarize: I currently do not see any testing of Thumb-2 VFP/NEON encodings; Thumb mode relocations are also only barely tested in the ld testsuite. Also, please inform if there are any other areas of binutils Thumb testing that may be of concern to Linaro. Thanks, Chung-Lin

15 years, 7 months

2
1
0 0

Thumb2 size optimization report

by Yao Qi

* Goal Goal of this work is to look for thumb2 code size improvements on FSF GCC trunk. * Methodology ** Build FSF GCC trunk w/ and wo/ hardfp, run benchmarks including eembc, spec2000, and dhrystone, and check asm code to see if there is any possible improvements on size. ** Get input and suggestion from ARM experts. ** Search open PRs in GCC bugzilla. * Results Each item has been tracked on launchpad, and is listed with some elements, ** Cause: cause of this problem is known or unknown ** Difficulty: estimation of implementation difficulty ** Recommendation: Yao's recommendation on that bug for next step 1. LP:633233 Push/pop low register rather than high register when keeping stack alignment As Richard E. pointed out, it was implemented in gcc-4.5 on 2009, but Yao still can see the usage of r8 on FSF GCC trunk. Cause: Might be a regression if problem disappears on gcc-4.5. Difficulty: Easy. might not hard to fix a regression. Recommendations: Fix this regression if it is. 2. LP:633243 Improve regrename to make use of low registers. Get input from Bernd S. and Julian B. Initial implementation has been suggested by Bernd S. Cause: current regrename in gcc treats high and low registers equally. Difficulty: Medium. Recommendation: Implement it as Bernd suggested, and do benchmarking to see how much size is improved. 3. LP:634682 Redundant uxth/sxth insn are generated Cause: Unknown Difficulty: Unknown Recommendation: No recommendation so far. 4. LP:634696 Function is not inlined properly with -Os In consumer/cjpeg/jmemmgr.c, GCC inlined out_of_memory() with -Os, so increase code size. Cause: Unknown. Difficulty: Unknown Recommendation: Educate GCC to inline carefully when -Os is turned on. 5. GCC PR40730 LP:634731 Redundant memory load 6. LP:634738 inefficient code to extract least bits from an integer value GCC PR40697 is for thumb-1. The same problem is in thumb-2. Cause: Unknown. Difficulty: Medium. Recommendation: Fix it the similar way as fixing GCC PR40697. 7. LP:634891 Replace load/store by memcpy more aggressively Difficulty: Should be easy. Recommendation: Fix to this problem might be "reduce threshold value once -Os is turned on". 8. LP:637220 allocate local variables with fewer instructions GCC PR40657 is about this kind of problem, and was fixed. The similar prolbme exits on gcc with hardfp. Cause: Unknown. Difficulty: Unknown. Recommendation: No recommendation so far. 9. GCC PR 43721 Failure to optimize (a/b) and (a%b) into single __aeabi_idivmod call Difficulty: Medium or easy. Recommendation: No. 10. LP:637814 Combine add/move to add LP:637882 Combine ldr/mov to ldr Possible improvements have been found. No idea how to fix it yet. Cause: Unknown. Difficulty: Unknown. Recommendation: No. 11. LP:638014 Replace memset by memclr when 2nd parameter is zero Difficulty: Easy. Recommendation: No recommendation so far. 12. LP:625233 Merge constant pools for small functions Cause: Unknown. Difficulty: Medium. Recommendation: No. 13. LP:638935 Replace multiple vldr by vldm Some vldr insns accessing consecutive address can be replaced by single vldm. It is not about thumb2, but related to code size optimization. Cause: Unknown. Difficulty: Medium. Recommendation: No. -- Yao Qi CodeSourcery yao(a)codesourcery.com (650) 331-3385 x739

15 years, 7 months

2
2
0 0

Cross compiling for embedded targets

by Michael Hope

Hi there. I've always wanted to mix this: http://www.futurlec.com/ET-STM32_Stamp.shtml with some of this: http://bit.ly/cD0JPS to control my one of these: http://www.traxxas.com/products/electric/rustler2006/gallery/3705-3qrtr-Bla… and it sounds like a good opportunity to dogfood the Linaro toolchain at the same time. What's the best way to set up a Cortex-M3 toolchain with an appropriate newlib and libgcc? A wrapper script works fine but I need a way of recompiling libgcc for the Cortex-M series. I'd love to get a arm-none-eabi toolchain package out of this that others could use. Could I re-work the cross packaging to use newlib and change the configure flags instead? Are there existing Debianised cross packages that I could reuse? Ta, -- Michael

15 years, 8 months

5
5
0 0

2010.09 build results

by Michael Hope

Hi Andrew. Well, the builds are done and they're OK. I've added the ability to compare against an explicit release to make checking regressions easier. 4.4 results are here: http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.4-2010.09-1/logs… http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.4-2010.09-1/logs… http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.4-2010.09-1/logs… i686 and x86_64 have not regressed since 2010.08. On arm, and ignoring the limits test, 2010.09 adds a failure on gcc.c-torture/compile/991026-2.c. According to the log the run timed out but I can't reproduce it. 4.5 results are here: http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.5-2010.09-0/logs… http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.5-2010.09-0/logs… http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.5-2010.09-0/logs… i686 has not regressed since 2010.08. x86_64 fails on gcc.target/i386/wmul-1.c, but this is a new tests for new features and are not a regression against 4.5.1. arm is messier. The following new failures exist: Vectoriser related: * g++.dg/vect/pr36648.cc scan-tree-dump-times vect "vectorized 1 loops" 1 * g++.dg/vect/pr36648.cc scan-tree-dump-times vect "vectorizing stmts using SLP" 1 * gcc.dg/vect/vect-multitypes-11.c scan-tree-dump-times vect "vectorized 1 loops" 1 * gcc.dg/vect/vect-multitypes-12.c scan-tree-dump-times vect "vectorized 1 loops" 1 * gcc.dg/vect/vect-reduc-dot-s16b.c scan-tree-dump-times vect "vectorized 1 loops" 0 * gcc.dg/vect/vect-reduc-pattern-1a.c scan-tree-dump-times vect "vectorized 1 loops" 0 * gcc.dg/vect/vect-reduc-pattern-1b.c scan-tree-dump-times vect "vectorized 1 loops" 0 * gcc.dg/vect/vect-reduc-pattern-1c.c scan-tree-dump-times vect "vectorized 1 loops" 0 * gcc.dg/vect/vect-reduc-pattern-2a.c scan-tree-dump-times vect "vectorized 1 loops" 0 * gcc.dg/vect/vect-reduc-pattern-2b.c scan-tree-dump-times vect "vectorized 1 loops" 0 * gcc.dg/vect/wrapv-vect-reduc-pattern-2c.c scan-tree-dump-times vect "vectorized 1 loops" 0 Others: * gcc.target/arm/neon-load-df0.c scan-assembler vmov.i32[ \t]+[dD][0-9]+, #0\n * gcc.target/arm/synchronize.c scan-assembler __sync_synchronize neon-load-df0 is a new test. synchronize.c is an incorrect test as the compiler now correctly uses the dmb instruction. Your thoughts? -- Michael

15 years, 8 months

2
1
0 0

gdb-linaro bzr tree

by Yao Qi

Hi, I tried linaro gdb branch just now, $ bzr branch lp:gdb-linaro parent branch is bzr+ssh://bazaar.launchpad.net/%2Bbranch/gdb-linaro/, which looks strange. In gcc-linaro, parent branch is bzr+ssh://bazaar.launchpad.net/~linaro-toolchain-dev/gcc-linaro/4.5/ I thought parent branch of linaro gdb tree should be like bzr+ssh://bazaar.launchpad.net/~linaro-toolchain-dev/gdb-linaro/7.2/ Is there something wrong in this branch?

15 years, 8 months

1
0
0 0

armel cross toolchain packages available in PPA

by Marcin Juszkiewicz

I would like to announce that my work on armel cross toolchain got to the very nice point - all packages are available from PPA. What does it mean to you? 1. no "are you sure to install those unverified packages" messages from APT 2. ability to easily rebuild toolchain on own machines So if you used my repository from people.canonical.com then please switch to PPA one: add-apt-repository ppa:hrw/armel-cross-compilers Old repository will be available for some time but will not get any updates. Next step: merging those packages into Maverick release. Regards, -- JID: hrw(a)jabber.org Website: http://marcin.juszkiewicz.com.pl/ LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz

15 years, 8 months

2
1
0 0

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

linaro-toolchain