- linaro-toolchain - lists.linaro.org

Release notes for 2012.01

by Michael Hope

I liked Andrew's idea of sharing the release notes so I stole it for this month: https://wiki.linaro.org/MichaelHope/Sandbox/GCC2012.01ReleaseNotes Please add anything new and noteworthy that you've done to this draft. -- Michael

13 years, 6 months

1
0
0 0

[ACTIVITY] weekly status

by Ken Werner

Hi, * Android * migrated my linaro android build environment * did a small change to the debuggerd patch (thanks Sylvain) * OpenEmbedded * the linaro binary toolchain uses multiarch paths while OE doesn't -> setup a workaround to make it look like a classic one * however, I think what I really want is to build the libc instead of using the one provided by the binary external toolchain * The core-image-minimal still doesn't boot properly because 'init' doesn't come properly Regards Ken

13 years, 6 months

1
0
0 0

Patch drop: fwprop.c patch for neon-strided-load-extract

by Richard Sandiford

The remaining change for neon-strided-load-extract is to allow fwprop.c to propagate: (set (reg X) (subreg (reg Y) N)) even if no further simplifications are possible. I posted the original patch for comments here: http://article.gmane.org/gmane.comp.gcc.patches/246180/ and fixed the problem that H.J. spotted. I wasn't entirely happy with the benchmark results though, so it never became an RFA. Richard gcc/ * fwprop.c (propagate_rtx): Also set PR_CAN_APPEAR for subregs. Index: gcc/fwprop.c =================================================================== --- gcc/fwprop.c 2011-09-15 14:36:23.206143787 +0100 +++ gcc/fwprop.c 2011-09-15 14:36:40.995131564 +0100 @@ -664,7 +664,12 @@ propagate_rtx (rtx x, enum machine_mode return NULL_RTX; flags = 0; - if (REG_P (new_rtx) || CONSTANT_P (new_rtx)) + if (REG_P (new_rtx) + || CONSTANT_P (new_rtx) + || (GET_CODE (new_rtx) == SUBREG + && REG_P (SUBREG_REG (new_rtx)) + && (GET_MODE_SIZE (mode) + <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx)))))) flags |= PR_CAN_APPEAR; if (!for_each_rtx (&new_rtx, varying_mem_p, NULL)) flags |= PR_HANDLE_MEM;

13 years, 6 months

2
3
0 0

[ACTIVITY] 2011-12-23 - and goodbye

by David Gilbert

== QEMU == * Wrote the context routines for Eglibc, including those that QEMU uses These pass all the context tests I could find, including QEMUs coroutine tests, and with them QEMU seems to boot OK. I've got a full eglibc test run going at the moment, but I don't think anything else uses them. I posted them with comments and a question to libc-ports; I'll try and chase follow ups. == String routines == * I posted the strchr and strlen routines to eglibc (libc-ports) * On strchr the question of whether it was worth using the longer version that's faster for longer strings (but slower for shorter strings) came up. I posted some stats, observations etc - and there is still a discussion on going about it. * For strlen, rth noted the same trick that I'd originally seen in newlib (and for which RichardS and Ramana had suggested) of a quicker end-of-string sequence using clz. I'd avoided this because I'd originally seen it in newlib and didn't want to copy it; but since 3 people have individually suggested it it would seem using. == Goodbye! == Thank you all for a fun & interesting year! I'm sure many of us will meet online again in the future. I'll try and follow my linaro.org address while it's still live to check for any replies to any patches/comments etc. Feel free to mail me at davidgil(a)uk.ibm.com (work) or dave(a)treblig.org (home); for Linaro people I've also added some more contact methods at: https://wiki.linaro.org/Internal/People/DaveGilbert/Contact Thanks again! Dave

13 years, 6 months

2
1
0 0

Fwd: New question: problem in installing Linaro tools on Ask Linaro

by Andy Doan

anyone have a suggestion for this person? -------- Original Message -------- Subject: New question: problem in installing Linaro tools on Ask Linaro Date: Mon, 26 Dec 2011 21:31:57 -0800 (PST) From: Ask Linaro <dnsadmin(a)linaro.org> To: doanac <andy.doan(a)linaro.org> Ask Linaro - Q & A forum for Linaro developers <http://ask.linaro.org/> ------------------------------------------------------------------------ Hello doanac, chandrakala <http://ask.linaro.org/users/172/chandrakala> has just posted a new question on Ask Linaro, entitled problem in installing Linaro tools <http://ask.linaro.org/questions/456/problem-in-installing-linaro-tools> and tagged "/linaro <http://ask.linaro.org/tags/linaro/> installation <http://ask.linaro.org/tags/installation/>/". Here's what it says: Hi all, Iam new to linux and gnu tools. i have downloaded linaro gnu tools to install cortex-a8 on my ubuntu machine. iam able to successfully install gcc tools. But when i try to install arm tools it is throwing error. following are the commands used to install gnu tools and the error it is generated mckala@mckala:~/Desktop/gstreamer/gcc_objdir$ ../gcc-linaro-4.6-2011.12/gcc-linaro-4.6-2011.12/configure --target=arm-*-elf --disable-bootstrap --enable-languages=c,c++ --with-mode=thumb --with-arch=armv7-a --with-tune=cortex-a8 --with-fpu=neon --with-float=softfp --prefix=/home/mckala/Desktop/gstreamer/gcc_objdir --disable-werror --with-newlib mckala@mckala:~/Desktop/gstreamer/gcc_objdir$ make all-host mckala@mckala:~/Desktop/gstreamer/gcc_objdir$ make install-host with this gnu tools installation is completed and i did not get any error. when i used the following command to install cross compiler, it is giving error mckala@mckala:~/Desktop/gstreamer/gcc_objdir$ make The error is checking for suffix of object files... configure: error: in |/home/mckala/Desktop/gstreamer/gcc_objdir/arm-*-elf/libgcc': configure: error: cannot compute suffix of object files: cannot compile See|config.log' for more details. make[1]: *** [configure-target-libgcc] Error 1 config.log file is showing the following error. gcc_objdir/arm-/-elf/bin/ -B/home/mckala/Desktop/gstreamer/gcc_objdir/arm-/-elf/lib/ -isystem /home/mckala/Desktop/gstreamer/gcc_objdir/arm-/-elf/include -isystem /home/mckala/Desktop/gstreamer/gcc_objdir/arm-/-elf/sys-include -V >&5 xgcc: fatal error: no input files compilation terminated. Can any one please help on this. Thanks in advance. Don't forget to come over and cast your vote. Thanks, Ask Linaro P.S. You can always fine-tune which notifications you receive here <http://ask.linaro.org//users/16/doanac/subscriptions/>. ------------------------------------------------------------------------

13 years, 6 months

2
1
0 0

Round-up and goodbye

by Richard Sandiford

I've posted all my WIP patches to this list over the last few days. Please treat them kindly. :-) I've also tried to update the relevant blueprints. I pinged the 4.5 and 4.6 backports for lp736661 on gcc-patches last week, then again this week, but there's obviously not likely to be much response at this time of year. I therefore went ahead with the Linaro merges of the branches rather than relying on them being committed to FSF branches in time for next month's Linaro release. I'll continue to ping though. If I've forgotten anything, or if you need more info, please don't hesitate to ask. I'll continue to monitor my Linaro email address as long as it remains active, although rdsandiford(a)googlemail.com is likely to be a better bet. With all that out of the way, I just wanted to say thank you to everyone for making this a really enjoyable project to work on. It feels like we managed to get through a fair number of new features, performance improvements and bug fixes this year. I hope Linaro will be around for a good few years yet and that it continues to go from strength to strength. Happy New Year, and all the best, Richard

13 years, 6 months

1
0
0 0

Patch drop: sms-and-memory-dependencies

by Richard Sandiford

Here's the patch for sms-and-memory-dependencies. The idea is to bypass the sched-deps.c {output,read,anti,true}_dependence tests altogether -- which is easy to do thanks to the note_mem_dep hook -- and instead handle them in ddg.c. The ddg.c tests then use RTL loop iv analysis to try to get longer distances on the memory dependencies. (Note that other memory-related dependencies, such as those between volatile MEMs and other volatile instructions, are still handled by sched-deps.c.) Dependencies are now always created in pairs, so there's no need for get_sched_window to set an upper bound when processing incoming MEM_DEPs, or a lower bound when processing outgoing MEM_DEPs; we can rely on the partnering edge to do that instead. Richard gcc/ * Makefile.in (ddg.o): Depend on $(TREE_PASS_H). * ddg.h (REG_OR_MEM_DEP, REG_AND_MEM_DEP): Delete. (ddg_mem_ref): New structure. (ddg): Add loads and stores array. (create_ddg): Add a loop argument. (add_edges_to_ddg): Declare. (MAX_DDG_DISTANCE): New macro. * ddg.c: Include tree-pass.h. (mem_ref_p, mark_mem_use, mark_mem_use_1, mem_read_insn_p) (mark_mem_store, mem_write_insn_p, rtx_mem_access_p) (mem_access_insn_p): Delete. (create_mem_ref): New function. (graph_and_node): New structure. (record_loads, record_stores): New functions. (create_ddg_dep_from_intra_loop_link): Treat all dependencies as register dependencies. (walk_mems_2, walk_mems_1, insns_may_alias_p, add_intra_loop_mem_dep) (add_inter_loop_mem_dep): Delete. (build_intra_loop_deps): Ignore memory dependencies created by sched-deps.c. Don't handle memory dependencies here. (measure_mem_distance, add_memory_dep): New functions. (FOR_EACH_LATER_MEM_REF): New macro. (build_memory_deps): New function. (create_ddg): Take the loop as argument. Don't count loads and stores here. Call iv_analysis_loop_init. Pass all loads to record_loads and all stores to record_stores. Move edge creation to... (add_edges_to_ddg): ...this new function. Also call build_memory_deps. * modulo-sched.c (sat_mulpp, sat_addsp, sat_subsp): New functions. (schedule_reg_moves): Only handle register dependencies. (sms_schedule): Update call to create_ddg. Call iv_analysis_done after creating all ddgs. Only set issue_rate if there are ddgs. Only call setup_sched_infos and haifa_sched_init if there are ddgs. Call add_edges_to_ddg before processing each ddg. (get_sched_window): Use saturating arithmetic. Do not add an implicit upper bound for incoming MEM_DEPs, or an implicit lower bound for outgoing MEM_DEPs. Rework calculation of final window. (calculate_must_precede_follow, compute_split_row): Use saturating arithmetic. Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in 2011-12-30 13:13:45.077544981 +0000 +++ gcc/Makefile.in 2011-12-30 13:24:57.330195801 +0000 @@ -3316,7 +3316,7 @@ ddg.o : ddg.c $(DDG_H) $(CONFIG_H) $(SYS $(DIAGNOSTIC_CORE_H) $(RTL_H) $(TM_P_H) $(REGS_H) $(FUNCTION_H) \ $(FLAGS_H) insn-config.h $(INSN_ATTR_H) $(EXCEPT_H) $(RECOG_H) \ $(SCHED_INT_H) $(CFGLAYOUT_H) $(CFGLOOP_H) $(EXPR_H) $(BITMAP_H) \ - hard-reg-set.h sbitmap.h $(TM_H) + hard-reg-set.h sbitmap.h $(TM_H) $(TREE_PASS_H) modulo-sched.o : modulo-sched.c $(DDG_H) $(CONFIG_H) $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(TARGET_H) $(DIAGNOSTIC_CORE_H) $(RTL_H) $(TM_P_H) $(REGS_H) $(FUNCTION_H) \ $(FLAGS_H) insn-config.h $(INSN_ATTR_H) $(EXCEPT_H) $(RECOG_H) \ Index: gcc/ddg.h =================================================================== --- gcc/ddg.h 2011-12-30 13:13:45.077544981 +0000 +++ gcc/ddg.h 2011-12-30 13:24:57.324195831 +0000 @@ -35,8 +35,7 @@ typedef struct ddg_scc *ddg_scc_ptr; typedef struct ddg_all_sccs *ddg_all_sccs_ptr; typedef enum {TRUE_DEP, OUTPUT_DEP, ANTI_DEP} dep_type; -typedef enum {REG_OR_MEM_DEP, REG_DEP, MEM_DEP, REG_AND_MEM_DEP} - dep_data_type; +typedef enum {REG_DEP, MEM_DEP} dep_data_type; /* The following two macros enables direct access to the successors and predecessors bitmaps held in each ddg_node. Do not make changes to @@ -44,6 +43,28 @@ typedef enum {REG_OR_MEM_DEP, REG_DEP, M #define NODE_SUCCESSORS(x) ((x)->successors) #define NODE_PREDECESSORS(x) ((x)->predecessors) +/* A structure that represents a memory read or write in the DDG; + context decides which. */ +struct ddg_mem_ref { + /* The previous reference of the same type (read or write) in the DDG. */ + struct ddg_mem_ref *prev; + + /* The DDG node that contains the memory reference. */ + ddg_node_ptr node; + + /* The memory reference itself. */ + rtx mem; + + /* If the address is a known induction variable, its value in iteration + I is given by: + + BASE + OFFSET + I * STEP + + In other cases BASE is null. */ + rtx base; + HOST_WIDE_INT offset, step; +}; + /* A structure that represents a node in the DDG. */ struct ddg_node { @@ -117,6 +138,11 @@ struct ddg /* Number of instructions in the basic block. */ int num_nodes; + /* The loads and stores in the BB, from the end of the block to + the beginning. */ + struct ddg_mem_ref *loads; + struct ddg_mem_ref *stores; + /* Number of load/store instructions in the BB - statistics. */ int num_loads; int num_stores; @@ -167,7 +193,9 @@ struct ddg_all_sccs }; -ddg_ptr create_ddg (basic_block, int closing_branch_deps); +struct loop; +ddg_ptr create_ddg (struct loop *, basic_block, int closing_branch_deps); +void add_edges_to_ddg (ddg_ptr); void free_ddg (ddg_ptr); void print_ddg (FILE *, ddg_ptr); @@ -188,4 +216,7 @@ int longest_simple_path (ddg_ptr, int fr bool autoinc_var_is_used_p (rtx, rtx); +/* The maximum allowable distance on a DDG edge. */ +#define MAX_DDG_DISTANCE INT_MAX + #endif /* GCC_DDG_H */ Index: gcc/ddg.c =================================================================== --- gcc/ddg.c 2011-12-30 13:13:45.077544981 +0000 +++ gcc/ddg.c 2011-12-30 13:36:35.005498271 +0000 @@ -43,6 +43,7 @@ Software Foundation; either version 3, o #include "expr.h" #include "bitmap.h" #include "ddg.h" +#include "tree-pass.h" #ifdef INSN_SCHEDULING @@ -61,88 +62,102 @@ static ddg_edge_ptr create_ddg_edge (ddg dep_data_type, int, int); static void add_edge_to_ddg (ddg_ptr g, ddg_edge_ptr); -/* Auxiliary variable for mem_read_insn_p/mem_write_insn_p. */ -static bool mem_ref_p; +/* Create a memory reference record for MEM, which occurs in NODE. + PREV is the previous reference of the same type. */ +static struct ddg_mem_ref * +create_mem_ref (struct ddg_mem_ref *prev, ddg_node_ptr node, rtx mem) +{ + struct ddg_mem_ref *entry; + enum machine_mode pmode; + struct rtx_iv iv; + rtx x; + + entry = XCNEW (struct ddg_mem_ref); + entry->prev = prev; + entry->node = node; + entry->mem = mem; + + pmode = targetm.addr_space.address_mode (MEM_ADDR_SPACE (mem)); + if (iv_analyze_expr (node->insn, XEXP (mem, 0), pmode, &iv) + && iv.extend == UNKNOWN + && CONST_INT_P (iv.step)) + { + x = iv.base; + if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))) + { + entry->base = XEXP (x, 0); + entry->offset = INTVAL (XEXP (x, 1)); + } + else + { + entry->base = x; + entry->offset = 0; + } + entry->step = INTVAL (iv.step); + } -/* Auxiliary function for mem_read_insn_p. */ -static int -mark_mem_use (rtx *x, void *data ATTRIBUTE_UNUSED) -{ - if (MEM_P (*x)) - mem_ref_p = true; - return 0; + if (dump_file) + { + fprintf (dump_file, "Found memory reference in insn %d:\n", + INSN_UID (node->insn)); + print_rtl (dump_file, mem); + if (entry->base) + { + fprintf (dump_file, "\nwith base:"); + print_rtl (dump_file, entry->base); + fprintf (dump_file, "\noffset " HOST_WIDE_INT_PRINT_DEC + " and step " HOST_WIDE_INT_PRINT_DEC "\n\n", + entry->offset, entry->step); + } + else + fprintf (dump_file, "\nwhich isn't a recognized iv\n\n"); + } + return entry; } -/* Auxiliary function for mem_read_insn_p. */ -static void -mark_mem_use_1 (rtx *x, void *data) -{ - for_each_rtx (x, mark_mem_use, data); -} +/* A structure for pairing a node and the graph to which it belongs. */ +struct graph_and_node { + ddg_ptr g; + ddg_node_ptr node; +}; -/* Returns nonzero if INSN reads from memory. */ -static bool -mem_read_insn_p (rtx insn) +/* A for_each_rtx callback. Record all loads in an instruction. + DATA points to a graph_and_node. */ +static int +record_loads_1 (rtx *loc, void *data) { - mem_ref_p = false; - note_uses (&PATTERN (insn), mark_mem_use_1, NULL); - return mem_ref_p; -} + struct graph_and_node *gn; -static void -mark_mem_store (rtx loc, const_rtx setter ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED) -{ - if (MEM_P (loc)) - mem_ref_p = true; + if (MEM_P (*loc)) + { + gn = (struct graph_and_node *) data; + gn->g->loads = create_mem_ref (gn->g->loads, gn->node, *loc); + gn->g->num_loads++; + } + return 0; } -/* Returns nonzero if INSN writes to memory. */ -static bool -mem_write_insn_p (rtx insn) +/* A note_uses callback. Record all loads in an instruction. + DATA points to a graph_and_node. */ +static void +record_loads (rtx *loc, void *data) { - mem_ref_p = false; - note_stores (PATTERN (insn), mark_mem_store, NULL); - return mem_ref_p; + for_each_rtx (loc, record_loads_1, data); } -/* Returns nonzero if X has access to memory. */ -static bool -rtx_mem_access_p (rtx x) +/* A note_stores callback. Record all stores in an instruction. + DATA points to a graph_and_node. */ +static void +record_stores (rtx x, const_rtx setter ATTRIBUTE_UNUSED, void *data) { - int i, j; - const char *fmt; - enum rtx_code code; - - if (x == 0) - return false; + struct graph_and_node *gn; if (MEM_P (x)) - return true; - - code = GET_CODE (x); - fmt = GET_RTX_FORMAT (code); - for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--) { - if (fmt[i] == 'e') - { - if (rtx_mem_access_p (XEXP (x, i))) - return true; - } - else if (fmt[i] == 'E') - for (j = 0; j < XVECLEN (x, i); j++) - { - if (rtx_mem_access_p (XVECEXP (x, i, j))) - return true; - } + gn = (struct graph_and_node *) data; + gn->g->stores = create_mem_ref (gn->g->stores, gn->node, x); + gn->g->num_stores++; } - return false; -} - -/* Returns nonzero if INSN reads to or writes from memory. */ -static bool -mem_access_insn_p (rtx insn) -{ - return rtx_mem_access_p (PATTERN (insn)); } /* Return true if DEF_INSN contains address being auto-inc or auto-dec @@ -175,9 +190,6 @@ create_ddg_dep_from_intra_loop_link (ddg ddg_edge_ptr e; int latency, distance = 0; dep_type t = TRUE_DEP; - dep_data_type dt = (mem_access_insn_p (src_node->insn) - && mem_access_insn_p (dest_node->insn) ? MEM_DEP - : REG_DEP); gcc_assert (src_node->cuid < dest_node->cuid); gcc_assert (link); @@ -201,7 +213,7 @@ create_ddg_dep_from_intra_loop_link (ddg TODO: support the removal of all anti-deps edges, i.e. including those whose register has multiple defs in the loop. */ if (flag_modulo_sched_allow_regmoves - && (t == ANTI_DEP && dt == REG_DEP) + && t == ANTI_DEP && !autoinc_var_is_used_p (dest_node->insn, src_node->insn)) { rtx set; @@ -224,7 +236,7 @@ create_ddg_dep_from_intra_loop_link (ddg } latency = dep_cost (link); - e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance); + e = create_ddg_edge (src_node, dest_node, t, REG_DEP, latency, distance); add_edge_to_ddg (g, e); } @@ -380,107 +392,6 @@ build_inter_loop_deps (ddg_ptr g) } } - -static int -walk_mems_2 (rtx *x, rtx mem) -{ - if (MEM_P (*x)) - { - if (may_alias_p (*x, mem)) - return 1; - - return -1; - } - return 0; -} - -static int -walk_mems_1 (rtx *x, rtx *pat) -{ - if (MEM_P (*x)) - { - /* Visit all MEMs in *PAT and check indepedence. */ - if (for_each_rtx (pat, (rtx_function) walk_mems_2, *x)) - /* Indicate that dependence was determined and stop traversal. */ - return 1; - - return -1; - } - return 0; -} - -/* Return 1 if two specified instructions have mem expr with conflict alias sets*/ -static int -insns_may_alias_p (rtx insn1, rtx insn2) -{ - /* For each pair of MEMs in INSN1 and INSN2 check their independence. */ - return for_each_rtx (&PATTERN (insn1), (rtx_function) walk_mems_1, - &PATTERN (insn2)); -} - -/* Given two nodes, analyze their RTL insns and add intra-loop mem deps - to ddg G. */ -static void -add_intra_loop_mem_dep (ddg_ptr g, ddg_node_ptr from, ddg_node_ptr to) -{ - - if ((from->cuid == to->cuid) - || !insns_may_alias_p (from->insn, to->insn)) - /* Do not create edge if memory references have disjoint alias sets - or 'to' and 'from' are the same instruction. */ - return; - - if (mem_write_insn_p (from->insn)) - { - if (mem_read_insn_p (to->insn)) - create_ddg_dep_no_link (g, from, to, - DEBUG_INSN_P (to->insn) - ? ANTI_DEP : TRUE_DEP, MEM_DEP, 0); - else - create_ddg_dep_no_link (g, from, to, - DEBUG_INSN_P (to->insn) - ? ANTI_DEP : OUTPUT_DEP, MEM_DEP, 0); - } - else if (!mem_read_insn_p (to->insn)) - create_ddg_dep_no_link (g, from, to, ANTI_DEP, MEM_DEP, 0); -} - -/* Given two nodes, analyze their RTL insns and add inter-loop mem deps - to ddg G. */ -static void -add_inter_loop_mem_dep (ddg_ptr g, ddg_node_ptr from, ddg_node_ptr to) -{ - if (!insns_may_alias_p (from->insn, to->insn)) - /* Do not create edge if memory references have disjoint alias sets. */ - return; - - if (mem_write_insn_p (from->insn)) - { - if (mem_read_insn_p (to->insn)) - create_ddg_dep_no_link (g, from, to, - DEBUG_INSN_P (to->insn) - ? ANTI_DEP : TRUE_DEP, MEM_DEP, 1); - else if (from->cuid != to->cuid) - create_ddg_dep_no_link (g, from, to, - DEBUG_INSN_P (to->insn) - ? ANTI_DEP : OUTPUT_DEP, MEM_DEP, 1); - } - else - { - if (mem_read_insn_p (to->insn)) - return; - else if (from->cuid != to->cuid) - { - create_ddg_dep_no_link (g, from, to, ANTI_DEP, MEM_DEP, 1); - if (DEBUG_INSN_P (from->insn) || DEBUG_INSN_P (to->insn)) - create_ddg_dep_no_link (g, to, from, ANTI_DEP, MEM_DEP, 1); - else - create_ddg_dep_no_link (g, to, from, TRUE_DEP, MEM_DEP, 1); - } - } - -} - /* Perform intra-block Data Dependency analysis and connect the nodes in the DDG. We assume the loop has a single basic block. */ static void @@ -493,6 +404,9 @@ build_intra_loop_deps (ddg_ptr g) /* Build the dependence information, using the sched_analyze function. */ init_deps_global (); + /* Ignore the usual dependencies between two MEM rtxes. We still rely + on sched_analyze to handle memory barriers and the like. */ + sched_deps_info->note_mem_dep = 0; init_deps (&tmp_deps, false); /* Do the intra-block data dependence analysis for the given block. */ @@ -519,37 +433,6 @@ build_intra_loop_deps (ddg_ptr g) create_ddg_dep_from_intra_loop_link (g, src_node, dest_node, dep); } - - /* If this insn modifies memory, add an edge to all insns that access - memory. */ - if (mem_access_insn_p (dest_node->insn)) - { - int j; - - for (j = 0; j <= i; j++) - { - ddg_node_ptr j_node = &g->nodes[j]; - if (DEBUG_INSN_P (j_node->insn)) - continue; - if (mem_access_insn_p (j_node->insn)) - { - /* Don't bother calculating inter-loop dep if an intra-loop dep - already exists. */ - if (! TEST_BIT (dest_node->successors, j)) - add_inter_loop_mem_dep (g, dest_node, j_node); - /* If -fmodulo-sched-allow-regmoves - is set certain anti-dep edges are not created. - It might be that these anti-dep edges are on the - path from one memory instruction to another such that - removing these edges could cause a violation of the - memory dependencies. Thus we add intra edges between - every two memory instructions in this case. */ - if (flag_modulo_sched_allow_regmoves - && !TEST_BIT (dest_node->predecessors, j)) - add_intra_loop_mem_dep (g, j_node, dest_node); - } - } - } } /* Free the INSN_LISTs. */ @@ -560,13 +443,187 @@ build_intra_loop_deps (ddg_ptr g) sched_free_deps (head, tail, false); } +/* Given a "source" memory reference from iteration 0 and a "target" + memory reference from iteration BASE_DISTANCE, return the first + N >= BASE_DISTANCE such that the source reference in iteration 0 + overlaps the target reference in iteration N. + + FROM_OFFSET is the offset of the source reference from an unspecified + base, while TO_OFFSET is the offset of the target reference from that + same base. FROM_SIZE and TO_SIZE are the sizes of the two references + in bytes. + + PMODE is the mode of both addresses, and STEP is the amount that + will be added to each address by one loop iteration. */ +static int +measure_mem_distance (enum machine_mode pmode, + unsigned HOST_WIDE_INT from_offset, + unsigned HOST_WIDE_INT from_size, + unsigned HOST_WIDE_INT to_offset, + unsigned HOST_WIDE_INT to_size, + unsigned HOST_WIDE_INT base_distance, + HOST_WIDE_INT step) +{ + unsigned HOST_WIDE_INT extra, from2to, to2from; + + from2to = (to_offset - from_offset) & GET_MODE_MASK (pmode); + to2from = (from_offset - to_offset) & GET_MODE_MASK (pmode); + if (from2to < from_size || to2from < to_size) + /* The source reference in iteration 0 overlaps the target reference + in iteration BASE_DISTANCE. The check is written this way to cope + with cases where offset + size overflows. */ + return base_distance; + + /* N > BASE_DISTANCE. To cope more easily with cases where the round-up + divisions: + + (to2from - (to_size - 1) + (step - 1)) / step + (from2to - (from_size - 1) + (step - 1)) / step + + would overflow, bump BASE_DISTANCE and subtract STEP from each + dividend to compensate. */ + base_distance++; + if (step > 0) + extra = (to2from - to_size) / (unsigned HOST_WIDE_INT) step; + else + extra = (from2to - from_size) / (unsigned HOST_WIDE_INT) -step; + if (extra > MAX_DDG_DISTANCE || base_distance + extra > MAX_DDG_DISTANCE) + return MAX_DDG_DISTANCE; + return base_distance + extra; +} + +/* If FROM and TO might alias, record memory dependencies: + + FROM--(FORWARD_TYPE)-->TO + and TO--(BACKWARD_TYPE)-->FROM + + FROM comes before TO in the original loop, and both belong to G. + FORWARD_DISTANCE is the minimum distance of the FROM--->TO dependence. */ +static void +add_memory_dep (ddg_ptr g, struct ddg_mem_ref *from, + struct ddg_mem_ref *to, dep_type forward_type, + dep_type backward_type, int forward_distance) +{ + HOST_WIDE_INT step; + unsigned HOST_WIDE_INT from_size, to_size, to_disp, abs_step, future_offset; + enum machine_mode pmode; + int backward_distance; + + gcc_checking_assert (from->node->cuid < to->node->cuid); + + if (!may_alias_p (from->mem, to->mem)) + return; + + /* In the worst case, the TO---->FROM edge has a distance of 1. */ + backward_distance = 1; + + /* See if we can get more accurate distances. Look for cases where + the addresses of FROM and TO are ivs with the same base and step. */ + if (from->base + && to->base + && from->step + && from->step == to->step + && !MEM_VOLATILE_P (from->mem) + && !MEM_VOLATILE_P (to->mem) + && MEM_SIZE_KNOWN_P (from->mem) + && MEM_SIZE_KNOWN_P (to->mem) + && MEM_ADDR_SPACE (from->mem) == MEM_ADDR_SPACE (to->mem) + && rtx_equal_p (from->base, to->base)) + { + step = to->step; + abs_step = (step < 0 ? -step : step); + from_size = MEM_SIZE (from->mem); + to_size = MEM_SIZE (to->mem); + + /* If the step is a power of two, or the negative of a power of two, + see whether we can prove that the references never overlap. */ + if (abs_step == (abs_step & -abs_step)) + { + to_disp = (to->offset - from->offset) % abs_step; + if (from_size <= to_disp && to_disp + to_size <= abs_step) + return; + } + + pmode = targetm.addr_space.address_mode (MEM_ADDR_SPACE (to->mem)); + future_offset = to->offset + forward_distance * step; + forward_distance = measure_mem_distance (pmode, + from->offset, from_size, + future_offset, to_size, + forward_distance, step); + future_offset = from->offset + backward_distance * step; + backward_distance = measure_mem_distance (pmode, + to->offset, to_size, + future_offset, from_size, + backward_distance, step); + } + + if (DEBUG_INSN_P (from->node->insn) || DEBUG_INSN_P (to->node->insn)) + { + forward_type = ANTI_DEP; + backward_type = ANTI_DEP; + } + create_ddg_dep_no_link (g, from->node, to->node, forward_type, MEM_DEP, + forward_distance); + create_ddg_dep_no_link (g, to->node, from->node, backward_type, MEM_DEP, + backward_distance); +} + +/* Make REF2 iterate over all entries in ddg_mem_ref list LIST + that come later than ddg_mem_ref REF1. */ +#define FOR_EACH_LATER_MEM_REF(REF2, REF1, LIST) \ + for (REF2 = (LIST); \ + REF2 && REF2->node->cuid > REF1->node->cuid; \ + REF2 = REF2->prev) + +/* Check for dependencies between pairs of memory rtxes. */ +static void +build_memory_deps (ddg_ptr g) +{ + struct ddg_mem_ref *ref1, *ref2; + int distance; + + for (ref1 = g->loads; ref1; ref1 = ref1->prev) + { + /* LOAD--->LOAD. */ + if (MEM_VOLATILE_P (ref1->mem)) + FOR_EACH_LATER_MEM_REF (ref2, ref1, g->loads) + if (MEM_VOLATILE_P (ref2->mem)) + add_memory_dep (g, ref1, ref2, ANTI_DEP, ANTI_DEP, 0); + + /* LOAD--->STORE. */ + FOR_EACH_LATER_MEM_REF (ref2, ref1, g->stores) + { + distance = anti_dependence (ref1->mem, ref2->mem) ? 0 : 1; + add_memory_dep (g, ref1, ref2, ANTI_DEP, TRUE_DEP, distance); + } + } + + for (ref1 = g->stores; ref1; ref1 = ref1->prev) + { + /* STORE--->LOAD. */ + FOR_EACH_LATER_MEM_REF (ref2, ref1, g->loads) + { + distance = true_dependence (ref1->mem, VOIDmode, + ref2->mem, rtx_varies_p) ? 0 : 1; + add_memory_dep (g, ref1, ref2, TRUE_DEP, ANTI_DEP, distance); + } + + /* STORE--->STORE. */ + FOR_EACH_LATER_MEM_REF (ref2, ref1, g->stores) + { + distance = output_dependence (ref1->mem, ref2->mem) ? 0 : 1; + add_memory_dep (g, ref1, ref2, OUTPUT_DEP, OUTPUT_DEP, distance); + } + } +} -/* Given a basic block, create its DDG and return a pointer to a variable - of ddg type that represents it. +/* Given a basic block, create the nodes of its DDG and return a pointer + to a variable of ddg type that represents it. Initialize the ddg structure fields to the appropriate values. */ ddg_ptr -create_ddg (basic_block bb, int closing_branch_deps) +create_ddg (struct loop *loop, basic_block bb, int closing_branch_deps) { + struct graph_and_node gn; ddg_ptr g; rtx insn, first_note; int i; @@ -586,13 +643,6 @@ create_ddg (basic_block bb, int closing_ if (DEBUG_INSN_P (insn)) g->num_debug++; - else - { - if (mem_read_insn_p (insn)) - g->num_loads++; - if (mem_write_insn_p (insn)) - g->num_stores++; - } num_nodes++; } @@ -603,6 +653,8 @@ create_ddg (basic_block bb, int closing_ return NULL; } + iv_analysis_loop_init (loop); + /* Allocate the nodes array, and initialize the nodes. */ g->num_nodes = num_nodes; g->nodes = (ddg_node_ptr) xcalloc (num_nodes, sizeof (struct ddg_node)); @@ -637,18 +689,31 @@ create_ddg (basic_block bb, int closing_ g->nodes[i].predecessors = sbitmap_alloc (num_nodes); sbitmap_zero (g->nodes[i].predecessors); g->nodes[i].first_note = (first_note ? first_note : insn); - g->nodes[i++].insn = insn; + g->nodes[i].insn = insn; first_note = NULL_RTX; + + gn.g = g; + gn.node = &g->nodes[i]; + note_uses (&PATTERN (insn), record_loads, &gn); + note_stores (PATTERN (insn), record_stores, &gn); + + i++; } /* We must have found a branch in DDG. */ gcc_assert (g->closing_branch); + return g; +} +/* Add the edges to a DDG that was previously created by create_ddg. + This function relies on scheduler dependencies. */ - /* Build the data dependency graph. */ +void +add_edges_to_ddg (ddg_ptr g) +{ build_intra_loop_deps (g); build_inter_loop_deps (g); - return g; + build_memory_deps (g); } /* Free all the memory allocated for the DDG. */ Index: gcc/modulo-sched.c =================================================================== --- gcc/modulo-sched.c 2011-12-30 13:13:45.077544981 +0000 +++ gcc/modulo-sched.c 2011-12-30 13:24:57.327195816 +0000 @@ -345,6 +345,38 @@ ps_num_consecutive_stages (partial_sched return ps_reg_move (ps, id)->num_consecutive_stages; } +/* Perform a saturating multiplication of nonnegative values A and B. */ + +static inline int +sat_mulpp (unsigned int a, unsigned int b) +{ + if ((unsigned int) INT_MAX / b <= a) + return INT_MAX; + else + return a * b; +} + +/* Perform a saturating addition of signed value A and nonnegative value B. */ + +static inline int +sat_addsp (int a, int b) +{ + if (INT_MAX - b <= a) + return INT_MAX; + return a + b; +} + +/* Perform a saturating subtraction of signed value A and nonnegative + value B. */ + +static inline int +sat_subsp (int a, int b) +{ + if (INT_MIN + b >= a) + return INT_MIN; + return a - b; +} + /* Given HEAD and TAIL which are the first and last insns in a loop; return the register which controls the loop. Return zero if it has more than one occurrence in the loop besides the control part or the @@ -709,7 +741,9 @@ schedule_reg_moves (partial_schedule_ptr ranges started at u (excluding self-loops). */ distances[0] = distances[1] = false; for (e = u->out; e; e = e->next_out) - if (e->type == TRUE_DEP && e->dest != e->src) + if (e->data_type == REG_DEP + && e->type == TRUE_DEP + && e->dest != e->src) { int nreg_moves4e = (SCHED_TIME (e->dest->cuid) - SCHED_TIME (e->src->cuid)) / ii; @@ -781,7 +815,9 @@ schedule_reg_moves (partial_schedule_ptr copy of this register, depending on the time the use is scheduled. Record which uses require which move results. */ for (e = u->out; e; e = e->next_out) - if (e->type == TRUE_DEP && e->dest != e->src) + if (e->data_type == REG_DEP + && e->type == TRUE_DEP + && e->dest != e->src) { int dest_copy = (SCHED_TIME (e->dest->cuid) - SCHED_TIME (e->src->cuid)) / ii; @@ -1355,6 +1391,7 @@ sms_schedule (void) basic_block condition_bb = NULL; edge latch_edge; gcov_type trip_count = 0; + int num_ddgs; loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_RECORDED_EXITS); @@ -1364,34 +1401,19 @@ sms_schedule (void) return; /* There are no loops to schedule. */ } - /* Initialize issue_rate. */ - if (targetm.sched.issue_rate) - { - int temp = reload_completed; - - reload_completed = 1; - issue_rate = targetm.sched.issue_rate (); - reload_completed = temp; - } - else - issue_rate = 1; - - /* Initialize the scheduler. */ - setup_sched_infos (); - haifa_sched_init (); - /* Allocate memory to hold the DDG array one entry for each loop. We use loop->num as index into this array. */ g_arr = XCNEWVEC (ddg_ptr, number_of_loops ()); if (dump_file) - { - fprintf (dump_file, "\n\nSMS analysis phase\n"); - fprintf (dump_file, "===================\n\n"); - } + { + fprintf (dump_file, "\n\nSMS loop discovery phase\n"); + fprintf (dump_file, "========================\n\n"); + } /* Build DDGs for all the relevant loops and hold them in G_ARR indexed by the loop index. */ + num_ddgs = 0; FOR_EACH_LOOP (li, loop, 0) { rtx head, tail; @@ -1512,7 +1534,7 @@ sms_schedule (void) instructions. The branch is rotated to be in row ii-1 at the end of the scheduling procedure to make sure it's the last instruction in the iteration. */ - if (! (g = create_ddg (bb, 1))) + if (! (g = create_ddg (loop, bb, 1))) { if (dump_file) fprintf (dump_file, "SMS create_ddg failed\n"); @@ -1523,12 +1545,38 @@ sms_schedule (void) if (dump_file) fprintf (dump_file, "...OK\n"); + num_ddgs++; + } + iv_analysis_done (); + + if (num_ddgs == 0) + { + if (dump_file) + fprintf (dump_file, "No suitable loops\n"); + goto done; } + + /* Initialize issue_rate. */ + if (targetm.sched.issue_rate) + { + int temp = reload_completed; + + reload_completed = 1; + issue_rate = targetm.sched.issue_rate (); + reload_completed = temp; + } + else + issue_rate = 1; + + /* Initialize the scheduler. */ + setup_sched_infos (); + haifa_sched_init (); + if (dump_file) - { - fprintf (dump_file, "\nSMS transformation phase\n"); - fprintf (dump_file, "=========================\n\n"); - } + { + fprintf (dump_file, "\nSMS transformation phase\n"); + fprintf (dump_file, "=========================\n\n"); + } /* We don't want to perform SMS on new loops - created by versioning. */ FOR_EACH_LOOP (li, loop, 0) @@ -1542,6 +1590,8 @@ sms_schedule (void) if (! (g = g_arr[loop->num])) continue; + add_edges_to_ddg (g); + if (dump_file) { rtx insn = BB_END (loop->header); @@ -1754,10 +1804,12 @@ sms_schedule (void) free_ddg (g); } - free (g_arr); - /* Release scheduler data, needed until now because of DFA. */ haifa_sched_finish (); + + done: + free (g_arr); + loop_optimizer_finalize (); } @@ -1844,6 +1896,7 @@ #define DFA_HISTORY SMS_DFA_HISTORY /* A threshold for the number of repeated unsuccessful attempts to insert an empty row, before we flush the partial schedule and start over. */ #define MAX_SPLIT_NUM 10 + /* Given the partial schedule PS, this function calculates and returns the cycles in which we can schedule the node with the given index I. NOTE: Here we do the backtracking in SMS, in some special cases. We have @@ -1896,7 +1949,7 @@ get_sched_window (partial_schedule_ptr p fprintf (dump_file, "=========== =========== =========== ===========" " =====\n"); } - /* Calculate early_start and limit end. Both bounds are inclusive. */ + /* Calculate early_start and limit start. Both bounds are inclusive. */ if (psp_not_empty) for (e = u_node->in; e != 0; e = e->next_in) { @@ -1905,26 +1958,36 @@ get_sched_window (partial_schedule_ptr p if (TEST_BIT (sched_nodes, v)) { int p_st = SCHED_TIME (v); - int earliest = p_st + e->latency - (e->distance * ii); - int latest = (e->data_type == MEM_DEP ? p_st + ii - 1 : INT_MAX); + int earliest = sat_subsp (sat_addsp (p_st, e->latency), + sat_mulpp (e->distance, ii)); + if (e->data_type == MEM_DEP) + { + start = MAX (start, earliest); + if (dump_file) + fprintf (dump_file, "%11d %11s %11s %11s", + earliest, "", "", ""); + } + else + { + early_start = MAX (early_start, earliest); + if (dump_file) + fprintf (dump_file, "%11s %11d %11s %11s", + "", earliest, "", ""); + } if (dump_file) { - fprintf (dump_file, "%11s %11d %11s %11d %5d", - "", earliest, "", latest, p_st); + fprintf (dump_file, " %5d", p_st); print_ddg_edge (dump_file, e); fprintf (dump_file, "\n"); } - early_start = MAX (early_start, earliest); - end = MIN (end, latest); - if (e->type == TRUE_DEP && e->data_type == REG_DEP) count_preds++; } } - /* Calculate late_start and limit start. Both bounds are inclusive. */ + /* Calculate late_start and limit end. Both bounds are inclusive. */ if (pss_not_empty) for (e = u_node->out; e != 0; e = e->next_out) { @@ -1933,20 +1996,30 @@ get_sched_window (partial_schedule_ptr p if (TEST_BIT (sched_nodes, v)) { int s_st = SCHED_TIME (v); - int earliest = (e->data_type == MEM_DEP ? s_st - ii + 1 : INT_MIN); - int latest = s_st - e->latency + (e->distance * ii); + int latest = sat_addsp (sat_subsp (s_st, e->latency), + sat_mulpp (e->distance, ii)); + if (e->data_type == MEM_DEP) + { + end = MIN (end, latest); + if (dump_file) + fprintf (dump_file, "%11s %11s %11s %11d", + "", "", "", latest); + } + else + { + late_start = MIN (late_start, latest); + if (dump_file) + fprintf (dump_file, "%11s %11s %11d %11s", + "", "", latest, ""); + } if (dump_file) { - fprintf (dump_file, "%11d %11s %11d %11s %5d", - earliest, "", latest, "", s_st); + fprintf (dump_file, " %5d", s_st); print_ddg_edge (dump_file, e); fprintf (dump_file, "\n"); } - start = MAX (start, earliest); - late_start = MIN (late_start, latest); - if (e->type == TRUE_DEP && e->data_type == REG_DEP) count_succs++; } @@ -1963,14 +2036,22 @@ get_sched_window (partial_schedule_ptr p /* Get a target scheduling window no bigger than ii. */ if (early_start == INT_MIN && late_start == INT_MAX) - early_start = NODE_ASAP (u_node); - else if (early_start == INT_MIN) - early_start = late_start - (ii - 1); - late_start = MIN (late_start, early_start + (ii - 1)); - - /* Apply memory dependence limits. */ - start = MAX (start, early_start); - end = MIN (end, late_start); + { + /* The default window (as given in the paper) is based on + the node's ASAP value, but shift or shrink it as necessary + in order to honor memory dependencies. */ + early_start = MIN (NODE_ASAP (u_node), end - (ii - 1)); + start = MAX (early_start, start); + } + else + { + end = MIN (end, late_start); + if (early_start == INT_MIN) + start = MAX (start, end - (ii - 1)); + else + start = MAX (start, early_start); + } + end = MIN (end, start + (ii - 1)); if (dump_file && (psp_not_empty || pss_not_empty)) fprintf (dump_file, "%11s %11d %11d %11s %5s final window\n", @@ -2060,8 +2141,8 @@ calculate_must_precede_follow (ddg_node_ SCHED_TIME (e->src) - (e->distance * ii) == first_cycle_in_window */ for (e = u_node->in; e != 0; e = e->next_in) if (TEST_BIT (sched_nodes, e->src->cuid) - && ((SCHED_TIME (e->src->cuid) - (e->distance * ii)) == - first_cycle_in_window)) + && (sat_subsp (SCHED_TIME (e->src->cuid), sat_mulpp (e->distance, ii)) + == first_cycle_in_window)) { if (dump_file) fprintf (dump_file, "%d ", e->src->cuid); @@ -2371,7 +2452,8 @@ compute_split_row (sbitmap sched_nodes, int v = e->src->cuid; if (TEST_BIT (sched_nodes, v) - && (low == SCHED_TIME (v) + e->latency - (e->distance * ii))) + && low == sat_subsp (sat_addsp (SCHED_TIME (v), e->latency), + sat_mulpp (e->distance, ii))) if (SCHED_TIME (v) > lower) { crit_pred = v;

13 years, 6 months

1
0
0 0

Update to lp:~rsandifo/+junk/loop-microbenchmarks

by Richard Sandiford

About three months ago, 4.7 stopped being able to optimise things like: int *__restrict x = ...; The (libav) loop microbenchmarks that I'd written used this construct a lot, as an easy way of automatically generating a whole function from a loop kernel. I spent a while testing 4.7 with the restrict patch reverted, while I caught up with my post-holiday email backlog and saw whether the effect on this code was deliberate. I eventually realised it was, so implemented a change that Ira had suggested: splitting out a peak_loop_1 that takes all the restrict pointers as arguments. I just realised that I never pushed that change back up to bzr, so I've done it now. Probably a write-only change, since I doubt anyone's going to be using the benchmark again, but just in case :-) Richard

13 years, 6 months

1
0
0 0

Patch drop: auto-inc-dec.c rewrite

by Richard Sandiford

This is my current 4.7 auto-inc-dec.c patch. I submitted an RFC in July: http://article.gmane.org/gmane.comp.gcc.patches/241779/ and updated the patch in line with the feedback I got. Steven Bosscher sent some very useful comments in private email, so the update deals with those as well as Bernd's public ones. If we do go ahead with this rewrite, it depends on the A9 pipeline description changes. I submitted some A8 and A9 changes here: http://article.gmane.org/gmane.comp.gcc.patches/244238/ http://article.gmane.org/gmane.comp.gcc.patches/244242/ but because I later noticed that the A9 didn't behave quite as I thought, I decided not to apply them. Ramana asked around internally about what the A9 actually does (thanks) and had some ideas. The patch also relies on the MEM rtx_costs patch that I just posted: http://lists.linaro.org/pipermail/linaro-toolchain/2011-December/001944.html Richard gcc/ * Makefile.in (auto-inc-dec.o): Depends on $(OPTABS_H) and addresses.h. * auto-inc-dec.c: Rewrite. Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in 2011-12-07 11:43:29.549238252 +0000 +++ gcc/Makefile.in 2011-12-29 09:24:51.066303201 +0000 @@ -3145,7 +3145,8 @@ alloc-pool.o : alloc-pool.c $(CONFIG_H) auto-inc-dec.o : auto-inc-dec.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ $(TREE_H) $(RTL_H) $(TM_P_H) hard-reg-set.h $(BASIC_BLOCK_H) insn-config.h \ $(REGS_H) $(FLAGS_H) output.h $(FUNCTION_H) $(EXCEPT_H) $(DIAGNOSTIC_CORE_H) $(RECOG_H) \ - $(EXPR_H) $(TIMEVAR_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) + $(EXPR_H) $(TIMEVAR_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \ + $(OPTABS_H) addresses.h cfg.o : cfg.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(FLAGS_H) \ $(REGS_H) hard-reg-set.h output.h $(DIAGNOSTIC_CORE_H) $(FUNCTION_H) $(EXCEPT_H) $(GGC_H) \ $(TM_P_H) $(TIMEVAR_H) $(OBSTACK_H) $(TREE_H) alloc-pool.h \

13 years, 6 months

1
0
0 0

Patch drop: Rework MEM rtx_costs

by Richard Sandiford

I originally wrote this patch as part of the auto-inc-dec work. I didn't submit it because I wasn't sure what value of extra_writeback_latency was appropriate for A9. (I was hoping to crib it from Ramana's pipeline description.) The patch introduces three new fields to the costs structure: one to control the latency of core loads, one to control the latency of NEON loads, and one to control the penalty of address writeback. The patch includes a tweak for cases where we use two VLDRs. That part should obviously be dropped if we change the move patterns to use something else. Richard gcc/ * config/arm/arm-protos.h (tune_params): Add core_mem_latency, neon_mem_latency and extra_writeback_latency. * config/arm/arm.c (arm_slowmul_tune, arm_fastmul_tune) (arm_strongarm_tune, arm_xscale_tune, arm_9e_tune, arm_v6t2_tune) (arm_cortex_tune, arm_cortex_a5_tune, arm_cortex_a9_tune) (arm_fa726te_tune): Populate the new tune_params fields. (arm_mem_cost): New function. (arm_rtx_costs_1): Use it. Index: gcc/config/arm/arm-protos.h =================================================================== --- gcc/config/arm/arm-protos.h 2011-08-09 15:01:14.000000000 +0100 +++ gcc/config/arm/arm-protos.h 2011-08-09 15:04:58.121984034 +0100 @@ -236,6 +236,9 @@ struct tune_params int l1_cache_size; int l1_cache_line_size; bool prefer_constant_pool; + int core_mem_latency; + int neon_mem_latency; + int extra_writeback_latency; int (*branch_cost) (bool, bool); }; Index: gcc/config/arm/arm.c =================================================================== --- gcc/config/arm/arm.c 2011-08-09 15:01:14.000000000 +0100 +++ gcc/config/arm/arm.c 2011-08-09 15:07:07.215103271 +0100 @@ -840,6 +840,9 @@ const struct tune_params arm_slowmul_tun 5, /* Max cond insns. */ ARM_PREFETCH_NOT_BENEFICIAL, true, /* Prefer constant pool. */ + 2, + 2, + 0, arm_default_branch_cost }; @@ -851,6 +854,9 @@ const struct tune_params arm_fastmul_tun 5, /* Max cond insns. */ ARM_PREFETCH_NOT_BENEFICIAL, true, /* Prefer constant pool. */ + 2, + 2, + 0, arm_default_branch_cost }; @@ -865,6 +871,9 @@ const struct tune_params arm_strongarm_t 3, /* Max cond insns. */ ARM_PREFETCH_NOT_BENEFICIAL, true, /* Prefer constant pool. */ + 2, + 2, + 0, arm_default_branch_cost }; @@ -876,6 +885,9 @@ const struct tune_params arm_xscale_tune 3, /* Max cond insns. */ ARM_PREFETCH_NOT_BENEFICIAL, true, /* Prefer constant pool. */ + 2, + 2, + 0, arm_default_branch_cost }; @@ -887,6 +899,9 @@ const struct tune_params arm_9e_tune = 5, /* Max cond insns. */ ARM_PREFETCH_NOT_BENEFICIAL, true, /* Prefer constant pool. */ + 2, + 2, + 0, arm_default_branch_cost }; @@ -898,6 +913,9 @@ const struct tune_params arm_v6t2_tune = 5, /* Max cond insns. */ ARM_PREFETCH_NOT_BENEFICIAL, false, /* Prefer constant pool. */ + 2, + 2, + 0, arm_default_branch_cost }; @@ -910,6 +928,9 @@ const struct tune_params arm_cortex_tune 5, /* Max cond insns. */ ARM_PREFETCH_NOT_BENEFICIAL, false, /* Prefer constant pool. */ + 2, + 2, + 1, arm_default_branch_cost }; @@ -924,6 +945,9 @@ const struct tune_params arm_cortex_a5_t 1, /* Max cond insns. */ ARM_PREFETCH_NOT_BENEFICIAL, false, /* Prefer constant pool. */ + 2, + 2, + 0, arm_cortex_a5_branch_cost }; @@ -935,6 +959,9 @@ const struct tune_params arm_cortex_a9_t 5, /* Max cond insns. */ ARM_PREFETCH_BENEFICIAL(4,32,32), false, /* Prefer constant pool. */ + 2, + 2, + 0, arm_default_branch_cost }; @@ -946,6 +973,9 @@ const struct tune_params arm_fa726te_tun 5, /* Max cond insns. */ ARM_PREFETCH_NOT_BENEFICIAL, true, /* Prefer constant pool. */ + 2, + 2, + 0, arm_default_branch_cost }; @@ -6848,6 +6878,41 @@ thumb1_rtx_costs (rtx x, enum rtx_code c } } +/* Return the cost in insns of a memory reference of mode MODE to + address ADDR. */ + +static int +arm_mem_cost (enum machine_mode mode, rtx addr) +{ + int count, base; + + count = ARM_NUM_REGS (mode); + if (TARGET_NEON + && (VALID_NEON_DREG_MODE (mode) + || VALID_NEON_QREG_MODE (mode) + || VALID_NEON_STRUCT_MODE (mode))) + { + base = current_tune->neon_mem_latency; + + if (count == 4 && (GET_CODE (addr) == PLUS || CONSTANT_P (addr))) + /* In this case we use two VLDRs. */ + return COSTS_N_INSNS (base + 2); + + /* Assume that one quad can be accessed each cycle. */ + return COSTS_N_INSNS (base + (count + 3) / 4); + } + + base = current_tune->core_mem_latency; + + if (count == 1 && GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC) + /* On some targets (like A8), core accesses chained by address + register writeback cannot issue in consecutive cycles. + Pessimize writeback to account for this. */ + base += current_tune->extra_writeback_latency; + + return COSTS_N_INSNS (base + count); +} + static inline bool arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed) { @@ -6860,9 +6925,7 @@ arm_rtx_costs_1 (rtx x, enum rtx_code ou switch (code) { case MEM: - /* Memory costs quite a lot for the first word, but subsequent words - load at the equivalent of a single insn each. */ - *total = COSTS_N_INSNS (2 + ARM_NUM_REGS (mode)); + *total = arm_mem_cost (mode, XEXP (x, 0)); return true; case DIV:

13 years, 6 months

1
0
0 0

Goodbye

by Ira Rosen

Hi, Thank you all for an interesting and pleasant experience. I am very grateful to Linaro for the opportunity to meet and work with such an amazing group of people. I wish you all the best, and hope to meet you again (at least online). You can find me at irar(a)il.ibm.com or ira.rsn(a)gmail.com. Ira

13 years, 6 months

1
0
0 0

[ACTIVITY] WW51

by Zhenqiang Chen

Summary: * Read armV7-A/R reference manual; crosstool-ng patches and wrapper scripts. Details: 1. Patches for crosstool-ng: * Fix symlink issue when CT_USE_SYSROOT is not enabled. * Update sample/linaro-arm-none-eabi (baremetal) to disable SYSROOT. So that both include and lib files are in the same dir. 2. Study armV7-A/R reference manual. 3. Validate embedded toolchain Dec. release. 4. Enhance the wrapper to use crosstool-ng for embedded toolchain. Plan: * Ramp-up on gcc. Best regards! -Zhenqiang

13 years, 6 months

1
0
0 0

[ACTIVITY] weekly status

by Revital Eres

Submitting patch for Bug #879725: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01459.html Looking at the performance results running SMS with automatic testing. This is my last week in Linaro so I would also like to thank you all for the interesting year -- it was a great experience for me to work in this project. I wish you all good luck and happy holidays! Revital

13 years, 6 months

1
0
0 0

[ACTIVITY] weekly status

by Ken Werner

Hi, OpenEmbedded-Core: * No response on the CSL patches I posted to the ml yet * khem says someone (other than me) needs to try them * Linaro binary toolchain * Runs on Oneiric-X86_64 after installing lsb-core (interpreter: /lib/ld-lsb.so.3) * The do_rootfs tasks fails with runtine dependecy issues when using the external-linaro-toolchain_arm-2011.11.bb recipe. When re-using my CSL 2011.03 recipe with the linaro toolchain the error doesn't show up - strange. * OE-Core build gets confused by the (arm-linux-gnueabi-)pkg-config of the external linaro toolchain. As a workaround I just renamed this script. * The qemuarm MACHINE configuration uses "-march=armv5te -mno-thumb" Since the linaro toolchain defaults to thumb and -mno-thumb has no effect some inline assemblies are failing (i.e. on the umull insn). GNU #47930 suggests using -marm instead -> OE-Core patch posted. * Got the core-image-minimal to build, but it doesn't run yet (I suspect some basic runtime dependencies like libc again) * The build of the sato image fails (seems libtool and/or C++ related - need to investigate) Regards Ken

13 years, 6 months

1
0
0 0

Fwd: [ACTIVITY] w51

by Asa Sandahl

Hi, * Continued with comparison of eembc results for gcc4.4 and gcc 4.6 (FSF and Linaro). Collecting results for 4.6 with loop-unrolling turned off. * Working on a plotbench.py script that will use matplotlib for plotting the results. Right now the script plots the geomean value, for instance for eembc. I now try to make it plot all subtest as well. Then it should also show relative improvements instead of just the numbers, and then also sorted from best to worse. This script depends on Michaels script libtabulate.py for transforming the tabulated file back to python records. * Will be back January 9 /Regards Åsa

13 years, 6 months

1
0
0 0

[ACTIVITY] Nov 19 - Dec 22

by Ulrich Weigand

== GDB == * Ongoing work on remote support for "info proc" and core file generation. Completed implementation of latest solution via accessing arbitrary files on the remote site, only to run into a fundamental design problem ... so it's probably back to the previous approach. Discussion on the list is ongoing. * Fixed a GDB 7.4 test suite failure on ARM: PR tdep/12797 * Fixed another GBD 7.4 test suite failure on ARM, by enabling pthread_t thread debugging on core files. == GCC == * Patch review week. Mit freundlichen Gruessen / Best Regards Ulrich Weigand -- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294

13 years, 6 months

1
0
0 0

cdce3.C execution fault

by Michael Hope

Hi there. I've looked further into the intermittent gcc/testsuite/g++.dg/cdce3.C test failures. Taking Ira's vectoriser-only fix-pr51301-4.6 branch and comparing it with it's predecessor r106845: * cdce3.o itself is identical across compilers * Fault occurs in a parallel test run as part of the normal auto build * Fault occurs every time * Fault occurs with a manual 'make check-gcc RUNTESTFLAGS="dg.exp=cdce*' * Fault doesn't occur when building from the command line * Fault doesn't occur after updating binutils I'm suspicious of the linker. The auto builders are Natty based and come with ld 2.21.0.20110327. Updating them to Oneiric's 2.21.53.20110810 clears the problem. I've saved the build trees. I see no reason not to commit ~ramana/gcc-linaro/fix-lp-900426 and ~irar/gcc-linaro/fix-pr51301-4.6. -- Michael

13 years, 6 months

3
4
0 0

[ACTIVITY] Nov 12 - Dec 16

by Ulrich Weigand

== GDB == * Ongoing work on remote support for "info proc" and core file generation. Implemented initial version of latest solution via accessing arbitrary files on the remote site. == GCC == * Started familiarizing myself with current status of various performance patches in programm, in preparation of my taking on GCC performance work next year. Mit freundlichen Gruessen / Best Regards Ulrich Weigand -- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294

13 years, 6 months

1
0
0 0

[ACTIVITY] WW 50

by Zhenqiang Chen

Summary * make check-gcc on windows. * crosstool-ng patches Details: 1. Two patches for crosstool-ng: * Fix the compile error when CT_USE_SYSROOT is not "y". With this fix, we can config crosstool-ng to remove the symbol link for windows build. * Add scripts to build manual for newlib. 2. make check-gcc on Windows: * Wrap gcc/g++ for windows test. testglue.c should be compiled with gcc not g++. * Enhance scripts to convert path using "cygpath -w" 3. Analyze and root cause the pseudo new failed cases on windows. * gcc fail cases (gcc.dg/cpp/assert3.c, gcc.dg/cpp/include7.c and gcc.dg/cpp/trad/assert3.c) Root cause: " in options are removed in the test scripts. e.g. When reading gcc.log, you can find “-Aabc = jkl” in "Executing on host" as: Executing on host: …/cpp/assert3.c -A abc=def -A abc(ghi) "-Aabc = jkl" … But in spawn: the “” are removed. spawn …/cpp/assert3.c -A abc=def -A abc(ghi) -Aabc = jkl * g++ fail cases (dwarf2/lineno-simple1.C, dwarf2/pr44641.C and dwarf2/pr46527.C) The assembler codes generated from windows g++ and linux g++ are same except the PATH string. And all PASS on linux test. It seams the scripts can not grep the expected string on windows. * Tests on windows are not stable. For each test, there will have random fail cases (pass when retesting separately). Plans: * Create Makefile for embedded toolchain in linaro crosstool-ng. Best regards! -Zhenqiang

13 years, 6 months

1
0
0 0

Dependencies between Linaro toolchain and GDB/Linux Kernel

by Gibson, David

Hi, I was just wondering if anyone knows of any current or future dependencies with the Linaro toolchain (2011.11) and the Linaro release of GDB and the Linux Kernel? Is it considered safe to use the toolchain with the upstream releases of GDB and the Kernel, assuming that the versions of each are suitably compatible? Or are there potential dependencies on work that has been done in the toolchain? For example, new instructions supported in the compiler/assembler, or enhancements to the kernel/runtime library on the Linaro branch that would depend on them being in sync. Thanks, Dave.

13 years, 6 months

2
1
0 0

[ACTIVITY] weekly status

by Ken Werner

Hi, OpenEmbedded-Core: * the CSL 2011.03 recipe works with localization support disabled * got the OE-Core sato image to built (~250 source packages) * also built the Qt4 demo image (~100 source packages) to stress the C++ part of the toolchain * both are booting using qemu and seem to work just fine. * all of the Linaro members approved the request to contribute to OpenEmbedded * started to post patches onto the mailing list * briefly looked at the Linaro binary toolchain * the most recent one is dynamically linked * while the old one has old binutils (.21) that causes issue with --gc-sections * Currently the build is using the OE qemuarm machine configuration that uses a Yocto kernel and targets armv5te. This is something I'd like to look at too. Regards Ken

13 years, 6 months

2
1
0 0

[ACTIVITY] weekly status

by Revital Eres

Re-submitted the patch to estimate register pressure in SMS to the gcc-patches ml after discussing the patch with Richard.

13 years, 6 months

1
0
0 0

[ACTIVITY] 12th - 16th December

by Andrew Stubbs

Continued work on 64-bit shifts. - Improved 64-bit shifts without NEON (should benefit all cases). - Fixed bugs in constant shift code. - Rewrote 64-bit neon patch to take advantage of the new non-neon code, in the fall-back case. - Titied up the code, in general. - Rewrote SImode shift amount patch for new neon patches. The code produced now seems pretty good, but it still seems to choose which mode to use slightly haphazardly. The next step is to figure that out and benchmark the results. Had a few more attempts at getting the LAVA system to do something useful for me. I'm getting closer, but keep hitting new problems. Some of them my fault, and some are bugs in the system. Paul Larson has been very kindly helping me out and swatting the bugs. Didn't get much further with benchmarking for the generic tuning. This has been put on the back-burner while I work on the Neon shifts, and my test runs on my IGEPv2 A8 board have all been interrupted by power cuts or rendered useless by my forgetting to kill background tasks (such as Xorg). ---- Next weeks On vacation December 19th - January 3rd (returning January 4th).

13 years, 6 months

1
0
0 0

[ACTIVITY] 2011-12-16

by David Gilbert

== General == * Tidying things up and updating my list of statuses == String routines == * Adding strchr and strlen to eglibc; tests running at the moment. Dave

13 years, 6 months

1
0
0 0

[ACTIVITY] report week 50

by Peter Maydell

[short week, three days] RAG: Red: Amber: Green: Current Milestones: || || Planned || Estimate || Actual || ||upstream-omap3-cleanup || 2011-11-10 || 2011-12-15 || 2011-12-12 || ||cp15-rework || 2012-01-06 || 2012-01-17 || || ||initial-a15-system-model || 2012-01-27 || 2012-01-27 || || ||qemu-kvm-getting-started || 2012-03-04?|| 2012-03-04?|| || (for blueprint definitions: https://wiki.linaro.org/PeterMaydell/QemuKVM) Historical Milestones: ||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 || ||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 || ||a15-usermode-support || 2011-11-10 || 2011-11-10 || 2011-10-27 || == cp15-rework == * estimate pushed back a bit because I've ended up doing this in parallel with the other blueprints. Also exynos4210 review has taken some time. == upstream-omap3-cleanup == * split up the last handful of patches in the stack which were doing several things at once * this blueprint is now complete, meaning that the next stage of omap3 upstreaming will be cleaning up individual subsets of functionality to send upstream. This is all backburner level priority, though. == other == * reviewed most of Samsung's exynos4210 model * completed a conflict-heavy rebase of qemu-linaro (the result of MemoryRegion conversions for omap devices landing upstream) * LP:903239 : added linux-user support for some missing xattr syscalls that were causing build problems for apparmor -- PMM

13 years, 6 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

linaro-toolchain