After llvm commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Turn on the new pass manager by default
the following benchmarks grew in size by more than 1%:
- 403.gcc grew in size by 2% from 2586180 to 2648252 bytes
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1
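Once the first_bad and last_good toolchains have been rebuilt with the instructions further down, a per-symbol size comparison is a quick way to see where the extra ~60 KB in 403.gcc went. This is only a sketch using standard binutils tools; the binary paths below are placeholders, not the actual layout of the CI artifacts:
<cut>
# Hypothetical locations of the 403.gcc binaries from the two builds.
GOOD=last_good/benchmarks/403.gcc/gcc_base
BAD=first_bad/benchmarks/403.gcc/gcc_base

# Overall section sizes.
size "$GOOD" "$BAD"

# Per-symbol sizes (symbol name, size in hex), dropping the addresses,
# which differ between the two builds.
nm --print-size --size-sort "$GOOD" | awk '{print $4, $2}' | sort > good.sizes
nm --print-size --size-sort "$BAD"  | awk '{print $4, $2}' | sort > bad.sizes

# Symbols whose size changed, or that exist in only one build.
diff good.sizes bad.sizes
</cut>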
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Os_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
cd investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 669ddd1e9b1226432b003dbba05b99f8e992285b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Date: Mon Jan 25 11:00:56 2021 -0800
Turn on the new pass manager by default
This turns on the new pass manager by default for the optimization pipeline in
Clang and ThinLTO in various LLD backends. This also makes uses of `opt
-instcombine` use the new pass manager (unless specifically opted out).
This does not affect the backend target-dependent codegen pipeline.
If this causes regressions, you can opt out of the new pass manager
either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
while building LLVM, or via various compiler flags, e.g.
-flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
ELF LLD. Please file bugs for any regressions.
Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html
Reviewed By: asbirlea, ychen, MaskRay, echristo
Differential Revision: https://reviews.llvm.org/D95380
---
llvm/CMakeLists.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 1affc289e64b..f5298de9f7ca 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -688,8 +688,8 @@ else()
endif()
option(LLVM_ENABLE_PLUGINS "Enable plugin support" ${LLVM_ENABLE_PLUGINS_default})
-set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER FALSE CACHE BOOL
- "Enable the experimental new pass manager by default.")
+set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER TRUE CACHE BOOL
+ "Enable the new pass manager by default.")
include(HandleLLVMOptions)
</cut>
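The commit message above names the opt-out switches, so a quick sanity check that the size delta really comes from the new pass manager is to rebuild part of 403.gcc with the legacy pipeline re-enabled. This is a hedged sketch only: it assumes a clang built at the first_bad revision, and it uses an arbitrary pre-processed file from the save-temps tarball as a stand-in for the full -flto build.
<cut>
CLANG=./llvm-install/bin/clang   # hypothetical install path of the first_bad clang
SRC=403.gcc/combine.i            # any pre-processed source from the save-temps tarball

# New pass manager (the default after this commit) vs. the legacy opt-out
# flag named in the commit message.
$CLANG --target=aarch64-linux-gnu -Os -c "$SRC" -o npm.o
$CLANG --target=aarch64-linux-gnu -Os -flegacy-pass-manager -c "$SRC" -o legacy.o
size npm.o legacy.o

# For the full -flto link, the corresponding LLD opt-out from the commit
# message is: -Wl,--lto-legacy-pass-manager
</cut>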
After gcc commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
Author: Patrick Palka <ppalka(a)redhat.com>
libstdc++: Add floating-point std::to_chars implementation
the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%):
- 447.dealII:libstdc++.so.6.0.29 grew in size by 12% from 1245370 to 1391240 bytes
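Unlike the first report, the growth here is in libstdc++.so itself: the commit quoted at the end of this report adds a new floating_to_chars.cc translation unit (including the Ryu tables) and exports the floating-point std::to_chars overloads. A rough check of where the extra ~146 KB went, assuming the two shared libraries have been extracted from the last_good and first_bad builds (paths are placeholders); note that most of the new code is internal to the library, so the dynamic symbols only account for part of the delta:
<cut>
GOOD=last_good/lib/libstdc++.so.6.0.29   # hypothetical extraction paths
BAD=first_bad/lib/libstdc++.so.6.0.29

# Overall delta.
size "$GOOD" "$BAD"

# The new exports added by the gnu.ver hunk in the commit are mangled as
# _ZSt8to_charsPcS_[defg]...; list them with their sizes.
nm -D --print-size --defined-only "$BAD" | grep '_ZSt8to_charsPcS_' | sort -k2
</cut>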
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: arm-linux-gnueabihf
- Compiler flags: -Os -mthumb
- Hardware: APM Mustang 8x X-Gene1
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Os
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f
cd investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 3c57e692357c79ee7623dfc1586652aee2aefb8f
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 5033506993ef92589373270a8e8dbbf50e3ebef1
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
Author: Patrick Palka <ppalka(a)redhat.com>
Date: Thu Dec 17 23:11:34 2020 -0500
libstdc++: Add floating-point std::to_chars implementation
This implements the floating-point std::to_chars overloads for float,
double and long double. We use the Ryu library to compute the shortest
round-trippable fixed and scientific forms for float, double and long
double. We also use Ryu for performing explicit-precision fixed and
scientific formatting for float and double. For explicit-precision
formatting for long double we fall back to using printf. Hexadecimal
formatting for float, double and long double is implemented from
scratch.
The supported long double binary formats are binary64, binary80 (x86
80-bit extended precision), binary128 and ibm128.
Much of the complexity of the implementation is in computing the exact
output length before handing it off to Ryu (which doesn't do bounds
checking). In some cases it's hard to compute the output length
beforehand, so in these cases we instead compute an upper bound on the
output length and use a sufficiently-sized intermediate buffer only if
necessary.
Another source of complexity is in the general-with-precision formatting
mode, where we need to do zero-trimming of the string returned by Ryu,
and where we also take care to avoid having to format the number through
Ryu a second time when the general formatting mode resolves to fixed
(which we determine by doing a scientific formatting first and
inspecting the scientific exponent). We avoid going through Ryu twice
by instead transforming the scientific form to the corresponding fixed
form via in-place string manipulation.
This implementation is non-conforming in a couple of ways:
1. For the shortest hexadecimal formatting, we currently follow the
Microsoft implementation's decision to be consistent with the
output of printf's '%a' specifier at the expense of sometimes not
printing the shortest representation. For example, the shortest hex
form for the number 1.08p+0 is 2.1p-1, but we output the former
instead of the latter, as does printf.
2. The Ryu routine generic_binary_to_decimal that we use for performing
shortest formatting for large floating point types is implemented
using the __int128 type, but some targets with a large long double
type lack __int128 (e.g. i686), so we can't perform shortest
formatting of long double on such targets through Ryu. As a
temporary stopgap this patch makes the long double to_chars overloads
just dispatch to the double overloads on these targets, which means
we lose precision in the output. (We could potentially fix this by
writing a specialized version of Ryu's generic_binary_to_decimal
routine that uses uint64_t instead of __int128.) [Though I wonder if
there's a better way to work around the lack of __int128 on i686
specifically?]
3. Our shortest formatting for __ibm128 doesn't guarantee the round-trip
property if the difference between the high- and low-order exponent
is large. This is because we treat __ibm128 as if it has a
contiguous 105-bit mantissa by merging the mantissas of the high-
and low-order parts (using code extracted from glibc), so we
potentially lose precision from the low-order part. This seems to be
consistent with how glibc printf formats __ibm128.
libstdc++-v3/ChangeLog:
* config/abi/pre/gnu.ver: Add new exports.
* include/std/charconv (to_chars): Declare the floating-point
overloads for float, double and long double.
* src/c++17/Makefile.am (sources): Add floating_to_chars.cc.
* src/c++17/Makefile.in: Regenerate.
* src/c++17/floating_to_chars.cc: New file.
(to_chars): Define for float, double and long double.
* testsuite/20_util/to_chars/long_double.cc: New test.
---
libstdc++-v3/config/abi/pre/gnu.ver | 7 +
libstdc++-v3/include/std/charconv | 24 +
libstdc++-v3/src/c++17/Makefile.am | 1 +
libstdc++-v3/src/c++17/Makefile.in | 3 +-
libstdc++-v3/src/c++17/floating_to_chars.cc | 1563 ++++++++++++++++++++
.../testsuite/20_util/to_chars/long_double.cc | 199 +++
6 files changed, 1796 insertions(+), 1 deletion(-)
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 4b4bd8ab6da..05e0a512247 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2393,6 +2393,13 @@ GLIBCXX_3.4.29 {
# std::once_flag::_M_finish(bool)
_ZNSt9once_flag9_M_finishEb;
+ # std::to_chars(char*, char*, [float|double|long double])
+ _ZSt8to_charsPcS_[defg];
+ # std::to_chars(char*, char*, [float|double|long double], chars_format)
+ _ZSt8to_charsPcS_[defg]St12chars_format;
+ # std::to_chars(char*, char*, [float|double|long double], chars_format, int)
+ _ZSt8to_charsPcS_[defg]St12chars_formati;
+
} GLIBCXX_3.4.28;
# Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index dd1ebdf8322..b57b0a16db2 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -702,6 +702,30 @@ namespace __detail
chars_format __fmt = chars_format::general) noexcept;
#endif
+ // Floating-point std::to_chars
+
+ // Overloads for float.
+ to_chars_result to_chars(char* __first, char* __last, float __value) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, float __value,
+ chars_format __fmt) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, float __value,
+ chars_format __fmt, int __precision) noexcept;
+
+ // Overloads for double.
+ to_chars_result to_chars(char* __first, char* __last, double __value) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, double __value,
+ chars_format __fmt) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, double __value,
+ chars_format __fmt, int __precision) noexcept;
+
+ // Overloads for long double.
+ to_chars_result to_chars(char* __first, char* __last, long double __value)
+ noexcept;
+ to_chars_result to_chars(char* __first, char* __last, long double __value,
+ chars_format __fmt) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, long double __value,
+ chars_format __fmt, int __precision) noexcept;
+
_GLIBCXX_END_NAMESPACE_VERSION
} // namespace std
#endif // C++14
diff --git a/libstdc++-v3/src/c++17/Makefile.am b/libstdc++-v3/src/c++17/Makefile.am
index 37cdb53c076..2ec5ed621ca 100644
--- a/libstdc++-v3/src/c++17/Makefile.am
+++ b/libstdc++-v3/src/c++17/Makefile.am
@@ -51,6 +51,7 @@ endif
sources = \
floating_from_chars.cc \
+ floating_to_chars.cc \
fs_dir.cc \
fs_ops.cc \
fs_path.cc \
diff --git a/libstdc++-v3/src/c++17/Makefile.in b/libstdc++-v3/src/c++17/Makefile.in
index ccae721ab3f..9b36b7a916c 100644
--- a/libstdc++-v3/src/c++17/Makefile.in
+++ b/libstdc++-v3/src/c++17/Makefile.in
@@ -124,7 +124,7 @@ LTLIBRARIES = $(noinst_LTLIBRARIES)
libc__17convenience_la_LIBADD =
@ENABLE_DUAL_ABI_TRUE@am__objects_1 = cow-fs_dir.lo cow-fs_ops.lo \
@ENABLE_DUAL_ABI_TRUE@ cow-fs_path.lo
-am__objects_2 = floating_from_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \
+am__objects_2 = floating_from_chars.lo floating_to_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \
memory_resource.lo $(am__objects_1)
@ENABLE_DUAL_ABI_TRUE@am__objects_3 = cow-string-inst.lo
@ENABLE_EXTERN_TEMPLATE_TRUE@am__objects_4 = ostream-inst.lo \
@@ -440,6 +440,7 @@ headers =
sources = \
floating_from_chars.cc \
+ floating_to_chars.cc \
fs_dir.cc \
fs_ops.cc \
fs_path.cc \
diff --git a/libstdc++-v3/src/c++17/floating_to_chars.cc b/libstdc++-v3/src/c++17/floating_to_chars.cc
new file mode 100644
index 00000000000..dd83f5eea93
--- /dev/null
+++ b/libstdc++-v3/src/c++17/floating_to_chars.cc
@@ -0,0 +1,1563 @@
+// std::to_chars implementation for floating-point types -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+// Activate __glibcxx_assert within this file to shake out any bugs.
+#define _GLIBCXX_ASSERTIONS 1
+
+#include <charconv>
+
+#include <bit>
+#include <cfenv>
+#include <cassert>
+#include <cmath>
+#include <cstdio>
+#include <cstring>
+#include <langinfo.h>
+#include <optional>
+#include <string_view>
+#include <type_traits>
+
+// Determine the binary format of 'long double'.
+
+// We support the binary64, float80 (i.e. x86 80-bit extended precision),
+// binary128, and ibm128 formats.
+#define LDK_UNSUPPORTED 0
+#define LDK_BINARY64 1
+#define LDK_FLOAT80 2
+#define LDK_BINARY128 3
+#define LDK_IBM128 4
+
+#if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
+# define LONG_DOUBLE_KIND LDK_BINARY64
+#elif defined(__SIZEOF_INT128__)
+// The Ryu routines need a 128-bit integer type in order to do shortest
+// formatting of types larger than 64-bit double, so without __int128 we can't
+// support any large long double format. This is the case for e.g. i386.
+# if __LDBL_MANT_DIG__ == 64
+# define LONG_DOUBLE_KIND LDK_FLOAT80
+# elif __LDBL_MANT_DIG__ == 113
+# define LONG_DOUBLE_KIND LDK_BINARY128
+# elif __LDBL_MANT_DIG__ == 106
+# define LONG_DOUBLE_KIND LDK_IBM128
+# endif
+#endif
+#if !defined(LONG_DOUBLE_KIND)
+# define LONG_DOUBLE_KIND LDK_UNSUPPORTED
+#endif
+
+namespace
+{
+ namespace ryu
+ {
+#include "ryu/common.h"
+#include "ryu/digit_table.h"
+#include "ryu/d2s_intrinsics.h"
+#include "ryu/d2s_full_table.h"
+#include "ryu/d2fixed_full_table.h"
+#include "ryu/f2s_intrinsics.h"
+#include "ryu/d2s.c"
+#include "ryu/d2fixed.c"
+#include "ryu/f2s.c"
+
+#ifdef __SIZEOF_INT128__
+ namespace generic128
+ {
+ // Put the generic Ryu bits in their own namespace to avoid name conflicts.
+# include "ryu/generic_128.h"
+# include "ryu/ryu_generic_128.h"
+# include "ryu/generic_128.c"
+ } // namespace generic128
+
+ using generic128::floating_decimal_128;
+ using generic128::generic_binary_to_decimal;
+
+ int
+ to_chars(const floating_decimal_128 v, char* const result)
+ { return generic128::generic_to_chars(v, result); }
+#endif
+ } // namespace ryu
+
+ // A traits class that contains pertinent information about the binary
+ // format of each of the floating-point types we support.
+ template<typename T>
+ struct floating_type_traits
+ { };
+
+ template<>
+ struct floating_type_traits<float>
+ {
+ // We (and Ryu) assume float has the IEEE binary32 format.
+ static_assert(__FLT_MANT_DIG__ == 24);
+ static constexpr int mantissa_bits = 23;
+ static constexpr int exponent_bits = 8;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = uint32_t;
+ using shortest_scientific_t = ryu::floating_decimal_32;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000011101011100110101100101101110000000000000000000000000 };
+ };
+
+ template<>
+ struct floating_type_traits<double>
+ {
+ // We (and Ryu) assume double has the IEEE binary64 format.
+ static_assert(__DBL_MANT_DIG__ == 53);
+ static constexpr int mantissa_bits = 52;
+ static constexpr int exponent_bits = 11;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = uint64_t;
+ using shortest_scientific_t = ryu::floating_decimal_64;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000011000110101110111000001100101110000111100,
+ 0b0111100011110101011000011110000000110110010101011000001110011111,
+ 0b0101101100000000011100100100111100110110110100010001010101110000,
+ 0b0011110010111000101111110101100011101100010001010000000101100111,
+ 0b0001010000011001011100100001010000010101101000001101000000000000 };
+ };
+
+#if LONG_DOUBLE_KIND == LDK_BINARY64
+ // When long double is equivalent to double, we just forward the long double
+ // overloads to the double overloads, so we don't need to define a a
+ // floating_type_traits<long double> specialization in this case.
+#elif LONG_DOUBLE_KIND == LDK_FLOAT80
+ template<>
+ struct floating_type_traits<long double>
+ {
+ static constexpr int mantissa_bits = 64;
+ static constexpr int exponent_bits = 15;
+ static constexpr bool has_implicit_leading_bit = false;
+ using mantissa_t = uint64_t;
+ using shortest_scientific_t = ryu::floating_decimal_128;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000000000110101011111110100010100110000011101,
+ 0b1001100101001111010011011111101000101111110001011001011101110000,
+ 0b0000101111111011110010001000001010111101011110111111010100011001,
+ 0b0011100000011111001101101011111001111100100010000101001111101001,
+ 0b0100100100000000100111010010101110011000110001101101110011001010,
+ 0b0111100111100010100000010011000010010110101111110101000011110100,
+ 0b1010100111100010011110000011011101101100010110000110101010101010,
+ 0b0000001111001111000000101100111011011000101000110011101100110010,
+ 0b0111000011100100101101010100001101111110101111001000010011111111,
+ 0b0010111000100110100100100010101100111010110001101010010111001000,
+ 0b0000100000010110000011001001000111000001111010100101101000001111,
+ 0b0010101011101000111100001011000010011101000101010010010000101111,
+ 0b1011111011101101110010101011010001111000101000101101011001100011,
+ 0b1010111011011011110111110011001010000010011001110100101101000101,
+ 0b0011000001110110011010010000011100100011001011001100001101010110,
+ 0b0100011111011000111111101000011110000010111110101001000000001001,
+ 0b1110000001110001001101101110011000100000001010000111100010111010,
+ 0b1110001001010011101000111000001000010100110000010110100011110000,
+ 0b0000011010110000110001111000011111000011001101001101001001000110,
+ 0b1010010111001000101001100101010110100100100010010010000101000010,
+ 0b1011001110000111100010100110000011100011111001110111001100000101,
+ 0b0110101001001000010110001000010001010101110101100001111100011001,
+ 0b1111100011110101011110011010101001010010100011000010110001101001,
+ 0b0100000100001000111101011100010011011111011001000000001100011000,
+ 0b1110111111000111100101110111110000000011001110011100011011011001,
+ 0b1100001100100000010001100011011000111011110000110011010101000011,
+ 0b1111111011100111011101001111111000010000001111010111110010000100,
+ 0b1110111001111110101111000101000000001010001110011010001000111010,
+ 0b1000010001011000101111111010110011111101110101101001111000111010,
+ 0b0100000111101001000111011001101000001010111011101001101111000100,
+ 0b0000011100110001000111011100111100110001101111111010110111100000,
+ 0b0000011101011100100110010011110101010100010011110010010111010000,
+ 0b0011011001100111110101111100001001101110101101001110110011110110,
+ 0b1011000101000001110100111001100100111100110011110000000001101000,
+ 0b1011100011110100001001110101010110111001000000001011101001011110,
+ 0b1111001010010010100000010110101010101011101000101000000000001100,
+ 0b1000001111100100111001110101100001010011111111000001000011110000,
+ 0b0001011101001000010000101101111000001110101100110011001100110111,
+ 0b1110011100000010101011011111001010111101111110100000011100000011,
+ 0b1001110110011100101010011110100010110001001110110000101011100110,
+ 0b1001101000100011100111010000011011100001000000110101100100001001,
+ 0b1010111000101000101101010111000010001100001010100011111100000100,
+ 0b0111101000100011000101101011111011100010001101110111001111001011,
+ 0b1110100111010110001110110110000000010110100011110000010001111100,
+ 0b1100010100011010001011001000111001010101011110100101011001000000,
+ 0b0000110001111001100110010110111010101101001101000000000010010101,
+ 0b0001110111101000001111101010110010010000111110111100000111110100,
+ 0b0111110111001001111000110001101101001010101110110101111110000100,
+ 0b0000111110111010101111100010111010011100010110011011011001000001,
+ 0b1010010100100100101110111111111000101100000010111111101101000110,
+ 0b1000100111111101100011001101000110001000000100010101010100001101,
+ 0b1100101010101000111100101100001000110001110010100000000010110101,
+ 0b1010000100111101100100101010010110100010000000110101101110000100,
+ 0b1011111011110001110000100100000000001010111010001101100000100100,
+ 0b0111101101100011001110011100000001000101101101111000100111011111,
+ 0b0100111010010011011001010011110100001100111010010101111111100011,
+ 0b0010001001011000111000001100110111110111110010100011000110110110,
+ 0b0101010110000000010000100000110100111011111101000100000111010010,
+ 0b0110000011011101000001010100110101101110011100110101000000001001,
+ 0b1101100110100000011000001111000100100100110001100110101010101100,
+ 0b0010100101010110010010001010101000011111111111001011001010001111,
+ 0b0111001010001111001100111001010101001000110101000011110000001000,
+ 0b0110010011001001001111110001010010001011010010001101110110110011,
+ 0b0110010100111011000100111000001001101011111001110010111110111111,
+ 0b0101110111001001101100110100101001110010101110011001101110001000,
+ 0b0100110101010111011010001100010111100011010011111001010100111000,
+ 0b0111000110110111011110100100010111000110000110110110110001111110,
+ 0b1000101101010100100100111110100011110110110010011001110011110101,
+ 0b1001101110101001010100111101101011000101000010110101101111110000,
+ 0b0100100101001011011001001011000010001101001010010001010110101000,
+ 0b0010100001001011100110101000010110000111000111000011100101011011,
+ 0b0110111000011001111101101011111010001000000010101000101010011110,
+ 0b1000110110100001111011000001111100001001000000010110010100100100,
+ 0b1001110100011111100111101011010000010101011100101000010010100110,
+ 0b0001010110101110100010101010001110110110100011101010001001111100,
+ 0b1010100101101100000010110011100110100010010000100100001110000100,
+ 0b0001000000010000001010000010100110000001110100111001110111101101,
+ 0b1100000000000000000000000000000000000000000000000000000000000000 };
+ };
+#elif LONG_DOUBLE_KIND == LDK_BINARY128
+ template<>
+ struct floating_type_traits<long double>
+ {
+ static constexpr int mantissa_bits = 112;
+ static constexpr int exponent_bits = 15;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = unsigned __int128;
+ using shortest_scientific_t = ryu::floating_decimal_128;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000000000000000000000000000000100000010000000,
+ 0b1011001111110100000100010101101110011100100110000110010110011000,
+ 0b1010100010001101111111000000001101010010100010010000111011110111,
+ 0b1011111001110001111000011111000010110111000111110100101010100101,
+ 0b0110100110011110011011000011000010011001110001001001010011100011,
+ 0b0000011111110010101111101011101010000110011111100111001110100111,
+ 0b0100010101010110000010111011110100000010011001001010001110111101,
+ 0b1101110111000010001101100000110100000111001001101011000101011011,
+ 0b0100111011101101010000001101011000101100101110010010110000101011,
+ 0b0100000110111000000110101000010011101000110100010110000011101101,
+ 0b1011001101001000100001010001100100001111011101010101110001010110,
+ 0b1000000001000000101001110010110010001111101101010101001100000110,
+ 0b0101110110100110000110000001001010111110001110010000111111010011,
+ 0b1010001111100111000100011100100100111100100101000001011001000111,
+ 0b1010011000011100110101100111001011100101111111100001110100000100,
+ 0b1100011100100010100000110001001010000000100000001001010111011101,
+ 0b0101110000100011001111101101000000100110000010010111010001111010,
+ 0b0100111100011010110111101000100110000111001001101100000001111100,
+ 0b1100100100111110101011000100000101011010110111000111110100110101,
+ 0b0110010000010111010100110011000000111010000010111011010110000100,
+ 0b0101001001010010110111010111000101011100000111100111000001110010,
+ 0b1101111111001011101010110001000111011010111101001011010110100100,
+ 0b0001000100110000011111101011001101110010110110010000000011100100,
+ 0b0001000000000101001001001000000000011000100011001110101001001110,
+ 0b0010010010001000111010011011100001000110011011011110110100111000,
+ 0b0000100110101100000111100010100100011100110111011100001111001100,
+ 0b1011111010001110001100000011110111111111100000001011111111101100,
+ 0b0000011100001111010101110000100110111100101101110111101001000001,
+ 0b1100010001110110111100001001001101101000011100000010110101001011,
+ 0b0100101001101011111001011110101101100011011111011100101010101111,
+ 0b0001101001111001110000101101101100001011010001011110011101000010,
+ 0b1111000000101001101111011010110011101110100001011011001011100010,
+ 0b0101001010111101101100001111100010010110001101001000001101100100,
+ 0b0101100101011110001100101011111000111001111001001001101101100001,
+ 0b1111001101010010100100011011000110110010001111000111010001001101,
+ 0b0001110010011000000001000110110111011000011100001000011001110111,
+ 0b0100001011011011011011110011101100100101111111101100101000001110,
+ 0b0101011110111101010111100111101111000101111111111110100011011010,
+ 0b1110101010001001110100000010110111010111111010111110100110010110,
+ 0b1010001111100001001100101000110100001100011100110010000011010111,
+ 0b1111111101101111000100111100000101011000001110011011101010111001,
+ 0b1111101100001110100101111101011001000100000101110000110010100011,
+ 0b1001010110110101101101000101010001010000101011011111010011010000,
+ 0b0111001110110011101001100111000001000100001010110000010000001101,
+ 0b0101111100111110100111011001111001111011011110010111010011101010,
+ 0b1110111000000001100100111001100100110001011011001110101111110111,
+ 0b0001010001001101010111101010011111000011110001101101011001111111,
+ 0b0101000011100011010010001101100001011101011010100110101100100010,
+ 0b0001000101011000100101111100110110000101101101111000110001001011,
+ 0b0101100101001011011000010101000000010100011100101101000010011111,
+ 0b1000010010001011101001011010100010111011110100110011011000100111,
+ 0b1000011011100001010111010111010011101100100010010010100100101001,
+ 0b1001001001010111110101000010111010000000101111010100001010010010,
+ 0b0011011110110010010101111011000001000000000011011111000011111011,
+ 0b1011000110100011001110000001000100000001011100010111010010011110,
+ 0b0111101110110101110111110000011000000100011100011000101101101110,
+ 0b1001100101111011011100011110101011001111100111101010101010110111,
+ 0b1100110010010001100011001111010000000100011101001111011101001111,
+ 0b1000111001111010100101000010000100000001001100101010001011001101,
+ 0b0011101011110000110010100101010100110010100001000010101011111101,
+ 0b1100000000000110000010101011000000011101000110011111100010111111,
+ 0b0010100110000011011100010110111100010110101100110011101110001101,
+ 0b0010111101010011111000111001111100110111111100100011110001101110,
+ 0b1001110111001001101001001001011000010100110001000000100011010110,
+ 0b0011110101100111011011111100001000011001010100111100100101111010,
+ 0b0010001101000011000010100101110000010101101000100110000100001010,
+ 0b0010000010100110010101100101110011101111000111111111001001100001,
+ 0b0100111111011011011011100111111011000010011101101111011111110110,
+ 0b1111111111010110101011101000100101110100001110001001101011100111,
+ 0b1011111101000101110000111100100010111010100001010000010010110010,
+ 0b1111010101001011101011101010000100110110001110111100100110111111,
+ 0b1011001101000001001101000010101010010110010001100001011100011010,
+ 0b0101001011011101010001110100010000010001111100100100100001001101,
+ 0b0010100000111001100011000101100101000001111100111001101000000010,
+ 0b1011001111010101011001000100100110100100110111110100000110111000,
+ 0b0101011111010011100011010010111101110010100001111111100010001001,
+ 0b0010111011101100100000000000001111111010011101100111100001001101,
+ 0b1101000000000000000000000000000000000000000000000000000000000000 };
+ };
+#elif LONG_DOUBLE_KIND == LDK_IBM128
+ template<>
+ struct floating_type_traits<long double>
+ {
+ static constexpr int mantissa_bits = 105;
+ static constexpr int exponent_bits = 11;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = unsigned __int128;
+ using shortest_scientific_t = ryu::floating_decimal_128;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000000000000000000000000000001000000100000000,
+ 0b0000000000000000000100000000000000000000001000000000000000000010,
+ 0b0000100000000000000000001001000000000000000001100100000000000000,
+ 0b0011000000000000000000000000000001110000010000000000000000000000,
+ 0b0000100000000000001000000000000000000000000000100000000000000000 };
+ };
+#endif
+
+ // An IEEE-style decomposition of a floating-point value of type T.
+ template<typename T>
+ struct ieee_t
+ {
+ typename floating_type_traits<T>::mantissa_t mantissa;
+ uint32_t biased_exponent;
+ bool sign;
+ };
+
+ // Decompose the floating-point value into its IEEE components.
+ template<typename T>
+ ieee_t<T>
+ get_ieee_repr(const T value)
+ {
+ constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits;
+ constexpr int exponent_bits = floating_type_traits<T>::exponent_bits;
+ constexpr int total_bits = mantissa_bits + exponent_bits + 1;
+
+ constexpr auto get_uint_t = [] {
+ if constexpr (total_bits <= 32)
+ return uint32_t{};
+ else if constexpr (total_bits <= 64)
+ return uint64_t{};
+#ifdef __SIZEOF_INT128__
+ else if constexpr (total_bits <= 128)
+ return (unsigned __int128){};
+#endif
+ };
+ using uint_t = decltype(get_uint_t());
+ uint_t value_bits = 0;
+ memcpy(&value_bits, &value, sizeof(value));
+
+ ieee_t<T> ieee_repr;
+ ieee_repr.mantissa = value_bits & ((uint_t{1} << mantissa_bits) - 1u);
+ ieee_repr.biased_exponent
+ = (value_bits >> mantissa_bits) & ((uint_t{1} << exponent_bits) - 1u);
+ ieee_repr.sign = (value_bits >> (mantissa_bits + exponent_bits)) & 1;
+ return ieee_repr;
+ }
+
+#if LONG_DOUBLE_KIND == LDK_IBM128
+ template<>
+ ieee_t<long double>
+ get_ieee_repr(const long double value)
+ {
+ // The layout of __ibm128 isn't compatible with the standard IEEE format.
+ // So we transform it into an IEEE-compatible format, suitable for
+ // consumption by the generic Ryu API, with an 11-bit exponent and 105-bit
+ // mantissa (plus an implicit leading bit). We use the exponent and sign
+ // of the high part, and we merge the mantissa of the high part with the
+ // mantissa (and the implicit leading bit) of the low part.
+ using uint_t = unsigned __int128;
+ uint_t value_bits = 0;
+ memcpy(&value_bits, &value, sizeof(value_bits));
+
+ const uint64_t value_hi = value_bits;
+ const uint64_t value_lo = value_bits >> 64;
+
+ uint64_t mantissa_hi = value_hi & ((1ull << 52) - 1);
+ unsigned exponent_hi = (value_hi >> 52) & ((1ull << 11) - 1);
+ const int sign_hi = (value_hi >> 63) & 1;
+
+ uint64_t mantissa_lo = value_lo & ((1ull << 52) - 1);
+ const unsigned exponent_lo = (value_lo >> 52) & ((1ull << 11) - 1);
+ const int sign_lo = (value_lo >> 63) & 1;
+
+ {
+ // The following code for adjusting the low-part mantissa to combine
+ // it with the high-part mantissa is taken from the glibc source file
+ // sysdeps/ieee754/ldbl-128ibm/printf_fphex.c.
+ mantissa_lo <<= 7;
+ if (exponent_lo != 0)
+ mantissa_lo |= (1ull << (52 + 7));
+ else
+ mantissa_lo <<= 1;
+
+ const int ediff = exponent_hi - exponent_lo - 53;
+ if (ediff > 63)
+ mantissa_lo = 0;
+ else if (ediff > 0)
+ mantissa_lo >>= ediff;
+ else if (ediff < 0)
+ mantissa_lo <<= -ediff;
+
+ if (sign_lo != sign_hi && mantissa_lo != 0)
+ {
+ mantissa_lo = (1ull << 60) - mantissa_lo;
+ if (mantissa_hi == 0)
+ {
+ mantissa_hi = 0xffffffffffffeLL | (mantissa_lo >> 59);
+ mantissa_lo = 0xfffffffffffffffLL & (mantissa_lo << 1);
+ exponent_hi--;
+ }
+ else
+ mantissa_hi--;
+ }
+ }
+
+ ieee_t<long double> ieee_repr;
+ ieee_repr.mantissa = ((uint_t{mantissa_hi} << 64)
+ | (uint_t{mantissa_lo} << 4)) >> 11;
+ ieee_repr.biased_exponent = exponent_hi;
+ ieee_repr.sign = sign_hi;
+ return ieee_repr;
+ }
+#endif
+
+ // Invoke Ryu to obtain the shortest scientific form for the given
+ // floating-point number.
+ template<typename T>
+ typename floating_type_traits<T>::shortest_scientific_t
+ floating_to_shortest_scientific(const T value)
+ {
+ if constexpr (std::is_same_v<T, float>)
+ return ryu::floating_to_fd32(value);
+ else if constexpr (std::is_same_v<T, double>)
+ return ryu::floating_to_fd64(value);
+#ifdef __SIZEOF_INT128__
+ else if constexpr (std::is_same_v<T, long double>)
+ {
+ constexpr int mantissa_bits
+ = floating_type_traits<T>::mantissa_bits;
+ constexpr int exponent_bits
+ = floating_type_traits<T>::exponent_bits;
+ constexpr bool has_implicit_leading_bit
+ = floating_type_traits<T>::has_implicit_leading_bit;
+
+ const auto [mantissa, exponent, sign] = get_ieee_repr(value);
+ return ryu::generic_binary_to_decimal(mantissa, exponent, sign,
+ mantissa_bits, exponent_bits,
+ !has_implicit_leading_bit);
+ }
+#endif
+ }
+
+ // This subroutine returns true if the shortest scientific form fd is a
+ // positive power of 10, and the floating-point number that has this shortest
+ // scientific form is smaller than this power of 10.
+ //
+ // For instance, the exactly-representable 64-bit number
+ // 99999999999999991611392.0 has the shortest scientific form 1e23, so its
+ // exact value is smaller than its shortest scientific form.
+ //
+ // For these powers of 10 the length of the fixed form is one digit less
+ // than what the scientific exponent suggests.
+ //
+ // This subroutine inspects a lookup table to detect when fd is such a
+ // "rounded up" power of 10.
+ template<typename T>
+ bool
+ is_rounded_up_pow10_p(const typename
+ floating_type_traits<T>::shortest_scientific_t fd)
+ {
+ if (fd.exponent < 0 || fd.mantissa != 1) [[likely]]
+ return false;
+
+ constexpr auto& pow10_adjustment_tab
+ = floating_type_traits<T>::pow10_adjustment_tab;
+ __glibcxx_assert(fd.exponent/64 < (int)std::size(pow10_adjustment_tab));
+ return (pow10_adjustment_tab[fd.exponent/64]
+ & (1ull << (63 - fd.exponent%64)));
+ }
+
+ int
+ get_mantissa_length(const ryu::floating_decimal_32 fd)
+ { return ryu::decimalLength9(fd.mantissa); }
+
+ int
+ get_mantissa_length(const ryu::floating_decimal_64 fd)
+ { return ryu::decimalLength17(fd.mantissa); }
+
+#ifdef __SIZEOF_INT128__
+ int
+ get_mantissa_length(const ryu::floating_decimal_128 fd)
+ { return ryu::generic128::decimalLength(fd.mantissa); }
+#endif
+} // anon namespace
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+// This subroutine of __floating_to_chars_* handles writing nan, inf and 0 in
+// all formatting modes.
+template<typename T>
+ static optional<to_chars_result>
+ __handle_special_value(char* first, char* const last, const T value,
+ const chars_format fmt, const int precision)
+ {
+ __glibcxx_assert(precision >= 0);
+
+ string_view str;
+ switch (__builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
+ FP_ZERO, value))
+ {
+ case FP_INFINITE:
+ str = "-inf";
+ break;
+
+ case FP_NAN:
+ str = "-nan";
+ break;
+
+ case FP_ZERO:
+ break;
+
+ default:
+ case FP_SUBNORMAL:
+ case FP_NORMAL: [[likely]]
+ return nullopt;
+ }
+
+ if (!str.empty())
+ {
+ // We're formatting +-inf or +-nan.
+ if (!__builtin_signbit(value))
+ str.remove_prefix(strlen("-"));
+
+ if (last - first < (int)str.length())
+ return {{last, errc::value_too_large}};
+
+ memcpy(first, &str[0], str.length());
+ first += str.length();
+ return {{first, errc{}}};
+ }
+
+ // We're formatting 0.
+ __glibcxx_assert(value == 0);
+ const auto orig_first = first;
+ const bool sign = __builtin_signbit(value);
+ int expected_output_length;
+ switch (fmt)
+ {
+ case chars_format::fixed:
+ case chars_format::scientific:
+ case chars_format::hex:
+ expected_output_length = sign + 1;
+ if (precision)
+ expected_output_length += strlen(".") + precision;
+ if (fmt == chars_format::scientific)
+ expected_output_length += strlen("e+00");
+ else if (fmt == chars_format::hex)
+ expected_output_length += strlen("p+0");
+ if (last - first < expected_output_length)
+ return {{last, errc::value_too_large}};
+
+ if (sign)
+ *first++ = '-';
+ *first++ = '0';
+ if (precision)
+ {
+ *first++ = '.';
+ memset(first, '0', precision);
+ first += precision;
+ }
+ if (fmt == chars_format::scientific)
+ {
+ memcpy(first, "e+00", 4);
+ first += 4;
+ }
+ else if (fmt == chars_format::hex)
+ {
+ memcpy(first, "p+0", 3);
+ first += 3;
+ }
+ break;
+
+ case chars_format::general:
+ default: // case chars_format{}:
+ expected_output_length = sign + 1;
+ if (last - first < expected_output_length)
+ return {{last, errc::value_too_large}};
+
+ if (sign)
+ *first++ = '-';
+ *first++ = '0';
+ break;
+ }
+ __glibcxx_assert(first - orig_first == expected_output_length);
+ return {{first, errc{}}};
+ }
+
+// This subroutine of the floating-point to_chars overloads performs
+// hexadecimal formatting.
+template<typename T>
+ static to_chars_result
+ __floating_to_chars_hex(char* first, char* const last, const T value,
+ const optional<int> precision)
+ {
+ if (precision.has_value() && precision.value() < 0) [[unlikely]]
+ // A negative precision argument is treated as if it were omitted.
+ return __floating_to_chars_hex(first, last, value, nullopt);
+
+ __glibcxx_requires_valid_range(first, last);
+
+ constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits;
+ constexpr bool has_implicit_leading_bit
+ = floating_type_traits<T>::has_implicit_leading_bit;
+ constexpr int exponent_bits = floating_type_traits<T>::exponent_bits;
+ constexpr int exponent_bias = (1u << (exponent_bits - 1)) - 1;
+ using mantissa_t = typename floating_type_traits<T>::mantissa_t;
+ constexpr int mantissa_t_width = sizeof(mantissa_t) * __CHAR_BIT__;
+
+ if (auto result = __handle_special_value(first, last, value,
+ chars_format::hex,
+ precision.value_or(0)))
+ return *result;
+
+ // Extract the sign, mantissa and exponent from the value.
+ const auto [ieee_mantissa, biased_exponent, sign] = get_ieee_repr(value);
+ const bool is_normal_number = (biased_exponent != 0);
+
+ // Calculate the unbiased exponent.
+ const int32_t unbiased_exponent = (is_normal_number
+ ? biased_exponent - exponent_bias
+ : 1 - exponent_bias);
+
+ // Shift the mantissa so that its bitwidth is a multiple of 4.
+ constexpr unsigned rounded_mantissa_bits = (mantissa_bits + 3) / 4 * 4;
+ static_assert(mantissa_t_width >= rounded_mantissa_bits);
+ mantissa_t effective_mantissa
+ = ieee_mantissa << (rounded_mantissa_bits - mantissa_bits);
+ if (is_normal_number)
+ {
+ if constexpr (has_implicit_leading_bit)
+ // Restore the mantissa's implicit leading bit.
+ effective_mantissa |= mantissa_t{1} << rounded_mantissa_bits;
+ else
+ // The explicit mantissa bit should already be set.
+ __glibcxx_assert(effective_mantissa & (mantissa_t{1} << (mantissa_bits
+ - 1u)));
+ }
+
+ // Compute the shortest precision needed to print this value exactly,
+ // disregarding trailing zeros.
+ constexpr int full_hex_precision = (has_implicit_leading_bit
+ ? (mantissa_bits + 3) / 4
+ // With an explicit leading bit, we
+ // use the four leading nibbles as the
+ // hexit before the decimal point.
+ : (mantissa_bits - 4 + 3) / 4);
+ const int trailing_zeros = __countr_zero(effective_mantissa) / 4;
+ const int shortest_full_precision = full_hex_precision - trailing_zeros;
+ __glibcxx_assert(shortest_full_precision >= 0);
+
+ int written_exponent = unbiased_exponent;
+ const int effective_precision = precision.value_or(shortest_full_precision);
+ if (effective_precision < shortest_full_precision)
+ {
+ // When limiting the precision, we need to determine how to round the
+ // least significant printed hexit. The following branchless
+ // bit-level-parallel technique computes whether to round up the
+ // mantissa bit at index N (according to round-to-nearest rules) when
+ // dropping N bits of precision, for each index N in the bit vector.
+ // This technique is borrowed from the MSVC implementation.
+ using bitvec = mantissa_t;
+ const bitvec round_bit = effective_mantissa << 1;
+ const bitvec has_tail_bits = round_bit - 1;
+ const bitvec lsb_bit = effective_mantissa;
+ const bitvec should_round = round_bit & (has_tail_bits | lsb_bit);
+
+ const int dropped_bits = 4*(full_hex_precision - effective_precision);
+ // Mask out the dropped nibbles.
+ effective_mantissa >>= dropped_bits;
+ effective_mantissa <<= dropped_bits;
+ if (should_round & (mantissa_t{1} << dropped_bits))
+ {
+ // Round up the least significant nibble.
+ effective_mantissa += mantissa_t{1} << dropped_bits;
+ // Check and adjust for overflow of the leading nibble. When the
+ // type has an implicit leading bit, then the leading nibble
+ // before rounding is either 0 or 1, so it can't overflow.
+ if constexpr (!has_implicit_leading_bit)
+ {
+ // The only supported floating-point type with explicit
+ // leading mantissa bit is LDK_FLOAT80, i.e. x86 80-bit
+ // extended precision, and so we hardcode the below overflow
+ // check+adjustment for this type.
+ static_assert(mantissa_t_width == 64
+ && rounded_mantissa_bits == 64);
+ if (effective_mantissa == 0)
+ {
+ // We rounded up the least significant nibble and the
+ // mantissa overflowed, e.g f.fcp+10 with precision=1
+ // became 10.0p+10. Absorb this extra hexit into the
+ // exponent to obtain 1.0p+14.
+ effective_mantissa
+ = mantissa_t{1} << (rounded_mantissa_bits - 4);
+ written_exponent += 4;
+ }
+ }
+ }
+ }
+
+ // Compute the leading hexit and mask it out from the mantissa.
+ char leading_hexit;
+ if constexpr (has_implicit_leading_bit)
+ {
+ const unsigned nibble = effective_mantissa >> rounded_mantissa_bits;
+ __glibcxx_assert(nibble <= 2);
+ leading_hexit = '0' + nibble;
+ effective_mantissa &= ~(mantissa_t{0b11} << rounded_mantissa_bits);
+ }
+ else
+ {
+ const unsigned nibble = effective_mantissa >> (rounded_mantissa_bits-4);
+ __glibcxx_assert(nibble < 16);
+ leading_hexit = "0123456789abcdef"[nibble];
+ effective_mantissa &= ~(mantissa_t{0b1111} << (rounded_mantissa_bits-4));
+ written_exponent -= 3;
+ }
+
+ // Now before we start writing the string, determine the total length of
+ // the output string and perform a single bounds check.
+ int expected_output_length = sign + 1;
+ if (effective_precision != 0)
+ expected_output_length += strlen(".") + effective_precision;
+ const int abs_written_exponent = abs(written_exponent);
+ expected_output_length += (abs_written_exponent >= 10000 ? strlen("p+ddddd")
+ : abs_written_exponent >= 1000 ? strlen("p+dddd")
+ : abs_written_exponent >= 100 ? strlen("p+ddd")
+ : abs_written_exponent >= 10 ? strlen("p+dd")
+ : strlen("p+d"));
+ if (last - first < expected_output_length)
+ return {last, errc::value_too_large};
+
+ const auto saved_first = first;
+ // Write the negative sign and the leading hexit.
+ if (sign)
+ *first++ = '-';
+ *first++ = leading_hexit;
+
+ if (effective_precision > 0)
+ {
+ *first++ = '.';
+ int written_hexits = 0;
+ // Extract and mask out the leading nibble after the decimal point,
+ // write its corresponding hexit, and repeat until the mantissa is
+ // empty.
+ int nibble_offset = rounded_mantissa_bits;
+ if constexpr (!has_implicit_leading_bit)
+ // We already printed the entire leading hexit.
+ nibble_offset -= 4;
+ while (effective_mantissa != 0)
+ {
+ nibble_offset -= 4;
+ const unsigned nibble = effective_mantissa >> nibble_offset;
+ __glibcxx_assert(nibble < 16);
+ *first++ = "0123456789abcdef"[nibble];
+ ++written_hexits;
+ effective_mantissa &= ~(mantissa_t{0b1111} << nibble_offset);
+ }
+ __glibcxx_assert(nibble_offset >= 0);
+ __glibcxx_assert(written_hexits <= effective_precision);
+ // Since the mantissa is now empty, every hexit hereafter must be '0'.
+ if (int remaining_hexits = effective_precision - written_hexits)
+ {
+ memset(first, '0', remaining_hexits);
+ first += remaining_hexits;
+ }
+ }
+
+ // Finally, write the exponent.
+ *first++ = 'p';
+ if (written_exponent >= 0)
+ *first++ = '+';
+ const to_chars_result result = to_chars(first, last, written_exponent);
+ __glibcxx_assert(result.ec == errc{}
+ && result.ptr == saved_first + expected_output_length);
+ return result;
+ }
+
+template<typename T>
+ static to_chars_result
+ __floating_to_chars_shortest(char* first, char* const last, const T value,
+ chars_format fmt)
+ {
+ if (fmt == chars_format::hex)
+ return __floating_to_chars_hex(first, last, value, nullopt);
+
+ __glibcxx_assert(fmt == chars_format::fixed
+ || fmt == chars_format::scientific
</cut>
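For reference, the overloads added by the commit above can be exercised with a tiny test program. This is only a sketch, assuming a g++ (and its libstdc++) built at or after the first_bad revision, since older libstdc++ lacks the floating-point to_chars overloads:
<cut>
cat > to_chars_demo.cc <<'EOF'
#include <charconv>
#include <cstdio>

int main()
{
  char buf[64];
  // One of the new floating-point overloads: shortest round-trippable form.
  auto res = std::to_chars(buf, buf + sizeof buf, 0.1f, std::chars_format::general);
  *res.ptr = '\0';
  std::puts(buf);
}
EOF
g++ -std=c++17 to_chars_demo.cc -o to_chars_demo && ./to_chars_demo
</cut>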
[TCWG CI] Regression caused by binutils: PR28149, debug info with wrong file association:
commit 51298b330327a568358da069d9808f51c6cb1672
Author: Alan Modra <amodra(a)gmail.com>
PR28149, debug info with wrong file association
Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
6363
# First few build errors in logs:
from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
7116
# linux build successful:
all
# linux boot successful:
boot
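For context on the commit quoted below: gcc-11 and gcc-12 pass -gdwarf-5 to gas, and the fix delays gas's own allocation of DWARF file numbers so they cannot clash with compiler-assigned ones; the commit message also notes that a .loc may only use a file number already introduced by a .file directive. That ordering can be illustrated with a tiny hand-written assembly file. This is only a sketch, assuming a reasonably recent gas; the file name and instruction are arbitrary, and exact diagnostics differ between binutils versions:
<cut>
cat > loc-order.s <<'EOF'
	.text
	# The .file directive must assign the file number before any .loc
	# refers to it (the ordering the commit's test-case fixes restore).
	.file 1 "example.c"
	.loc 1 4 0
	nop
EOF
as --gdwarf-5 -o loc-order.o loc-order.s
readelf --debug-dump=decodedline loc-order.o
</cut>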
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_kernel/gnu-master-aarch64-lts-defconfig
First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def…
Reproduce builds:
<cut>
mkdir investigate-binutils-51298b330327a568358da069d9808f51c6cb1672
cd investigate-binutils-51298b330327a568358da069d9808f51c6cb1672
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-def… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/
cd binutils
# Reproduce first_bad build
git checkout --detach 51298b330327a568358da069d9808f51c6cb1672
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 5cdb4f14426a99ec8fcba843fa503efdc55fa078
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 51298b330327a568358da069d9808f51c6cb1672
Author: Alan Modra <amodra(a)gmail.com>
Date: Fri Sep 17 09:08:15 2021 +0930
PR28149, debug info with wrong file association
gcc-11 and gcc-12 pass -gdwarf-5 to gas, in order to prime gas for
DWARF 5 level debug info. Unfortunately it seems there are cases
where the compiler does not emit a .file or .loc dwarf debug directive
before any machine instructions. (Note that the .file directive
typically emitted as the first line of assembly output doesn't count as
a dwarf debug directive. The dwarf .file has a file number before the
file name string.)
This patch delays allocation of file numbers for gas generated line
debug info until the end of assembly, thus avoiding any clashes with
compiler generated file numbers. Two fixes for test case source are
necessary; A .loc can't use a file number that hasn't already been
specified with .file.
A followup patch will remove all the gas generated line info on
seeing a .file directive.
PR 28149
* dwarf2dbg.c (num_of_auto_assigned): Delete.
(current): Update initialisation.
(set_or_check_view): Replace all accesses to view with u.view.
(dwarf2_consume_line_info): Likewise.
(dwarf2_directive_loc): Likewise. Assert that we aren't generating
line info.
(dwarf2_gen_line_info_1): Don't call set_or_check_view on
gas generated line entries.
(dwarf2_gen_line_info): Set and track filenames for gas generated
line entries. Simplify generation of labels.
(get_directory_table_entry): Use filename_cmp when comparing dirs.
(do_allocate_filenum): New function.
(dwarf2_where): Set u.filename and filenum to -1 for gas generated
line entries.
(dwarf2_directive_filename): Remove num_of_auto_assigned handling.
(process_entries): Update view field access. Call
do_allocate_filenum.
* dwarf2dbg.h (struct dwarf2_line_info): Add filename field in
union aliasing view.
* testsuite/gas/i386/dwarf2-line-3.s: Add .file directive.
* testsuite/gas/i386/dwarf2-line-4.s: Likewise.
* testsuite/gas/i386/dwarf2-line-4.d: Update expected output.
* testsuite/gas/i386/dwarf4-line-1.d: Likewise.
* testsuite/gas/i386/dwarf5-line-1.d: Likewise.
* testsuite/gas/i386/dwarf5-line-2.d: Likewise.
---
gas/dwarf2dbg.c | 152 ++++++++++++++++++---------------
gas/dwarf2dbg.h | 7 +-
gas/testsuite/gas/i386/dwarf2-line-3.s | 1 +
gas/testsuite/gas/i386/dwarf2-line-4.d | 5 +-
gas/testsuite/gas/i386/dwarf2-line-4.s | 1 +
gas/testsuite/gas/i386/dwarf4-line-1.d | 4 +-
gas/testsuite/gas/i386/dwarf5-line-1.d | 4 +-
gas/testsuite/gas/i386/dwarf5-line-2.d | 3 +-
8 files changed, 105 insertions(+), 72 deletions(-)
diff --git a/gas/dwarf2dbg.c b/gas/dwarf2dbg.c
index 9e3437b8948..c6303ba94a6 100644
--- a/gas/dwarf2dbg.c
+++ b/gas/dwarf2dbg.c
@@ -207,7 +207,6 @@ struct file_entry
static struct file_entry *files;
static unsigned int files_in_use;
static unsigned int files_allocated;
-static unsigned int num_of_auto_assigned;
/* Table of directories used by .debug_line. */
static char ** dirs = NULL;
@@ -233,7 +232,7 @@ static struct dwarf2_line_info current =
{
1, 1, 0, 0,
DWARF2_LINE_DEFAULT_IS_STMT ? DWARF2_FLAG_IS_STMT : 0,
- 0, NULL
+ 0, { NULL }
};
/* This symbol is used to recognize view number forced resets in loc
@@ -342,7 +341,7 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
/* First, compute !(E->label > P->label), to tell whether or not
we're to reset the view number. If we can't resolve it to a
constant, keep it symbolic. */
- if (!p || (e->loc.view == force_reset_view && force_reset_view))
+ if (!p || (e->loc.u.view == force_reset_view && force_reset_view))
{
viewx.X_op = O_constant;
viewx.X_add_number = 0;
@@ -367,9 +366,9 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
}
}
- if (S_IS_DEFINED (e->loc.view) && symbol_constant_p (e->loc.view))
+ if (S_IS_DEFINED (e->loc.u.view) && symbol_constant_p (e->loc.u.view))
{
- expressionS *value = symbol_get_value_expression (e->loc.view);
+ expressionS *value = symbol_get_value_expression (e->loc.u.view);
/* We can't compare the view numbers at this point, because in
VIEWX we've only determined whether we're to reset it so
far. */
@@ -404,16 +403,16 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
{
expressionS incv;
- if (!p->loc.view)
+ if (!p->loc.u.view)
{
- p->loc.view = symbol_temp_make ();
- gas_assert (!S_IS_DEFINED (p->loc.view));
+ p->loc.u.view = symbol_temp_make ();
+ gas_assert (!S_IS_DEFINED (p->loc.u.view));
}
memset (&incv, 0, sizeof (incv));
incv.X_unsigned = 1;
incv.X_op = O_symbol;
- incv.X_add_symbol = p->loc.view;
+ incv.X_add_symbol = p->loc.u.view;
incv.X_add_number = 1;
if (viewx.X_op == O_constant)
@@ -430,16 +429,16 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
}
}
- if (!S_IS_DEFINED (e->loc.view))
+ if (!S_IS_DEFINED (e->loc.u.view))
{
- symbol_set_value_expression (e->loc.view, &viewx);
- S_SET_SEGMENT (e->loc.view, expr_section);
- symbol_set_frag (e->loc.view, &zero_address_frag);
+ symbol_set_value_expression (e->loc.u.view, &viewx);
+ S_SET_SEGMENT (e->loc.u.view, expr_section);
+ symbol_set_frag (e->loc.u.view, &zero_address_frag);
}
/* Define and attempt to simplify any earlier views needed to
compute E's. */
- if (h && p && p->loc.view && !S_IS_DEFINED (p->loc.view))
+ if (h && p && p->loc.u.view && !S_IS_DEFINED (p->loc.u.view))
{
struct line_entry *h2;
/* Reverse the list to avoid quadratic behavior going backwards
@@ -459,7 +458,9 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
break;
set_or_check_view (r, r->next, NULL);
}
- while (r->next && r->next->loc.view && !S_IS_DEFINED (r->next->loc.view)
+ while (r->next
+ && r->next->loc.u.view
+ && !S_IS_DEFINED (r->next->loc.u.view)
&& (r = r->next));
/* Unreverse the list, so that we can go forward again. */
@@ -475,14 +476,14 @@ set_or_check_view (struct line_entry *e, struct line_entry *p,
view of the previous subsegment. */
if (r == h)
continue;
- gas_assert (S_IS_DEFINED (r->loc.view));
- resolve_expression (symbol_get_value_expression (r->loc.view));
+ gas_assert (S_IS_DEFINED (r->loc.u.view));
+ resolve_expression (symbol_get_value_expression (r->loc.u.view));
}
while (r != p && (r = r->next));
/* Now that we've defined and computed all earlier views that might
be needed to compute E's, attempt to simplify it. */
- resolve_expression (symbol_get_value_expression (e->loc.view));
+ resolve_expression (symbol_get_value_expression (e->loc.u.view));
}
}
@@ -518,10 +519,8 @@ dwarf2_gen_line_info_1 (symbolS *label, struct dwarf2_line_info *loc)
/* Subseg heads are chained to previous subsegs in
dwarf2_finish. */
- if (loc->view && lss->head)
- set_or_check_view (e,
- (struct line_entry *)lss->ptail,
- lss->head);
+ if (loc->filenum != -1u && loc->u.view && lss->head)
+ set_or_check_view (e, (struct line_entry *) lss->ptail, lss->head);
*lss->ptail = e;
lss->ptail = &e->next;
@@ -532,9 +531,6 @@ dwarf2_gen_line_info_1 (symbolS *label, struct dwarf2_line_info *loc)
void
dwarf2_gen_line_info (addressT ofs, struct dwarf2_line_info *loc)
{
- static unsigned int line = -1;
- static unsigned int filenum = -1;
-
symbolS *sym;
/* Early out for as-yet incomplete location information. */
@@ -552,20 +548,35 @@ dwarf2_gen_line_info (addressT ofs, struct dwarf2_line_info *loc)
symbols apply to assembler code. It is necessary to emit
duplicate line symbols when a compiler asks for them, because GDB
uses them to determine the end of the prologue. */
- if (debug_type == DEBUG_DWARF2
- && line == loc->line && filenum == loc->filenum)
- return;
+ if (debug_type == DEBUG_DWARF2)
+ {
+ static unsigned int line = -1;
+ static const char *filename = NULL;
+
+ if (line == loc->line)
+ {
+ if (filename == loc->u.filename)
+ return;
+ if (filename_cmp (filename, loc->u.filename) == 0)
+ {
+ filename = loc->u.filename;
+ return;
+ }
+ }
- line = loc->line;
- filenum = loc->filenum;
+ line = loc->line;
+ filename = loc->u.filename;
+ }
if (linkrelax)
{
- char name[120];
+ static int label_num = 0;
+ char name[32];
/* Use a non-fake name for the line number location,
so that it can be referred to by relocations. */
- sprintf (name, ".Loc.%u.%u", line, filenum);
+ sprintf (name, ".Loc.%u", label_num);
+ label_num++;
sym = symbol_new (name, now_seg, frag_now, ofs);
}
else
@@ -624,13 +635,15 @@ get_directory_table_entry (const char *dirname,
{
const char * pwd = file0_dirname ? file0_dirname : getpwd ();
- if (dwarf_level >= 5 && strcmp (dirname, pwd) != 0)
+ if (dwarf_level >= 5 && filename_cmp (dirname, pwd) != 0)
{
- /* In DWARF-5 the 0 entry in the directory table is expected to be
- the same as the DW_AT_comp_dir (which is set to the current build
- directory). Since we are about to create a directory entry that
- is not the same, allocate the current directory first.
- FIXME: Alternatively we could generate an error message here. */
+ /* In DWARF-5 the 0 entry in the directory table is
+ expected to be the same as the DW_AT_comp_dir (which
+ is set to the current build directory). Since we are
+ about to create a directory entry that is not the
+ same, allocate the current directory first.
+ FIXME: Alternatively we could generate an error
+ message here. */
(void) get_directory_table_entry (pwd, NULL, strlen (pwd),
true);
d = 1;
@@ -745,14 +758,30 @@ allocate_filenum (const char * pathname)
if (!assign_file_to_slot (i, file, dir))
return -1;
- num_of_auto_assigned++;
-
last_used = i;
last_used_dir_len = dir_len;
return i;
}
+/* Run through the list of line entries starting at E, allocating
+ file entries for gas generated debug. */
+
+static void
+do_allocate_filenum (struct line_entry *e)
+{
+ do
+ {
+ if (e->loc.filenum == -1u)
+ {
+ e->loc.filenum = allocate_filenum (e->loc.u.filename);
+ e->loc.u.view = NULL;
+ }
+ e = e->next;
+ }
+ while (e);
+}
+
/* Allocate slot NUM in the .debug_line file table to FILENAME.
If DIRNAME is not NULL or there is a directory component to FILENAME
then this will be stored in the directory table, if not already present.
@@ -929,17 +958,12 @@ dwarf2_where (struct dwarf2_line_info *line)
{
if (debug_type == DEBUG_DWARF2)
{
- const char *filename;
-
- memset (line, 0, sizeof (*line));
- filename = as_where (&line->line);
- line->filenum = allocate_filenum (filename);
- /* FIXME: We should check the return value from allocate_filenum. */
+ line->u.filename = as_where (&line->line);
+ line->filenum = -1u;
line->column = 0;
line->flags = DWARF2_FLAG_IS_STMT;
line->isa = current.isa;
line->discriminator = current.discriminator;
- line->view = NULL;
}
else
*line = current;
@@ -1018,7 +1042,7 @@ dwarf2_consume_line_info (void)
| DWARF2_FLAG_PROLOGUE_END
| DWARF2_FLAG_EPILOGUE_BEGIN);
current.discriminator = 0;
- current.view = NULL;
+ current.u.view = NULL;
}
/* Called for each (preferably code) label. If dwarf2_loc_mark_labels
@@ -1060,7 +1084,6 @@ dwarf2_directive_filename (void)
char *filename;
const char * dirname = NULL;
int filename_len;
- unsigned int i;
/* Continue to accept a bare string and pass it off. */
SKIP_WHITESPACE ();
@@ -1132,18 +1155,6 @@ dwarf2_directive_filename (void)
return NULL;
}
- if (num_of_auto_assigned)
- {
- /* Clear slots auto-assigned before the first .file <NUMBER>
- directive was seen. */
- if (files_in_use != (num_of_auto_assigned + 1))
- abort ();
- for (i = 1; i < files_in_use; i++)
- files[i].filename = NULL;
- files_in_use = 0;
- num_of_auto_assigned = 0;
- }
-
if (! allocate_filename_to_slot (dirname, filename, (unsigned int) num,
with_md5))
return NULL;
@@ -1191,6 +1202,11 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED)
return;
}
+ /* debug_type will be turned off by dwarf2_directive_filename, and
+ if we don't have a dwarf style .file then files_in_use will be
+ zero and the above error will trigger. */
+ gas_assert (debug_type == DEBUG_NONE);
+
current.filenum = filenum;
current.line = line;
current.discriminator = 0;
@@ -1333,7 +1349,7 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED)
S_SET_VALUE (sym, 0);
symbol_set_frag (sym, &zero_address_frag);
}
- current.view = sym;
+ current.u.view = sym;
}
else
{
@@ -1347,10 +1363,9 @@ dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED)
demand_empty_rest_of_line ();
dwarf2_any_loc_directive_seen = dwarf2_loc_directive_seen = true;
- debug_type = DEBUG_NONE;
/* If we were given a view id, emit the row right away. */
- if (current.view)
+ if (current.u.view)
dwarf2_emit_insn (0);
}
@@ -1984,7 +1999,7 @@ process_entries (segT seg, struct line_entry *e)
frag_ofs = S_GET_VALUE (lab);
if (last_frag == NULL
- || (e->loc.view == force_reset_view && force_reset_view
+ || (e->loc.u.view == force_reset_view && force_reset_view
/* If we're going to reset the view, but we know we're
advancing the PC, we don't have to force with
set_address. We know we do when we're at the same
@@ -2850,16 +2865,19 @@ dwarf2_finish (void)
struct line_subseg *lss = s->head;
struct line_entry **ptail = lss->ptail;
+ if (lss->head && SEG_NORMAL (s->seg))
+ do_allocate_filenum (lss->head);
+
/* Reset the initial view of the first subsection of the
section. */
- if (lss->head && lss->head->loc.view)
+ if (lss->head && lss->head->loc.u.view)
set_or_check_view (lss->head, NULL, NULL);
while ((lss = lss->next) != NULL)
{
/* Link the first view of subsequent subsections to the
previous view. */
- if (lss->head && lss->head->loc.view)
+ if (lss->head && lss->head->loc.u.view)
set_or_check_view (lss->head,
!s->head ? NULL : (struct line_entry *)ptail,
s->head ? s->head->head : NULL);
diff --git a/gas/dwarf2dbg.h b/gas/dwarf2dbg.h
index 14d770c40dd..700d9dec5cb 100644
--- a/gas/dwarf2dbg.h
+++ b/gas/dwarf2dbg.h
@@ -36,7 +36,12 @@ struct dwarf2_line_info
unsigned int isa;
unsigned int flags;
unsigned int discriminator;
- symbolS *view;
+ /* filenum == -1u chooses filename, otherwise view. */
+ union
+ {
+ symbolS *view;
+ const char *filename;
+ } u;
};
/* Implements the .file FILENO "FILENAME" directive. FILENO can be 0
diff --git a/gas/testsuite/gas/i386/dwarf2-line-3.s b/gas/testsuite/gas/i386/dwarf2-line-3.s
index 2085ef93940..e933719fbc3 100644
--- a/gas/testsuite/gas/i386/dwarf2-line-3.s
+++ b/gas/testsuite/gas/i386/dwarf2-line-3.s
@@ -7,6 +7,7 @@
main:
.cfi_startproc
nop
+ .file 1 "dwarf2-test.c"
.loc 1 1
ret
.cfi_endproc
diff --git a/gas/testsuite/gas/i386/dwarf2-line-4.d b/gas/testsuite/gas/i386/dwarf2-line-4.d
index c0c85f4639f..a01fd0540f3 100644
--- a/gas/testsuite/gas/i386/dwarf2-line-4.d
+++ b/gas/testsuite/gas/i386/dwarf2-line-4.d
@@ -33,11 +33,14 @@ Raw dump of debug contents of section \.z?debug_line:
The File Name Table \(offset 0x.*\):
Entry Dir Time Size Name
- 1 1 0 0 dwarf2-line-4.s
+ 1 0 0 0 dwarf2-test.c
+ 2 1 0 0 dwarf2-line-4.s
Line Number Statements:
+ \[0x.*\] Set File Name to entry 2 in the File Name Table
\[0x.*\] Extended opcode 2: set Address to 0x0
\[0x.*\] Special opcode 13: advance Address by 0 to 0x0 and Line by 8 to 9
+ \[0x.*\] Set File Name to entry 1 in the File Name Table
\[0x.*\] Advance Line by -8 to 1
\[0x.*\] Special opcode 19: advance Address by 1 to 0x1 and Line by 0 to 1
\[0x.*\] Advance PC by 1 to 0x2
diff --git a/gas/testsuite/gas/i386/dwarf2-line-4.s b/gas/testsuite/gas/i386/dwarf2-line-4.s
index 89bb62d9db7..7348f4be62c 100644
--- a/gas/testsuite/gas/i386/dwarf2-line-4.s
+++ b/gas/testsuite/gas/i386/dwarf2-line-4.s
@@ -7,6 +7,7 @@
main:
.cfi_startproc
nop
+ .file 1 "dwarf2-test.c"
.loc 1 1
ret
.cfi_endproc
diff --git a/gas/testsuite/gas/i386/dwarf4-line-1.d b/gas/testsuite/gas/i386/dwarf4-line-1.d
index 4f8321e9bfd..8199efbb0c2 100644
--- a/gas/testsuite/gas/i386/dwarf4-line-1.d
+++ b/gas/testsuite/gas/i386/dwarf4-line-1.d
@@ -36,12 +36,14 @@ Raw dump of debug contents of section \.z?debug_line:
Entry Dir Time Size Name
1 0 0 0 foo.c
2 0 0 0 foo.h
+ 3 1 0 0 dwarf4-line-1.s
Line Number Statements:
+ \[0x.*\] Set File Name to entry 2 in the File Name Table
\[0x.*\] Extended opcode 2: set Address to 0x0
\[0x.*\] Advance Line by 81 to 82
\[0x.*\] Copy
- \[0x.*\] Set File Name to entry 2 in the File Name Table
+ \[0x.*\] Set File Name to entry 3 in the File Name Table
\[0x.*\] Advance Line by -73 to 9
\[0x.*\] Special opcode 19: advance Address by 1 to 0x1 and Line by 0 to 9
\[0x.*\] Advance PC by 3 to 0x4
diff --git a/gas/testsuite/gas/i386/dwarf5-line-1.d b/gas/testsuite/gas/i386/dwarf5-line-1.d
index f57fc47d269..2c2cf5696c4 100644
--- a/gas/testsuite/gas/i386/dwarf5-line-1.d
+++ b/gas/testsuite/gas/i386/dwarf5-line-1.d
@@ -36,12 +36,14 @@ Raw dump of debug contents of section \.z?debug_line:
0 \(indirect line string, offset: 0x.*\): .*/gas/testsuite
1 \(indirect line string, offset: 0x.*\): .*/gas/testsuite/gas/i386
- The File Name Table \(offset 0x.*, lines 2, columns 3\):
+ The File Name Table \(offset 0x.*, lines 3, columns 3\):
Entry Dir MD5 Name
0 0 0xbbd69fc03ce253b2dbaab2522dd519ae \(indirect line string, offset: 0x.*\): core.c
1 0 0x0 \(indirect line string, offset: 0x.*\): types.h
+ 2 1 0x0 \(indirect line string, offset: 0x.*\): dwarf5-line-1.s
Line Number Statements:
+ \[0x.*\] Set File Name to entry 2 in the File Name Table
\[0x.*\] Extended opcode 2: set Address to 0x0
\[0x.*\] Special opcode 8: advance Address by 0 to 0x0 and Line by 3 to 4
\[0x.*\] Advance PC by 1 to 0x1
diff --git a/gas/testsuite/gas/i386/dwarf5-line-2.d b/gas/testsuite/gas/i386/dwarf5-line-2.d
index 2f96df510d0..85f98c8ab9c 100644
--- a/gas/testsuite/gas/i386/dwarf5-line-2.d
+++ b/gas/testsuite/gas/i386/dwarf5-line-2.d
@@ -36,9 +36,10 @@ Raw dump of debug contents of section \.z?debug_line:
0 \(indirect line string, offset: 0x.*\): .*/gas/testsuite
1 \(indirect line string, offset: 0x.*\): .*/gas/testsuite/gas/i386
- The File Name Table \(offset 0x.*, lines 1, columns 3\):
+ The File Name Table \(offset 0x.*, lines 2, columns 3\):
Entry Dir MD5 Name
0 0 0xbbd69fc03ce253b2dbaab2522dd519ae \(indirect line string, offset: 0x.*\): core.c
+ 1 1 0x0 \(indirect line string, offset: .*\): dwarf5-line-2.s
Line Number Statements:
\[0x.*\] Extended opcode 2: set Address to 0x0
</cut>
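For reference, the key data-structure change in the dwarf2dbg patch above is that struct dwarf2_line_info now overlays the view symbol and the not-yet-allocated filename in a union discriminated by filenum == -1u, with do_allocate_filenum later switching gas-generated entries over to real file numbers. Below is a minimal standalone sketch of that tagged-union idiom; the type and helper names are hypothetical stand-ins, not the actual gas code.
<cut>
#include <cassert>
#include <cstdio>

struct symbolS;  // stand-in for gas's opaque symbol type

struct line_info
{
  unsigned int line;
  // filenum == -1u means u.filename is active (no file slot allocated yet);
  // any other value means u.view is active.
  unsigned int filenum;
  union
  {
    symbolS *view;
    const char *filename;
  } u;
};

// Placeholder for the real allocate_filenum, which finds or creates a
// .debug_line file table slot for the given path.
static unsigned int allocate_filenum(const char *name)
{
  (void) name;
  return 1;
}

// Mirrors what do_allocate_filenum does per line entry in the quoted patch:
// turn a deferred filename into a real file number and drop the union member.
static void resolve_filenum(line_info &loc)
{
  if (loc.filenum == -1u)
    {
      loc.filenum = allocate_filenum(loc.u.filename);
      loc.u.view = nullptr;  // u.filename is no longer meaningful
    }
}

int main()
{
  line_info loc{};
  loc.line = 42;
  loc.filenum = -1u;             // gas-generated entry: filename recorded, slot deferred
  loc.u.filename = "example.s";

  resolve_filenum(loc);
  assert(loc.filenum != -1u);
  std::printf("filenum=%u line=%u\n", loc.filenum, loc.line);
  return 0;
}
</cut>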
Progress (short week, 3 days)
* UM-2 [QEMU upstream maintainership]
+ more code review, notably the Apple Silicon hvf support, which is
nearly ready to go in
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ Sent out v2 of the "optimized code gen for MVE" patchset;
this now covers all the insns that have an easy optimized version.
+ Fixed a bug where we weren't correctly setting up FPSCR.LTPSIZE
when using QEMU's user-mode-only emulator
+ Wrote some code to add support for the (not yet finalized) gdbstub
XML that tells GDB that the guest CPU has MVE. This causes a GDB
with the MVE handling to crash, so one or the other of us has
got something wrong :-)
KVM Forum was this week, as a 2-day virtual conference. I felt the
programme was comparatively a bit small this year, but there were some
interesting talks. There was also a BoF session on whether/how we should
consider adding Rust code to QEMU: I am pushing for (a) a clearer
medium-to-long-term vision of where we would be going and why we'd be
doing this, and (b) more design-sketch-type work of "what would XYZ in
Rust look like", which would hopefully both make the benefit (or lack
thereof) clearer and demonstrate that there are enough people
enthusiastic about the prospect to make it a success...
-- PMM
After llvm commit 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
Author: Amy Kwan <amy.kwan1(a)ibm.com>
[libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.
the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%):
- 447.dealII,[.] contract<3> grew in size by 164%
Benchmark:
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: aarch64-linux-gnu
- Compiler flags: -Oz
- Hardware: APM Mustang 8x X-Gene1
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Oz
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-llvm-1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
cd investigate-llvm-1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach c8905f1bb304f1cfe297312ae0dda9946cb27594
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
Author: Amy Kwan <amy.kwan1(a)ibm.com>
Date: Fri Sep 3 14:53:57 2021 -0400
[libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.
It appears when testing LLVM 13 on Power, we run into failures with the
`libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp` test case optimizing
values out.
Despite some of the functions in the test already being marked with optnone,
adding the `MarkAsLive()` calls inside of the pretty printer comparison functions
resolves the issues of the values being optimized out.
This patch aims to address https://llvm.org/PR51675.
Differential Revision: https://reviews.llvm.org/D109204
(cherry picked from commit 217c6d643124be312f4a99b203118744edb9d54c)
---
libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp b/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
index 2d8e9620089a..7c8d307d19fb 100644
--- a/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
+++ b/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
@@ -92,24 +92,28 @@ void MarkAsLive(Type &&) {}
template <typename TypeToPrint> void ComparePrettyPrintToChars(
TypeToPrint value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
template <typename TypeToPrint> void ComparePrettyPrintToRegex(
TypeToPrint value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
void CompareExpressionPrettyPrintToChars(
std::string value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
void CompareExpressionPrettyPrintToRegex(
std::string value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
</cut>
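The patch above keeps values observable to the debugger by routing each one through a no-op MarkAsLive call before the StopForDebugger breakpoint location. The sketch below shows the call shape only; it uses a hypothetical noinline attribute, whereas the real test additionally relies on optnone on the functions involved so the values are not optimized away.
<cut>
#include <string>

// Deliberately empty helper: forwarding the value into a call keeps it
// materialized at this point in the caller.
template <typename Type> void MarkAsLive(Type &&) {}

// Stand-in for the test's breakpoint location; a debugger stops here and
// inspects the value and expectation through the pointers.
__attribute__((noinline)) void StopForDebugger(void *value, void *expectation)
{
  (void)value;
  (void)expectation;
}

// Mirrors the shape of ComparePrettyPrintToChars in the quoted patch.
void ComparePrettyPrintToChars(std::string value, const char *expectation)
{
  MarkAsLive(value);
  StopForDebugger(&value, &expectation);
}

int main()
{
  ComparePrettyPrintToChars(std::string("hello"), "hello");
  return 0;
}
</cut>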
After gcc commit c416c52bcdb120db5e8c53a51bd78c4360daf79b
Author: Nathan Sidwell <nathan(a)acm.org>
c++ ICE with nested requirement as default tpl parm[PR94827]
the following benchmarks slowed down by more than 2%:
- 456.hmmer slowed down by 4%
Benchmark:
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their latest release branch
- Target: aarch64-linux-gnu
- Compiler flags: -O3 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Reproduce builds:
<cut>
mkdir investigate-gcc-c416c52bcdb120db5e8c53a51bd78c4360daf79b
cd investigate-gcc-c416c52bcdb120db5e8c53a51bd78c4360daf79b
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach c416c52bcdb120db5e8c53a51bd78c4360daf79b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b1983f4582bbe060b7da83578acb9ed653681fc8
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit c416c52bcdb120db5e8c53a51bd78c4360daf79b
Author: Nathan Sidwell <nathan(a)acm.org>
Date: Thu Apr 30 08:23:16 2020 -0700
c++ ICE with nested requirement as default tpl parm[PR94827]
Template headers are not incrementally updated as we parse its parameters.
We maintain a dummy level until the closing > when we replace the dummy with
a real parameter set. requires processing was expecting a properly populated
arg_vec in current_template_parms, and then creates a self-mapping of parameters
from that. But we don't need to do that, just teach map_arguments to look at
TREE_VALUE when args is NULL.
* constraint.cc (map_arguments): If ARGS is null, it's a
self-mapping of parms.
(finish_nested_requirement): Do not pass argified
current_template_parms to normalization.
(tsubst_nested_requirement): Don't assert no template parms.
---
gcc/cp/ChangeLog | 10 ++++++++++
gcc/cp/constraint.cc | 27 ++++++++++++++++-----------
gcc/testsuite/g++.dg/concepts/pr94827.C | 15 +++++++++++++++
3 files changed, 41 insertions(+), 11 deletions(-)
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 1fa0e123cb1..3c57945cecf 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,13 @@
+2020-04-30 Jason Merrill <jason(a)redhat.com>
+ Nathan Sidwell <nathan(a)acm.org>
+
+ PR c++/94827
+ * constraint.cc (map_arguments): If ARGS is null, it's a
+ self-mapping of parms.
+ (finish_nested_requirement): Do not pass argified
+ current_template_parms to normalization.
+ (tsubst_nested_requirement): Don't assert no template parms.
+
2020-04-30 Iain Sandoe <iain(a)sandoe.co.uk>
PR c++/94886
diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 866b0f51b05..85513fecf43 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -546,12 +546,16 @@ static tree
map_arguments (tree parms, tree args)
{
for (tree p = parms; p; p = TREE_CHAIN (p))
- {
- int level;
- int index;
- template_parm_level_and_index (TREE_VALUE (p), &level, &index);
- TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
- }
+ if (args)
+ {
+ int level;
+ int index;
+ template_parm_level_and_index (TREE_VALUE (p), &level, &index);
+ TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
+ }
+ else
+ TREE_PURPOSE (p) = TREE_VALUE (p);
+
return parms;
}
@@ -2005,8 +2009,6 @@ tsubst_compound_requirement (tree t, tree args, subst_info info)
static tree
tsubst_nested_requirement (tree t, tree args, subst_info info)
{
- gcc_assert (!uses_template_parms (args));
-
/* Ensure that we're in an evaluation context prior to satisfaction. */
tree norm = TREE_VALUE (TREE_TYPE (t));
tree result = satisfy_constraint (norm, args, info);
@@ -2953,12 +2955,15 @@ finish_compound_requirement (location_t loc, tree expr, tree type, bool noexcept
tree
finish_nested_requirement (location_t loc, tree expr)
{
+ /* Currently open template headers have dummy arg vectors, so don't
+ pass into normalization. */
+ tree norm = normalize_constraint_expression (expr, NULL_TREE, false);
+ tree args = current_template_parms
+ ? template_parms_to_args (current_template_parms) : NULL_TREE;
+
/* Save the normalized constraint and complete set of normalization
arguments with the requirement. We keep the complete set of arguments
around for re-normalization during diagnostics. */
- tree args = current_template_parms
- ? template_parms_to_args (current_template_parms) : NULL_TREE;
- tree norm = normalize_constraint_expression (expr, args, false);
tree info = build_tree_list (args, norm);
/* Build the constraint, saving its normalization as its type. */
diff --git a/gcc/testsuite/g++.dg/concepts/pr94827.C b/gcc/testsuite/g++.dg/concepts/pr94827.C
new file mode 100644
index 00000000000..f14ec2551a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/pr94827.C
@@ -0,0 +1,15 @@
+// PR 94287 ICE looking inside open template-parm level
+// { dg-do run { target c++17 } }
+// { dg-options -fconcepts }
+
+template <typename T,
+ bool X = requires { requires (sizeof(T)==1); } >
+ int foo(T) { return X; }
+
+int main() {
+ if (!foo('4'))
+ return 1;
+ if (foo (4))
+ return 2;
+ return 0;
+}
</cut>
After llvm commit f17d60d620283b5d53286056ceeaeb8c27b6530a
Author: Bjorn Pettersson <bjorn.a.pettersson(a)ericsson.com>
Inform pass manager when child loops are deleted
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O2
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-llvm-f17d60d620283b5d53286056ceeaeb8c27b6530a
cd investigate-llvm-f17d60d620283b5d53286056ceeaeb8c27b6530a
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach f17d60d620283b5d53286056ceeaeb8c27b6530a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach f56129fe78d5c849971017976c71333b6b1a27c6
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit f17d60d620283b5d53286056ceeaeb8c27b6530a
Author: Bjorn Pettersson <bjorn.a.pettersson(a)ericsson.com>
Date: Fri Sep 3 20:50:33 2021 +0200
Inform pass manager when child loops are deleted
As part of the nontrivial unswitching we could end up removing child
loops. This patch add a notification to the pass manager when
that happens (using the markLoopAsDeleted callback).
Without this there could be stale LoopAccessAnalysis results cached
in the analysis manager. Those analysis results are cached based on
a Loop* as key. Since the BumpPtrAllocator used to allocate
Loop objects could be reset between different runs of, for
example, the loop-distribute pass (running on different functions),
a new Loop object could be created using the same Loop pointer.
And then when requiring the LoopAccessAnalysis for the loop we
got the stale (corrupt) result from the destroyed loop.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D109257
(fixes PR51754)
(cherry-picked from commit 0f0344dd1e3b53387bb396070916e67f4c426da6)
---
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | 43 +++++++++----
.../nontrivial-unswitch-markloopasdeleted.ll | 71 ++++++++++++++++++++++
2 files changed, 102 insertions(+), 12 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
index b9cccc2af309..b1c105258027 100644
--- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
+++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
@@ -1587,10 +1587,12 @@ deleteDeadClonedBlocks(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
BB->eraseFromParent();
}
-static void deleteDeadBlocksFromLoop(Loop &L,
- SmallVectorImpl<BasicBlock *> &ExitBlocks,
- DominatorTree &DT, LoopInfo &LI,
- MemorySSAUpdater *MSSAU) {
+static void
+deleteDeadBlocksFromLoop(Loop &L,
+ SmallVectorImpl<BasicBlock *> &ExitBlocks,
+ DominatorTree &DT, LoopInfo &LI,
+ MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
// Find all the dead blocks tied to this loop, and remove them from their
// successors.
SmallSetVector<BasicBlock *, 8> DeadBlockSet;
@@ -1640,6 +1642,7 @@ static void deleteDeadBlocksFromLoop(Loop &L,
}) &&
"If the child loop header is dead all blocks in the child loop must "
"be dead as well!");
+ DestroyLoopCB(*ChildL, ChildL->getName());
LI.destroy(ChildL);
return true;
});
@@ -1980,6 +1983,8 @@ static bool rebuildLoopAfterUnswitch(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
ParentL->removeChildLoop(llvm::find(*ParentL, &L));
else
LI.removeLoop(llvm::find(LI, &L));
+ // markLoopAsDeleted for L should be triggered by the caller (it is typically
+ // done by using the UnswitchCB callback).
LI.destroy(&L);
return false;
}
@@ -2019,7 +2024,8 @@ static void unswitchNontrivialInvariants(
SmallVectorImpl<BasicBlock *> &ExitBlocks, IVConditionInfo &PartialIVInfo,
DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
- ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+ ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
auto *ParentBB = TI.getParent();
BranchInst *BI = dyn_cast<BranchInst>(&TI);
SwitchInst *SI = BI ? nullptr : cast<SwitchInst>(&TI);
@@ -2319,7 +2325,7 @@ static void unswitchNontrivialInvariants(
// Now that our cloned loops have been built, we can update the original loop.
// First we delete the dead blocks from it and then we rebuild the loop
// structure taking these deletions into account.
- deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI, MSSAU);
+ deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI, MSSAU, DestroyLoopCB);
if (MSSAU && VerifyMemorySSA)
MSSAU->getMemorySSA()->verifyMemorySSA();
@@ -2670,7 +2676,8 @@ static bool unswitchBestCondition(
Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
AAResults &AA, TargetTransformInfo &TTI,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
- ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+ ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
// Collect all invariant conditions within this loop (as opposed to an inner
// loop which would be handled when visiting that inner loop).
SmallVector<std::pair<Instruction *, TinyPtrVector<Value *>>, 4>
@@ -2958,7 +2965,7 @@ static bool unswitchBestCondition(
<< "\n");
unswitchNontrivialInvariants(L, *BestUnswitchTI, BestUnswitchInvariants,
ExitBlocks, PartialIVInfo, DT, LI, AC,
- UnswitchCB, SE, MSSAU);
+ UnswitchCB, SE, MSSAU, DestroyLoopCB);
return true;
}
@@ -2988,7 +2995,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
AAResults &AA, TargetTransformInfo &TTI, bool Trivial,
bool NonTrivial,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
- ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+ ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
assert(L.isRecursivelyLCSSAForm(DT, LI) &&
"Loops must be in LCSSA form before unswitching.");
@@ -3036,7 +3044,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
// Try to unswitch the best invariant condition. We prefer this full unswitch to
// a partial unswitch when possible below the threshold.
- if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU))
+ if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU,
+ DestroyLoopCB))
return true;
// No other opportunities to unswitch.
@@ -3083,6 +3092,10 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
U.markLoopAsDeleted(L, LoopName);
};
+ auto DestroyLoopCB = [&U](Loop &L, StringRef Name) {
+ U.markLoopAsDeleted(L, Name);
+ };
+
Optional<MemorySSAUpdater> MSSAU;
if (AR.MSSA) {
MSSAU = MemorySSAUpdater(AR.MSSA);
@@ -3091,7 +3104,8 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
}
if (!unswitchLoop(L, AR.DT, AR.LI, AR.AC, AR.AA, AR.TTI, Trivial, NonTrivial,
UnswitchCB, &AR.SE,
- MSSAU.hasValue() ? MSSAU.getPointer() : nullptr))
+ MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,
+ DestroyLoopCB))
return PreservedAnalyses::all();
if (AR.MSSA && VerifyMemorySSA)
@@ -3179,12 +3193,17 @@ bool SimpleLoopUnswitchLegacyPass::runOnLoop(Loop *L, LPPassManager &LPM) {
LPM.markLoopAsDeleted(*L);
};
+ auto DestroyLoopCB = [&LPM](Loop &L, StringRef /* Name */) {
+ LPM.markLoopAsDeleted(L);
+ };
+
if (MSSA && VerifyMemorySSA)
MSSA->verifyMemorySSA();
bool Changed =
unswitchLoop(*L, DT, LI, AC, AA, TTI, true, NonTrivial, UnswitchCB, SE,
- MSSAU.hasValue() ? MSSAU.getPointer() : nullptr);
+ MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,
+ DestroyLoopCB);
if (MSSA && VerifyMemorySSA)
MSSA->verifyMemorySSA();
diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll b/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll
new file mode 100644
index 000000000000..455a38535576
--- /dev/null
+++ b/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll
@@ -0,0 +1,71 @@
+; RUN: opt < %s -enable-loop-distribute -passes='loop-distribute,loop-mssa(simple-loop-unswitch<nontrivial>),loop-distribute' -o /dev/null -S -debug-pass-manager=verbose 2>&1 | FileCheck %s
+
+
+; Running loop-distribute will result in LoopAccessAnalysis being required and
+; cached in the LoopAnalysisManagerFunctionProxy.
+;
+; CHECK: Running analysis: LoopAccessAnalysis on Loop at depth 2 containing: %loop_a_inner<header><latch><exiting>
+
+
+; Then simple-loop-unswitch is removing/replacing some loops (resulting in
+; Loop objects used as key in the analyses cache is destroyed). So here we
+; want to see that any analysis results cached on the destroyed loop is
+; cleared. A special case here is that loop_a_inner is destroyed when
+; unswitching the parent loop.
+;
+; The bug solved and verified by this test case was related to the
+; SimpleLoopUnswitch not marking the Loop as removed, so we missed clearing
+; the analysis caches.
+;
+; CHECK: Running pass: SimpleLoopUnswitchPass on Loop at depth 1 containing: %loop_begin<header>,%loop_b,%loop_b_inner,%loop_b_inner_exit,%loop_a,%loop_a_inner,%loop_a_inner_exit,%latch<latch><exiting>
+; CHECK-NEXT: Clearing all analysis results for: loop_a_inner
+
+
+; When running loop-distribute the second time we can see that loop_a_inner
+; isn't analysed because the loop no longer exists (instead we find a new loop,
+; loop_a_inner.us). This kind of verifies that it was correct to remove the
+; loop_a_inner related analysis above.
+;
+; CHECK: Running analysis: LoopAccessAnalysis on Loop at depth 2 containing: %loop_a_inner.us<header><latch><exiting>
+
+
+define i32 @test6(i1* %ptr, i1 %cond1, i32* %a.ptr, i32* %b.ptr) {
+entry:
+ br label %loop_begin
+
+loop_begin:
+ %v = load i1, i1* %ptr
+ br i1 %cond1, label %loop_a, label %loop_b
+
+loop_a:
+ br label %loop_a_inner
+
+loop_a_inner:
+ %va = load i1, i1* %ptr
+ %a = load i32, i32* %a.ptr
+ br i1 %va, label %loop_a_inner, label %loop_a_inner_exit
+
+loop_a_inner_exit:
+ %a.lcssa = phi i32 [ %a, %loop_a_inner ]
+ br label %latch
+
+loop_b:
+ br label %loop_b_inner
+
+loop_b_inner:
+ %vb = load i1, i1* %ptr
+ %b = load i32, i32* %b.ptr
+ br i1 %vb, label %loop_b_inner, label %loop_b_inner_exit
+
+loop_b_inner_exit:
+ %b.lcssa = phi i32 [ %b, %loop_b_inner ]
+ br label %latch
+
+latch:
+ %ab.phi = phi i32 [ %a.lcssa, %loop_a_inner_exit ], [ %b.lcssa, %loop_b_inner_exit ]
+ br i1 %v, label %loop_begin, label %loop_exit
+
+loop_exit:
+ %ab.lcssa = phi i32 [ %ab.phi, %latch ]
+ ret i32 %ab.lcssa
+}
</cut>
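The bug the commit above addresses is an analysis cache keyed by Loop* outliving the Loop object: once the allocator hands out the same address for a new loop, the stale cached result is returned for it. The sketch below illustrates the deletion-callback pattern with simplified types (std::function instead of llvm::function_ref, a plain map instead of the analysis manager); it is not the LLVM API.
<cut>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

// Simplified stand-ins for a loop and a per-loop analysis result.
struct Loop { std::string name; };
struct LoopAccessInfo { bool safe_to_distribute; };

// Analysis results cached by loop pointer, as the pass manager does.
static std::unordered_map<const Loop *, LoopAccessInfo> analysis_cache;

// Callback a transform must invoke before destroying a loop, playing the
// role of markLoopAsDeleted / DestroyLoopCB in the quoted patch.
using DestroyLoopCB = std::function<void(Loop &)>;

static void deleteChildLoop(Loop &child, const DestroyLoopCB &onDestroy)
{
  onDestroy(child);  // clear cached results keyed by &child
  // ...the loop object itself would be destroyed here...
}

int main()
{
  Loop inner{"loop_a_inner"};
  analysis_cache[&inner] = {true};

  DestroyLoopCB cb = [](Loop &l) {
    std::cout << "Clearing all analysis results for: " << l.name << "\n";
    analysis_cache.erase(&l);
  };

  deleteChildLoop(inner, cb);

  // Without the callback, a new Loop later allocated at the same address
  // could silently inherit the stale LoopAccessInfo entry.
  return 0;
}
</cut>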
Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
c++: implement C++17 hardware interference size
Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# First few build errors in logs:
from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# build_abe stage2:
5
# build_abe gdb:
6
# build_abe qemu:
7
This commit has regressed these CI configurations:
- tcwg_gnu_cross_build/master-aarch64
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Even more details: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
Date: Thu Jul 15 15:30:17 2021 -0400
c++: implement C++17 hardware interference size
The last missing piece of the C++17 standard library is the hardware
intereference size constants. Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.
In principle, both of these values should be the same as the target's L1
cache line size. When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.
From discussion on gcc-patches, I've come to the conclusion that the
solution to the difficulty of choosing stable values is to give up on it,
and instead encourage only uses where ABI stability is unimportant: in
particular, uses where the ABI is shared at most between translation units
built at the same time with the same flags.
To that end, I've added a warning for any use of the constant value of
std::hardware_destructive_interference_size in a header or module export.
Appropriate uses within a project can disable the warning.
A previous iteration of this patch included an -finterference-tune flag to
make the value vary with -mtune; this iteration makes that the default
behavior, which should be appropriate for all reasonable uses of the
variable. The previous default of "stable-ish" seems to me likely to have
been more of an attractive nuisance; since we can't promise actual
stability, we should instead make proper uses more convenient.
JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74
I implement this by adding new --params for the two sizes. Targets can
override these values in targetm.target_option.override() to support a range
of values for the generic target; otherwise, both will default to the L1
cache line size.
64 bytes still seems correct for all x86.
I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex
A9 has a 32-byte cache line, so I'd think 32/64 would make more sense.
He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B
cache line, I've changed that to 64/256.
Other arch maintainers are invited to set ranges for their generic targets
if that seems better than using the default cache line size for both values.
With the above choice to reject stability as a goal, getting these values
"right" is now just a matter of what we want the default optimization to be,
and we can feel free to adjust them as CPUs with different cache lines
become more and less common.
gcc/ChangeLog:
* params.opt: Add destructive-interference-size and
constructive-interference-size.
* doc/invoke.texi: Document them.
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set them.
* config/arm/arm.c (arm_option_override): Set them.
* config/i386/i386-options.c (ix86_option_override_internal):
Set them.
gcc/c-family/ChangeLog:
* c.opt: Add -Winterference-size.
* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
and __GCC_CONSTRUCTIVE_SIZE.
gcc/cp/ChangeLog:
* constexpr.c (maybe_warn_about_constant_value):
Complain about std::hardware_destructive_interference_size.
(cxx_eval_constant_expression): Call it.
* decl.c (cxx_init_decl_processing): Check
--param *-interference-size values.
libstdc++-v3/ChangeLog:
* include/std/version: Define __cpp_lib_hardware_interference_size.
* libsupc++/new: Define hardware interference size variables.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Winterference.H: New file.
* g++.dg/warn/Winterference.C: New test.
* g++.target/aarch64/interference.C: New test.
* g++.target/arm/interference.C: New test.
* g++.target/i386/interference.C: New test.
---
gcc/c-family/c-cppbuiltin.c | 14 ++++++
gcc/c-family/c.opt | 5 ++
gcc/config/aarch64/aarch64.c | 22 +++++++++
gcc/config/arm/arm.c | 22 +++++++++
gcc/config/i386/i386-options.c | 6 +++
gcc/cp/constexpr.c | 33 +++++++++++++
gcc/cp/decl.c | 32 ++++++++++++
gcc/doc/invoke.texi | 65 +++++++++++++++++++++++++
gcc/params.opt | 16 ++++++
gcc/testsuite/g++.dg/warn/Winterference-2.C | 14 ++++++
gcc/testsuite/g++.dg/warn/Winterference.C | 6 +++
gcc/testsuite/g++.dg/warn/Winterference.H | 7 +++
gcc/testsuite/g++.target/aarch64/interference.C | 9 ++++
gcc/testsuite/g++.target/arm/interference.C | 9 ++++
gcc/testsuite/g++.target/i386/interference.C | 8 +++
libstdc++-v3/include/std/version | 3 ++
libstdc++-v3/libsupc++/new | 10 +++-
17 files changed, 279 insertions(+), 2 deletions(-)
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 48cbefd8bf8..ce88e707127 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -741,6 +741,20 @@ cpp_atomic_builtins (cpp_reader *pfile)
builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL",
targetm.atomic_test_and_set_trueval);
+ /* Macros for C++17 hardware interference size constants. Either both or
+ neither should be set. */
+ gcc_assert (!param_destruct_interfere_size
+ == !param_construct_interfere_size);
+ if (param_destruct_interfere_size)
+ {
+ /* FIXME The way of communicating these values to the library should be
+ part of the C++ ABI, whether macro or builtin. */
+ builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE",
+ param_destruct_interfere_size);
+ builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE",
+ param_construct_interfere_size);
+ }
+
/* ptr_type_node can't be used here since ptr_mode is only set when
toplev calls backend_init which is not done with -E or pch. */
psize = POINTER_SIZE_UNITS;
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index c5fe90003f2..9c151d19870 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -722,6 +722,11 @@ Winit-list-lifetime
C++ ObjC++ Var(warn_init_list) Warning Init(1)
Warn about uses of std::initializer_list that can result in dangling pointers.
+Winterference-size
+C++ ObjC++ Var(warn_interference_size) Warning Init(1)
+Warn about nonsensical values of --param destructive-interference-size or
+constructive-interference-size.
+
Wimplicit
C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall)
Warn about implicit declarations.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 30d9a0b7a3d..36519ccc5a5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16540,6 +16540,28 @@ aarch64_override_options_internal (struct gcc_options *opts)
SET_OPTION_IF_UNSET (opts, &global_options_set,
param_l1_cache_line_size,
aarch64_tune_params.prefetch->l1_cache_line_size);
+
+ if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0)
+ {
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_destruct_interfere_size,
+ aarch64_tune_params.prefetch->l1_cache_line_size);
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_construct_interfere_size,
+ aarch64_tune_params.prefetch->l1_cache_line_size);
+ }
+ else
+ {
+ /* For a generic AArch64 target, cover the current range of cache line
+ sizes. */
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_destruct_interfere_size,
+ 256);
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_construct_interfere_size,
+ 64);
+ }
+
if (aarch64_tune_params.prefetch->l2_cache_size >= 0)
SET_OPTION_IF_UNSET (opts, &global_options_set,
param_l2_cache_size,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1e628253d0..6c6e77fab66 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3669,6 +3669,28 @@ arm_option_override (void)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_l1_cache_line_size,
current_tune->prefetch.l1_cache_line_size);
+ if (current_tune->prefetch.l1_cache_line_size >= 0)
+ {
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size,
+ current_tune->prefetch.l1_cache_line_size);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size,
+ current_tune->prefetch.l1_cache_line_size);
+ }
+ else
+ {
+ /* For a generic ARM target, JF Bastien proposed using 64 for both. */
+ /* ??? Cortex A9 has a 32-byte cache line, so why not 32 for
+ constructive? */
+ /* More recent Cortex chips have a 64-byte cache line, but are marked
+ ARM_PREFETCH_NOT_BENEFICIAL, so they get these defaults. */
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size, 64);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size, 64);
+ }
+
if (current_tune->prefetch.l1_cache_size >= 0)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_l1_cache_size,
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 2cb87cedec0..c0006b3674b 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2579,6 +2579,12 @@ ix86_option_override_internal (bool main_args_p,
SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size,
ix86_tune_cost->l2_cache_size);
+ /* 64B is the accepted value for these for all x86. */
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size, 64);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size, 64);
+
/* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */
if (opts->x_flag_prefetch_loop_arrays < 0
&& HAVE_prefetch
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 7772fe62d95..0c2498aee22 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -6075,6 +6075,37 @@ inline_asm_in_constexpr_error (location_t loc)
"%<constexpr%> function in C++20");
}
+/* We're getting the constant value of DECL in a manifestly constant-evaluated
+ context; maybe complain about that. */
+
+static void
+maybe_warn_about_constant_value (location_t loc, tree decl)
+{
+ static bool explained = false;
+ if (cxx_dialect >= cxx17
+ && warn_interference_size
+ && !global_options_set.x_param_destruct_interfere_size
+ && DECL_CONTEXT (decl) == std_node
+ && id_equal (DECL_NAME (decl), "hardware_destructive_interference_size")
+ && (LOCATION_FILE (input_location) != main_input_filename
+ || module_exporting_p ())
+ && warning_at (loc, OPT_Winterference_size, "use of %qD", decl)
+ && !explained)
+ {
+ explained = true;
+ inform (loc, "its value can vary between compiler versions or "
+ "with different %<-mtune%> or %<-mcpu%> flags");
+ inform (loc, "if this use is part of a public ABI, change it to "
+ "instead use a constant variable you define");
+ inform (loc, "the default value for the current CPU tuning "
+ "is %d bytes", param_destruct_interfere_size);
+ inform (loc, "you can stabilize this value with %<--param "
+ "hardware_destructive_interference_size=%d%>, or disable "
+ "this warning with %<-Wno-interference-size%>",
+ param_destruct_interfere_size);
+ }
+}
+
/* Attempt to reduce the expression T to a constant value.
On failure, issue diagnostic and return error_mark_node. */
/* FIXME unify with c_fully_fold */
@@ -6219,6 +6250,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
r = *p;
break;
}
+ if (ctx->manifestly_const_eval)
+ maybe_warn_about_constant_value (loc, t);
if (COMPLETE_TYPE_P (TREE_TYPE (t))
&& is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false))
{
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index bce62ad202a..c2065027369 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4752,6 +4752,38 @@ cxx_init_decl_processing (void)
/* Show we use EH for cleanups. */
if (flag_exceptions)
using_eh_for_cleanups ();
+
+ /* Check that the hardware interference sizes are at least
+ alignof(max_align_t), as required by the standard. */
+ const int max_align = max_align_t_align () / BITS_PER_UNIT;
+ if (param_destruct_interfere_size)
+ {
+ if (param_destruct_interfere_size < max_align)
+ error ("%<--param destructive-interference-size=%d%> is less than "
+ "%d", param_destruct_interfere_size, max_align);
+ else if (param_destruct_interfere_size < param_l1_cache_line_size)
+ warning (OPT_Winterference_size,
+ "%<--param destructive-interference-size=%d%> "
+ "is less than %<--param l1-cache-line-size=%d%>",
+ param_destruct_interfere_size, param_l1_cache_line_size);
+ }
+ else if (param_l1_cache_line_size >= max_align)
+ param_destruct_interfere_size = param_l1_cache_line_size;
+ /* else leave it unset. */
+
+ if (param_construct_interfere_size)
+ {
+ if (param_construct_interfere_size < max_align)
+ error ("%<--param constructive-interference-size=%d%> is less than "
+ "%d", param_construct_interfere_size, max_align);
+ else if (param_construct_interfere_size > param_l1_cache_line_size)
+ warning (OPT_Winterference_size,
+ "%<--param constructive-interference-size=%d%> "
+ "is greater than %<--param l1-cache-line-size=%d%>",
+ param_construct_interfere_size, param_l1_cache_line_size);
+ }
+ else if (param_l1_cache_line_size >= max_align)
+ param_construct_interfere_size = param_l1_cache_line_size;
}
/* Enter an abi node in global-module context. returns a cookie to
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 23cc68f92b5..78cfc100ac2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9018,6 +9018,43 @@ that has already been done in the current function. Therefore,
seemingly insignificant changes in the source program can cause the
warnings produced by @option{-Winline} to appear or disappear.
+@item -Winterference-size
+@opindex Winterference-size
+Warn about use of C++17 @code{std::hardware_destructive_interference_size}
+without specifying its value with @option{--param destructive-interference-size}.
+Also warn about questionable values for that option.
+
+This variable is intended to be used for controlling class layout, to
+avoid false sharing in concurrent code:
+
+@smallexample
+struct independent_fields @{
+ alignas(std::hardware_destructive_interference_size) std::atomic<int> one;
+ alignas(std::hardware_destructive_interference_size) std::atomic<int> two;
+@};
+@end smallexample
+
+Here @samp{one} and @samp{two} are intended to be far enough apart
+that stores to one won't require accesses to the other to reload the
+cache line.
+
+By default, @option{--param destructive-interference-size} and
+@option{--param constructive-interference-size} are set based on the
+current @option{-mtune} option, typically to the L1 cache line size
+for the particular target CPU, sometimes to a range if tuning for a
+generic target. So all translation units that depend on ABI
+compatibility for the use of these variables must be compiled with
+the same @option{-mtune} (or @option{-mcpu}).
+
+If ABI stability is important, such as if the use is in a header for a
+library, you should probably not use the hardware interference size
+variables at all. Alternatively, you can force a particular value
+with @option{--param}.
+
+If you are confident that your use of the variable does not affect ABI
+outside a single build of your project, you can turn off the warning
+with @option{-Wno-interference-size}.
+
@item -Wint-in-bool-context
@opindex Wint-in-bool-context
@opindex Wno-int-in-bool-context
@@ -13938,6 +13975,34 @@ prefetch hints can be issued for any constant stride.
This setting is only useful for strides that are known and constant.
+@item destructive-interference-size
+@item constructive-interference-size
+The values for the C++17 variables
+@code{std::hardware_destructive_interference_size} and
+@code{std::hardware_constructive_interference_size}. The destructive
+interference size is the minimum recommended offset between two
+independent concurrently-accessed objects; the constructive
+interference size is the maximum recommended size of contiguous memory
+accessed together. Typically both will be the size of an L1 cache
+line for the target, in bytes. For a generic target covering a range of L1
+cache line sizes, typically the constructive interference size will be
+the small end of the range and the destructive size will be the large
+end.
+
+The destructive interference size is intended to be used for layout,
+and thus has ABI impact. The default value is not expected to be
+stable, and on some targets varies with @option{-mtune}, so use of
+this variable in a context where ABI stability is important, such as
+the public interface of a library, is strongly discouraged; if it is
+used in that context, users can stabilize the value using this
+option.
+
+The constructive interference size is less sensitive, as it is
+typically only used in a @samp{static_assert} to make sure that a type
+fits within a cache line.
+
+See also @option{-Winterference-size}.
+
@item loop-interchange-max-num-stmts
The maximum number of stmts in a loop to be interchanged.
diff --git a/gcc/params.opt b/gcc/params.opt
index 3a701e22c46..658ca028851 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -361,6 +361,22 @@ The maximum code size growth ratio when expanding into a jump table (in percent)
Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization
The size of L1 cache line.
+-param=destructive-interference-size=
+Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization
+The minimum recommended offset between two concurrently-accessed objects to
+avoid additional performance degradation due to contention introduced by the
+implementation. Typically the L1 cache line size, but can be larger to
+accommodate a variety of target processors with different cache line sizes.
+C++17 code might use this value in structure layout, but is strongly
+discouraged from doing so in public ABIs.
+
+-param=constructive-interference-size=
+Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization
+The maximum recommended size of contiguous memory occupied by two objects
+accessed with temporal locality by concurrent threads. Typically the L1 cache
+line size, but can be smaller to accommodate a variety of target processors with
+different cache line sizes.
+
-param=l1-cache-size=
Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization
The size of L1 cache.
diff --git a/gcc/testsuite/g++.dg/warn/Winterference-2.C b/gcc/testsuite/g++.dg/warn/Winterference-2.C
new file mode 100644
index 00000000000..2af75c63f83
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference-2.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++20 } }
+// { dg-additional-options -fmodules-ts }
+
+module ;
+
+#include <new>
+
+export module foo;
+
+export {
+ struct A {
+ alignas(std::hardware_destructive_interference_size) int x; // { dg-warning Winterference-size }
+ };
+}
diff --git a/gcc/testsuite/g++.dg/warn/Winterference.C b/gcc/testsuite/g++.dg/warn/Winterference.C
new file mode 100644
index 00000000000..57c001bc032
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference.C
@@ -0,0 +1,6 @@
+// Test that we warn about use of std::hardware_destructive_interference_size
+// in a header.
+// { dg-do compile { target c++17 } }
+
+// { dg-warning Winterference-size "" { target *-*-* } 0 }
+#include "Winterference.H"
diff --git a/gcc/testsuite/g++.dg/warn/Winterference.H b/gcc/testsuite/g++.dg/warn/Winterference.H
new file mode 100644
index 00000000000..36f0ad5f6d1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference.H
@@ -0,0 +1,7 @@
+#include <new>
+
+struct A
+{
+ alignas(std::hardware_destructive_interference_size) int i;
+ alignas(std::hardware_destructive_interference_size) int j;
+};
diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C
new file mode 100644
index 00000000000..0fc01655223
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use
+// 128 or even 256.
+static_assert(std::hardware_destructive_interference_size == 256);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C
new file mode 100644
index 00000000000..34fe8a52bff
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Recent ARM CPUs have a cache line size of 64. Older ones have
+// a size of 32, but I guess they're old enough that we don't care?
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C
new file mode 100644
index 00000000000..c7b910e3ada
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/interference.C
@@ -0,0 +1,8 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// It is generally agreed that these are the right values for all x86.
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index f950bf0f0db..f41004b5911 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -140,6 +140,9 @@
#define __cpp_lib_filesystem 201703
#define __cpp_lib_gcd 201606
#define __cpp_lib_gcd_lcm 201606
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+#endif
#define __cpp_lib_hypot 201603
#define __cpp_lib_invoke 201411L
#define __cpp_lib_lcm 201606
diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
index 3349b13fd1b..7bc67a6cb02 100644
--- a/libstdc++-v3/libsupc++/new
+++ b/libstdc++-v3/libsupc++/new
@@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
} // extern "C++"
#if __cplusplus >= 201703L
-#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
namespace std
{
+#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
#define __cpp_lib_launder 201606
/// Pointer optimization barrier [ptr.launder]
template<typename _Tp>
@@ -205,8 +205,14 @@ namespace std
void launder(const void*) = delete;
void launder(volatile void*) = delete;
void launder(const volatile void*) = delete;
-}
#endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
+
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+ inline constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE;
+ inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE;
+#endif // __GCC_DESTRUCTIVE_SIZE
+}
#endif // C++17
#if __cplusplus > 201703L
</cut>
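To make the intent of the commit above concrete, here is a minimal C++17 sketch (not taken from the commit; the type and field names are invented for illustration) of how the two constants it adds to <new> are meant to be consumed, guarded by the __cpp_lib_hardware_interference_size feature-test macro that the libstdc++ hunks define:
<cut>
// Sketch only: illustrates the C++17 interference-size constants added above.
// Struct and field names here are invented for the example.
#include <atomic>
#include <cstddef>
#include <new>

#if defined(__cpp_lib_hardware_interference_size)
// Destructive size: keep independently written hot counters on separate cache
// lines so that concurrent writers do not false-share.
struct counters {
  alignas(std::hardware_destructive_interference_size) std::atomic<int> produced;
  alignas(std::hardware_destructive_interference_size) std::atomic<int> consumed;
};

// Constructive size: check that data meant to be read together fits in one line.
struct header {
  std::size_t length;
  std::size_t checksum;
};
static_assert(sizeof(header) <= std::hardware_constructive_interference_size,
              "header should fit within a single cache line");
#endif
</cut>
Because the destructive value now follows -mtune, such a definition in a shared header would trigger the new -Winterference-size warning; per the documentation hunk above, a project can pin the value with --param destructive-interference-size=N if ABI stability matters, or silence the warning with -Wno-interference-size when the ABI never leaves a single build.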
Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
c++: implement C++17 hardware interference size
Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap:
2
This commit has regressed these CI configurations:
- tcwg_gcc_bootstrap/master-aarch64-bootstrap
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Even more details: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines): gcc commit 76b75018b3d053a890ebe155e47814de14b3c9fb (quoted in full earlier in this report).
Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
c++: implement C++17 hardware interference size
Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe gcc:
2
# build_abe linux:
4
# build_abe glibc:
5
# build_abe gdb:
6
This commit has regressed these CI configurations:
- tcwg_gnu_native_build/master-arm
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Even more details: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh
cd ..
</cut>
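The ABI hazard that motivates the new warning can also be seen in a short hypothetical sketch: the constant, and therefore the layout of anything aligned to it, can differ between translation units compiled with different -mtune settings.
<cut>
// shared.h -- hypothetical header compiled into two different objects.
// If one object is built with the generic AArch64 tuning and the other with a
// CPU-specific tuning whose L1 line is 64 bytes, `record` gets alignment 256 in
// one translation unit and 64 in the other (the aarch64.c hunk quoted earlier
// sets the generic destructive size to 256), an ODR/ABI mismatch.
#include <atomic>
#include <new>

struct record {
  alignas(std::hardware_destructive_interference_size) std::atomic<long> hits;
  alignas(std::hardware_destructive_interference_size) std::atomic<long> misses;
};
</cut>
This is exactly the situation the quoted documentation warns against: translation units that rely on ABI compatibility for such a type must be built with the same -mtune (or -mcpu), or with an explicit --param value.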
Full commit (up to 1000 lines): gcc commit 76b75018b3d053a890ebe155e47814de14b3c9fb (quoted in full earlier in this report).
Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig
Culprit:
<cut>
commit c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
Author: Slark Xiao <slark_xiao(a)163.com>
Date: Tue Aug 31 10:40:25 2021 +0800
net: Add depends on OF_NET for LiteX's LiteETH
Current settings may produce a build error when
CONFIG_OF_NET is disabled. The CONFIG_OF_NET controls
a header file <linux/of.h> and some functions
in <linux/of_net.h>.
Signed-off-by: Slark Xiao <slark_xiao(a)163.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
</cut>
Results regressed to (for first_bad == c3496da580b0fc10fdeba8f6a5e6aef4c78b5598)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29873
# linux build successful:
all
# First few build errors in logs:
from (for last_good == a9e7c3cedc2914f63cd135b75832b9bf850af782)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29873
# linux build successful:
all
# linux boot successful:
boot
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-linux-c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
cd investigate-linux-c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach a9e7c3cedc2914f63cd135b75832b9bf850af782
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Full commit (up to 1000 lines):
<cut>
commit c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
Author: Slark Xiao <slark_xiao(a)163.com>
Date: Tue Aug 31 10:40:25 2021 +0800
net: Add depends on OF_NET for LiteX's LiteETH
Current settings may produce a build error when
CONFIG_OF_NET is disabled. CONFIG_OF_NET controls
a header file <linux/of.h> and some functions
in <linux/of_net.h>.
Signed-off-by: Slark Xiao <slark_xiao(a)163.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
drivers/net/ethernet/litex/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/litex/Kconfig b/drivers/net/ethernet/litex/Kconfig
index 265dba414b41..63bf01d28f0c 100644
--- a/drivers/net/ethernet/litex/Kconfig
+++ b/drivers/net/ethernet/litex/Kconfig
@@ -17,6 +17,7 @@ if NET_VENDOR_LITEX
config LITEX_LITEETH
tristate "LiteX Ethernet support"
+ depends on OF_NET
help
If you wish to compile a kernel for hardware with a LiteX LiteEth
device then you should answer Y to this.
</cut>
Identified regression caused by *gcc:01b5038718056b024b370b74a874fbd92c5bbab3*:
commit 01b5038718056b024b370b74a874fbd92c5bbab3
Author: Aldy Hernandez <aldyh(a)redhat.com>
Disable threading through latches until after loop optimizations.
Results regressed to (for first_bad == 01b5038718056b024b370b74a874fbd92c5bbab3)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -Os artifacts/build-01b5038718056b024b370b74a874fbd92c5bbab3/results_id:
1
# 459.GemsFDTD,GemsFDTD_base.default regressed by 102
# 464.h264ref,h264ref_base.default regressed by 102
from (for last_good == fb88bf9931f17d137eb50c001e1c924aa1e34e83)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -Os artifacts/build-fb88bf9931f17d137eb50c001e1c924aa1e34e83/results_id:
1
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Reproduce builds:
<cut>
mkdir investigate-gcc-01b5038718056b024b370b74a874fbd92c5bbab3
cd investigate-gcc-01b5038718056b024b370b74a874fbd92c5bbab3
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 01b5038718056b024b370b74a874fbd92c5bbab3
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach fb88bf9931f17d137eb50c001e1c924aa1e34e83
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 01b5038718056b024b370b74a874fbd92c5bbab3
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date: Thu Sep 9 20:30:28 2021 +0200
Disable threading through latches until after loop optimizations.
The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain properties of loops to be
destroyed, which made subsequent loop optimizations fail.
Consequently, this patch's main goal is to disable jump threading
involving the latch until after loop optimizations have run.
As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12]) to the late threaders
(thread[34]). I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite. They're mostly noise
now.
Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point. I have added a reminder to the function to keep this
in mind.
Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader. At some point I'd like to combine the cost models.
Tested on x86-64 Linux.
p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:
https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html
gcc/ChangeLog:
* tree-pass.h (PROP_loop_opts_done): New.
* gimple-range-path.cc (path_range_query::internal_range_of_expr):
Intersect with global range.
* tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Disable
threading through latches until after loop optimizations have run.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
threading through latches.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
Co-authored-by: Michael Matz <matz(a)suse.de>
---
gcc/gimple-range-path.cc | 3 ++
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c | 4 +--
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 37 ++---------------------
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 17 +----------
gcc/tree-pass.h | 2 ++
gcc/tree-ssa-loop.c | 2 +-
gcc/tree-ssa-threadbackward.c | 28 +++++++++++++++--
7 files changed, 37 insertions(+), 56 deletions(-)
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index a4fa3b296ff..c616b65756f 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -127,6 +127,9 @@ path_range_query::internal_range_of_expr (irange &r, tree name, gimple *stmt)
basic_block bb = stmt ? gimple_bb (stmt) : exit_bb ();
if (stmt && range_defined_in_block (r, name, bb))
{
+ if (TREE_CODE (name) == SSA_NAME)
+ r.intersect (gimple_range_global (name));
+
set_cache (r, name);
return true;
}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
index e1c33e86cd7..823ada982ff 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
+/* { dg-options "-O2 -fdump-tree-thread3-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
void foo();
void bla();
@@ -26,4 +26,4 @@ void thread_latch_through_header (void)
case. And we want to thread through the header as well. These
are both caught by threading in DOM. */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
index c7bf867b084..ee46759bacc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -1,41 +1,8 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread2-details" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */
-/* All the threads in the thread1 dump start on a X->BB12 edge, as can
- be seen in the dump:
-
- Registering FSM jump thread: (x, 12) incoming edge; ...
- etc
- etc
-
- Before the new evrp, we were threading paths that started at the
- following edges:
-
- Registering FSM jump thread: (10, 12) incoming edge
- Registering FSM jump thread: (6, 12) incoming edge
- Registering FSM jump thread: (9, 12) incoming edge
-
- This was because the PHI at BB12 had constant values coming in from
- BB10, BB6, and BB9:
-
- # state_10 = PHI <state_11(7), 0(10), state_11(5), 1(6), state_11(8), 2(9), state_11(11)>
-
- Now with the new evrp, we get:
-
- # state_10 = PHI <0(7), 0(10), state_11(5), 1(6), 0(8), 2(9), 1(11)>
-
- Thus, we have 3 more paths that are known to be constant and can be
- threaded. Which means that by the second threading pass, we can
- only find one profitable path.
-
- For the record, all these extra constants are better paths coming
- out of switches. For example:
-
- SWITCH_BB -> BBx -> BBy -> BBz -> PHI
-
- We now know the value of the switch index at PHI. */
/* { dg-final { scan-tree-dump-times "Registering FSM jump" 6 "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread2" } } */
+/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread3" } } */
int sum0, sum1, sum2, sum3;
int foo (char *s, char **ret)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 5fc2145a432..ba07942f9dd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,23 +1,8 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
-/* Here we have the same issue as was commented in ssa-dom-thread-6.c.
- The PHI coming into the threader has a lot more constants, so the
- threader can thread more paths.
-
-$ diff clean/a.c.105t.mergephi2 a.c.105t.mergephi2
-252c252
-< # s_50 = PHI <s_49(10), 5(14), s_51(18), s_51(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), s_51(30)>
----
-> # s_50 = PHI <s_49(10), 5(14), 4(18), 5(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), 7(30)>
-272a273
-
- I spot checked a few and they all have the same pattern. We are
- basically tracking the switch index better through multiple
- paths. */
-
/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" } } */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
/* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 83941bc0cee..eb75eb17951 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -225,6 +225,8 @@ protected:
been optimized. */
#define PROP_gimple_lomp_dev (1 << 16) /* done omp_device_lower */
#define PROP_rtl_split_insns (1 << 17) /* RTL has insns split. */
+#define PROP_loop_opts_done (1 << 18) /* SSA loop optimizations
+ have completed. */
#define PROP_gimple \
(PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp)
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 0cc4b3bbccf..1bbf2f1fb2c 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -540,7 +540,7 @@ const pass_data pass_data_tree_loop_done =
OPTGROUP_LOOP, /* optinfo_flags */
TV_NONE, /* tv_id */
PROP_cfg, /* properties_required */
- 0, /* properties_provided */
+ PROP_loop_opts_done, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
TODO_cleanup_cfg, /* todo_flags_finish */
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 449232c7715..e72992328de 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
#include "ssa.h"
#include "tree-cfgcleanup.h"
#include "tree-pretty-print.h"
+#include "cfghooks.h"
// Path registry for the backwards threader. After all paths have been
// registered with register_path(), thread_through_all_blocks() is called
@@ -564,7 +565,10 @@ back_threader_registry::thread_through_all_blocks (bool may_peel_loop_headers)
TAKEN_EDGE, otherwise it is NULL.
CREATES_IRREDUCIBLE_LOOP, if non-null is set to TRUE if threading this path
- would create an irreducible loop. */
+ would create an irreducible loop.
+
+ ?? It seems we should be able to loosen some of the restrictions in
+ this function after loop optimizations have run. */
bool
back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
@@ -725,7 +729,11 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
the last entry in the array when determining if we thread
through the loop latch. */
if (loop->latch == bb)
- threaded_through_latch = true;
+ {
+ threaded_through_latch = true;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, " (latch)");
+ }
}
gimple *stmt = get_gimple_control_stmt (m_path[0]);
@@ -845,6 +853,22 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
"a multiway branch.\n");
return false;
}
+
+ /* Threading through an empty latch would cause code to be added to
+ the latch. This could alter the loop form sufficiently to cause
+ loop optimizations to fail. Disable these threads until after
+ loop optimizations have run. */
+ if ((threaded_through_latch
+ || (taken_edge && taken_edge->dest == loop->latch))
+ && !(cfun->curr_properties & PROP_loop_opts_done)
+ && empty_block_p (loop->latch))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ " FAIL: FSM Thread through latch before loop opts would create non-empty latch\n");
+ return false;
+
+ }
return true;
}
</cut>
Progress
* UM-2 [QEMU upstream maintainership]
+ Respin of a linux-user cleanup patchset
+ Code review, as usual
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ Working on version 2 of the "optimized code gen for MVE" patchset;
this now covers all the insns that have an easy optimized version.
-- PMM
Successfully identified regression in *linux* in CI configuration tcwg_kernel/gnu-master-arm-mainline-allmodconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/gnu-master-arm-mainline-allmodconfig
Culprit:
<cut>
commit 3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151
Author: Linus Torvalds <torvalds(a)linux-foundation.org>
Date: Sun Sep 5 11:24:05 2021 -0700
Enable '-Werror' by default for all kernel builds
... but make it a config option so that broken environments can disable
it when required.
We really should always have a clean build, and will disable specific
over-eager warnings as required, if we can't fix them. But while I
fairly religiously enforce that in my own tree, it doesn't get enforced
by various build robots that don't necessarily report warnings.
So this just makes '-Werror' a default compiler flag, but allows people
to disable it for their configuration if they have some particular
issues.
Occasionally, new compiler versions end up enabling new warnings, and it
can take a while before we have them fixed (or the warnings disabled if
that is what it takes), so the config option allows for that situation.
Hopefully this will mean that I get fewer pull requests that have new
warnings that were not noticed by various automation we have in place.
Knock wood.
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
</cut>
Results regressed to (for first_bad == 3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21769
# First few build errors in logs:
from (for last_good == fd47ff55c9c31101fcc06d20cb381da3d4089bd5)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
29880
# linux build successful:
all
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-linux-3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151
cd investigate-linux-3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach 3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach fd47ff55c9c31101fcc06d20cb381da3d4089bd5
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al…
Build log: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-mainline-al…
Full commit (up to 1000 lines):
<cut>
commit 3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151
Author: Linus Torvalds <torvalds(a)linux-foundation.org>
Date: Sun Sep 5 11:24:05 2021 -0700
Enable '-Werror' by default for all kernel builds
... but make it a config option so that broken environments can disable
it when required.
We really should always have a clean build, and will disable specific
over-eager warnings as required, if we can't fix them. But while I
fairly religiously enforce that in my own tree, it doesn't get enforced
by various build robots that don't necessarily report warnings.
So this just makes '-Werror' a default compiler flag, but allows people
to disable it for their configuration if they have some particular
issues.
Occasionally, new compiler versions end up enabling new warnings, and it
can take a while before we have them fixed (or the warnings disabled if
that is what it takes), so the config option allows for that situation.
Hopefully this will mean that I get fewer pull requests that have new
warnings that were not noticed by various automation we have in place.
Knock wood.
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
---
Makefile | 3 +++
init/Kconfig | 14 ++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/Makefile b/Makefile
index 6bc1c5b17a62..d45fc2edf186 100644
--- a/Makefile
+++ b/Makefile
@@ -785,6 +785,9 @@ stackp-flags-$(CONFIG_STACKPROTECTOR_STRONG) := -fstack-protector-strong
KBUILD_CFLAGS += $(stackp-flags-y)
+KBUILD_CFLAGS-$(CONFIG_WERROR) += -Werror
+KBUILD_CFLAGS += $(KBUILD_CFLAGS-y)
+
ifdef CONFIG_CC_IS_CLANG
KBUILD_CPPFLAGS += -Qunused-arguments
# The kernel builds with '-std=gnu89' so use of GNU extensions is acceptable.
diff --git a/init/Kconfig b/init/Kconfig
index e708180e9a59..8cb97f141b70 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -137,6 +137,20 @@ config COMPILE_TEST
here. If you are a user/distributor, say N here to exclude useless
drivers to be distributed.
+config WERROR
+ bool "Compile the kernel with warnings as errors"
+ default y
+ help
+ A kernel build should not cause any compiler warnings, and this
+ enables the '-Werror' flag to enforce that rule by default.
+
+ However, if you have a new (or very old) compiler with odd and
+ unusual warnings, or you have some architecture with problems,
+ you may need to disable this config option in order to
+ successfully build the kernel.
+
+ If in doubt, say Y.
+
config UAPI_HEADER_TEST
bool "Compile test UAPI headers"
depends on HEADERS_INSTALL && CC_CAN_LINK
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO
Culprit:
<cut>
commit 19dc02e99f802922a3af69e802465bee0723b57a
Author: Nikita Popov <nikita.ppv(a)gmail.com>
Date: Sun Aug 22 18:15:55 2021 +0200
[MergeICmps] Allow sinking past non-load/store
This is a followup to D106591. MergeICmps currently only allows
sinking the loads past either instructions that don't write to
memory at all, or simple loads/stores that don't modify the memory
the loads access.
The "simple loads/stores" part of this check doesn't seem necessary
to me -- AA isModRef() already accurately models any operation
that may clobber the memory. For example, in the adjusted test case
the transform is still fine if the call to @foo() isn't readonly,
but inaccessiblememonly -- in both cases, the call cannot modify
the loaded memory.
Differential Revision: https://reviews.llvm.org/D108517
</cut>
Results regressed to (for first_bad == 19dc02e99f802922a3af69e802465bee0723b57a)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO artifacts/build-19dc02e99f802922a3af69e802465bee0723b57a/results_id:
1
# 464.h264ref,h264ref_base.default regressed by 105
from (for last_good == da12d88b1c5fc42b49b92fcf94917ca489dd677f)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO artifacts/build-da12d88b1c5fc42b49b92fcf94917ca489dd677f/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3_LTO/4822
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3_LTO/4807
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-19dc02e99f802922a3af69e802465bee0723b57a
cd investigate-llvm-19dc02e99f802922a3af69e802465bee0723b57a
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 19dc02e99f802922a3af69e802465bee0723b57a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach da12d88b1c5fc42b49b92fcf94917ca489dd677f
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 19dc02e99f802922a3af69e802465bee0723b57a
Author: Nikita Popov <nikita.ppv(a)gmail.com>
Date: Sun Aug 22 18:15:55 2021 +0200
[MergeICmps] Allow sinking past non-load/store
This is a followup to D106591. MergeICmps currently only allows
sinking the loads past either instructions that don't write to
memory at all, or simple loads/stores that don't modify the memory
the loads access.
The "simple loads/stores" part of this check doesn't seem necessary
to me -- AA isModRef() already accurately models any operation
that may clobber the memory. For example, in the adjusted test case
the transform is still fine if the call to @foo() isn't readonly,
but inaccessiblememonly -- in both cases, the call cannot modify
the loaded memory.
Differential Revision: https://reviews.llvm.org/D108517
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 14 +-------------
.../Transforms/MergeICmps/X86/split-block-does-work.ll | 2 +-
2 files changed, 2 insertions(+), 14 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index f13f24ad2027..34465c76dd3d 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -66,15 +66,6 @@ namespace {
#define DEBUG_TYPE "mergeicmps"
-// Returns true if the instruction is a simple load or a simple store
-static bool isSimpleLoadOrStore(const Instruction *I) {
- if (const LoadInst *LI = dyn_cast<LoadInst>(I))
- return LI->isSimple();
- if (const StoreInst *SI = dyn_cast<StoreInst>(I))
- return SI->isSimple();
- return false;
-}
-
// A BCE atom "Binary Compare Expression Atom" represents an integer load
// that is a constant offset from a base value, e.g. `a` or `o.c` in the example
// at the top.
@@ -244,10 +235,7 @@ bool BCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
// If this instruction may clobber the loads and is in middle of the BCE cmp
// block instructions, then bail for now.
if (Inst->mayWriteToMemory()) {
- // Bail if this is not a simple load or store
- if (!isSimpleLoadOrStore(Inst))
- return false;
- // Disallow stores that might alias the BCE operands
+ // Disallow instructions that might modify the BCE operands
MemoryLocation LLoc = MemoryLocation::get(Cmp.Lhs.LoadI);
MemoryLocation RLoc = MemoryLocation::get(Cmp.Rhs.LoadI);
if (isModSet(AA.getModRefInfo(Inst, LLoc)) ||
diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
index 0b9663f44980..1e341b92918d 100644
--- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
@@ -3,7 +3,7 @@
%S = type { i32, i32, i32, i32 }
-declare void @foo(...) readonly
+declare void @foo(...) inaccessiblememonly
; We can split %entry and create a memcmp(16 bytes).
define zeroext i1 @opeq1(
</cut>
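At the source level, the pattern exercised by the quoted split-block-does-work.ll test (a struct of four i32 fields compared field by field) corresponds roughly to the sketch below. This is my own illustration rather than code from the commit: MergeICmps merges the chain of equality comparisons into a single 16-byte block compare, which is only valid if nothing between the loads can modify the compared memory; that is what the isModRef() check in the patch verifies.
// Hypothetical C++ analogue of %S / @opeq1 from the quoted test.
struct S { int a, b, c, d; };
bool opeq(const S &x, const S &y) {
  // Four adjacent loads and compares that MergeICmps can fuse into one
  // memcmp-style comparison of 16 contiguous bytes.
  return x.a == y.a && x.b == y.b && x.c == y.c && x.d == y.d;
}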
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2
Culprit:
<cut>
commit d39d3a327b1303012370e47d991459ffbfce45ef
Author: Peyton, Jonathan L <jonathan.l.peyton(a)intel.com>
Date: Fri Aug 20 16:06:13 2021 -0500
[OpenMP][test] fix omp_get_wtime.c test to be more accommodating
The omp_get_wtime.c test fails intermittently if the recorded times are
off by too much which can happen when many tests are run in parallel.
Instead of failing if one timing is a little off, take average of 100
timings minus the 10 worst.
Differential Revision: https://reviews.llvm.org/D108488
</cut>
Results regressed to (for first_bad == d39d3a327b1303012370e47d991459ffbfce45ef)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2 artifacts/build-d39d3a327b1303012370e47d991459ffbfce45ef/results_id:
1
# 447.dealII,dealII_base.default regressed by 105
from (for last_good == f77174d4b8cfba3c0a53c78e53edbbaf57e37fc5)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2 artifacts/build-f77174d4b8cfba3c0a53c78e53edbbaf57e37fc5/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2/4734
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2/4757
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-d39d3a327b1303012370e47d991459ffbfce45ef
cd investigate-llvm-d39d3a327b1303012370e47d991459ffbfce45ef
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach d39d3a327b1303012370e47d991459ffbfce45ef
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach f77174d4b8cfba3c0a53c78e53edbbaf57e37fc5
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit d39d3a327b1303012370e47d991459ffbfce45ef
Author: Peyton, Jonathan L <jonathan.l.peyton(a)intel.com>
Date: Fri Aug 20 16:06:13 2021 -0500
[OpenMP][test] fix omp_get_wtime.c test to be more accommodating
The omp_get_wtime.c test fails intermittently if the recorded times are
off by too much which can happen when many tests are run in parallel.
Instead of failing if one timing is a little off, take average of 100
timings minus the 10 worst.
Differential Revision: https://reviews.llvm.org/D108488
---
openmp/runtime/test/api/omp_get_wtime.c | 75 ++++++++++++++++++++++++++-------
1 file changed, 59 insertions(+), 16 deletions(-)
diff --git a/openmp/runtime/test/api/omp_get_wtime.c b/openmp/runtime/test/api/omp_get_wtime.c
index e2bb211e0ce4..a862e07fc5a2 100644
--- a/openmp/runtime/test/api/omp_get_wtime.c
+++ b/openmp/runtime/test/api/omp_get_wtime.c
@@ -4,30 +4,73 @@
#include "omp_testsuite.h"
#include "omp_my_sleep.h"
-int test_omp_get_wtime()
-{
+#define NTIMES 100
+
+// This is the error % threshold. Be generous with the error threshold since
+// this test may be run in parallel with many other tests it may throw off the
+// sleep timing.
+#define THRESHOLD 33.0
+
+double test_omp_get_wtime(double desired_wait_time) {
double start;
double end;
- double measured_time;
- double wait_time = 0.2;
start = 0;
end = 0;
start = omp_get_wtime();
- my_sleep (wait_time);
+ my_sleep(desired_wait_time);
end = omp_get_wtime();
- measured_time = end-start;
- return ((measured_time > 0.97 * wait_time) && (measured_time < 1.03 * wait_time)) ;
+ return end - start;
}
-int main()
-{
- int i;
- int num_failed=0;
+int compare_times(const void *lhs, const void *rhs) {
+ const double *a = (const double *)lhs;
+ const double *b = (const double *)rhs;
+ return *a - *b;
+}
+
+int main() {
+ int i, final_count;
+ double percent_off;
+ double *begin, *end, *ptr;
+ double wait_time = 0.01;
+ double average = 0.0;
+ double n = 0.0;
+ double *times = (double *)malloc(sizeof(double) * NTIMES);
+
+ // Get each timing
+ for (i = 0; i < NTIMES; i++) {
+ times[i] = test_omp_get_wtime(wait_time);
+ }
+
+ // Remove approx the "worst" tenth of the timings
+ qsort(times, NTIMES, sizeof(double), compare_times);
+ begin = times;
+ end = times + NTIMES;
+ for (i = 0; i < NTIMES / 10; ++i) {
+ if (i % 2 == 0)
+ begin++;
+ else
+ end--;
+ }
+
+ // Get the average of the remaining timings
+ for (ptr = begin, final_count = 0; ptr != end; ++ptr, ++final_count)
+ average += times[i];
+ average /= (double)final_count;
+ free(times);
+
+ // Calculate the percent off of desired wait time
+ percent_off = (average - wait_time) / wait_time * 100.0;
+ // Should always be positive, but just in case
+ if (percent_off < 0)
+ percent_off = -percent_off;
- for(i = 0; i < REPETITIONS; i++) {
- if(!test_omp_get_wtime()) {
- num_failed++;
- }
+ if (percent_off > (double)THRESHOLD) {
+ fprintf(stderr, "error: average of %d runs (%lf) is of by %lf%%\n", NTIMES,
+ average, percent_off);
+ return EXIT_FAILURE;
}
- return num_failed;
+ printf("pass: average of %d runs (%lf) is only off by %lf%%\n", NTIMES,
+ average, percent_off);
+ return EXIT_SUCCESS;
}
</cut>
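For reference, omp_get_wtime() returns wall-clock time in seconds as a double, and the test above simply measures how long my_sleep() takes between two such calls. A minimal standalone usage sketch (my own example, not from the commit):
// Time a region of code with omp_get_wtime(); build with -fopenmp.
#include <omp.h>
#include <cstdio>
int main() {
  double start = omp_get_wtime();
  // ... region being timed ...
  double elapsed = omp_get_wtime() - start;
  std::printf("elapsed: %f seconds\n", elapsed);
  return 0;
}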
Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-mainline-allyesconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/llvm-master-aarch64-mainline-allyesconfig
Culprit:
<cut>
commit 342f43af70dbc74f8629381998f92c060e1763a2
Author: Maurizio Lombardi <mlombard(a)redhat.com>
Date: Thu Jul 29 15:52:50 2021 +0200
iscsi_ibft: fix crash due to KASLR physical memory remapping
Starting with commit a799c2bd29d1
("x86/setup: Consolidate early memory reservations")
memory reservations have been moved earlier during the boot process,
before the execution of the Kernel Address Space Layout Randomization code.
setup_arch() calls the iscsi_ibft's find_ibft_region() function
to find and reserve the memory dedicated to the iBFT and this function
also saves a virtual pointer to the iBFT table for later use.
The problem is that if KASLR is active, the physical memory gets
remapped somewhere else in the virtual address space and the pointer is
no longer valid; this will cause a kernel panic when the iscsi driver tries
to dereference it.
iBFT detected.
BUG: unable to handle page fault for address: ffff888000099fd8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
..snip..
Call Trace:
? ibft_create_kobject+0x1d2/0x1d2 [iscsi_ibft]
do_one_initcall+0x44/0x1d0
? kmem_cache_alloc_trace+0x119/0x220
do_init_module+0x5c/0x270
__do_sys_init_module+0x12e/0x1b0
do_syscall_64+0x40/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fix this bug by saving the address of the physical location
of the ibft; later the driver will use isa_bus_to_virt() to get
the correct virtual address.
N.B. On each reboot KASLR randomizes the virtual addresses so
assuming phys_to_virt before KASLR does its deed is incorrect.
Simplify the code by renaming find_ibft_region()
to reserve_ibft_region() and remove all the wrappers.
Signed-off-by: Maurizio Lombardi <mlombard(a)redhat.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad(a)kernel.org>
</cut>
Results regressed to (for first_bad == 342f43af70dbc74f8629381998f92c060e1763a2)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
19722
# First few build errors in logs:
from (for last_good == 62fb9874f5da54fdb243003b386128037319b219)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
19795
# linux build successful:
all
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-linux-342f43af70dbc74f8629381998f92c060e1763a2
cd investigate-linux-342f43af70dbc74f8629381998f92c060e1763a2
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach 342f43af70dbc74f8629381998f92c060e1763a2
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 62fb9874f5da54fdb243003b386128037319b219
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Full commit (up to 1000 lines):
<cut>
commit 342f43af70dbc74f8629381998f92c060e1763a2
Author: Maurizio Lombardi <mlombard(a)redhat.com>
Date: Thu Jul 29 15:52:50 2021 +0200
iscsi_ibft: fix crash due to KASLR physical memory remapping
Starting with commit a799c2bd29d1
("x86/setup: Consolidate early memory reservations")
memory reservations have been moved earlier during the boot process,
before the execution of the Kernel Address Space Layout Randomization code.
setup_arch() calls the iscsi_ibft's find_ibft_region() function
to find and reserve the memory dedicated to the iBFT and this function
also saves a virtual pointer to the iBFT table for later use.
The problem is that if KASLR is active, the physical memory gets
remapped somewhere else in the virtual address space and the pointer is
no longer valid; this will cause a kernel panic when the iscsi driver tries
to dereference it.
iBFT detected.
BUG: unable to handle page fault for address: ffff888000099fd8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
..snip..
Call Trace:
? ibft_create_kobject+0x1d2/0x1d2 [iscsi_ibft]
do_one_initcall+0x44/0x1d0
? kmem_cache_alloc_trace+0x119/0x220
do_init_module+0x5c/0x270
__do_sys_init_module+0x12e/0x1b0
do_syscall_64+0x40/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fix this bug by saving the address of the physical location
of the ibft; later the driver will use isa_bus_to_virt() to get
the correct virtual address.
N.B. On each reboot KASLR randomizes the virtual addresses so
assuming phys_to_virt before KASLR does its deed is incorrect.
Simplify the code by renaming find_ibft_region()
to reserve_ibft_region() and remove all the wrappers.
Signed-off-by: Maurizio Lombardi <mlombard(a)redhat.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad(a)kernel.org>
---
arch/x86/kernel/setup.c | 10 --------
drivers/firmware/iscsi_ibft.c | 10 +++++---
drivers/firmware/iscsi_ibft_find.c | 48 ++++++++++++++------------------------
include/linux/iscsi_ibft.h | 18 ++++++--------
4 files changed, 32 insertions(+), 54 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1e720626069a..b6a62af06a9f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -571,16 +571,6 @@ void __init reserve_standard_io_resources(void)
}
-static __init void reserve_ibft_region(void)
-{
- unsigned long addr, size = 0;
-
- addr = find_ibft_region(&size);
-
- if (size)
- memblock_reserve(addr, size);
-}
-
static bool __init snb_gfx_workaround_needed(void)
{
#ifdef CONFIG_PCI
diff --git a/drivers/firmware/iscsi_ibft.c b/drivers/firmware/iscsi_ibft.c
index 7127a04bca19..612a59e213df 100644
--- a/drivers/firmware/iscsi_ibft.c
+++ b/drivers/firmware/iscsi_ibft.c
@@ -84,8 +84,10 @@ MODULE_DESCRIPTION("sysfs interface to BIOS iBFT information");
MODULE_LICENSE("GPL");
MODULE_VERSION(IBFT_ISCSI_VERSION);
+static struct acpi_table_ibft *ibft_addr;
+
#ifndef CONFIG_ISCSI_IBFT_FIND
-struct acpi_table_ibft *ibft_addr;
+phys_addr_t ibft_phys_addr;
#endif
struct ibft_hdr {
@@ -858,11 +860,13 @@ static int __init ibft_init(void)
int rc = 0;
/*
- As on UEFI systems the setup_arch()/find_ibft_region()
+ As on UEFI systems the setup_arch()/reserve_ibft_region()
is called before ACPI tables are parsed and it only does
legacy finding.
*/
- if (!ibft_addr)
+ if (ibft_phys_addr)
+ ibft_addr = isa_bus_to_virt(ibft_phys_addr);
+ else
acpi_find_ibft_region();
if (ibft_addr) {
diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
index 64bb94523281..a0594590847d 100644
--- a/drivers/firmware/iscsi_ibft_find.c
+++ b/drivers/firmware/iscsi_ibft_find.c
@@ -31,8 +31,8 @@
/*
* Physical location of iSCSI Boot Format Table.
*/
-struct acpi_table_ibft *ibft_addr;
-EXPORT_SYMBOL_GPL(ibft_addr);
+phys_addr_t ibft_phys_addr;
+EXPORT_SYMBOL_GPL(ibft_phys_addr);
static const struct {
char *sign;
@@ -47,13 +47,24 @@ static const struct {
#define VGA_MEM 0xA0000 /* VGA buffer */
#define VGA_SIZE 0x20000 /* 128kB */
-static int __init find_ibft_in_mem(void)
+/*
+ * Routine used to find and reserve the iSCSI Boot Format Table
+ */
+void __init reserve_ibft_region(void)
{
unsigned long pos;
unsigned int len = 0;
void *virt;
int i;
+ ibft_phys_addr = 0;
+
+ /* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
+ * only use ACPI for this
+ */
+ if (efi_enabled(EFI_BOOT))
+ return;
+
for (pos = IBFT_START; pos < IBFT_END; pos += 16) {
/* The table can't be inside the VGA BIOS reserved space,
* so skip that area */
@@ -70,35 +81,12 @@ static int __init find_ibft_in_mem(void)
/* if the length of the table extends past 1M,
* the table cannot be valid. */
if (pos + len <= (IBFT_END-1)) {
- ibft_addr = (struct acpi_table_ibft *)virt;
- pr_info("iBFT found at 0x%lx.\n", pos);
- goto done;
+ ibft_phys_addr = pos;
+ memblock_reserve(ibft_phys_addr, PAGE_ALIGN(len));
+ pr_info("iBFT found at 0x%lx.\n", ibft_phys_addr);
+ return;
}
}
}
}
-done:
- return len;
-}
-/*
- * Routine used to find the iSCSI Boot Format Table. The logical
- * kernel address is set in the ibft_addr global variable.
- */
-unsigned long __init find_ibft_region(unsigned long *sizep)
-{
- ibft_addr = NULL;
-
- /* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
- * only use ACPI for this */
-
- if (!efi_enabled(EFI_BOOT))
- find_ibft_in_mem();
-
- if (ibft_addr) {
- *sizep = PAGE_ALIGN(ibft_addr->header.length);
- return (u64)virt_to_phys(ibft_addr);
- }
-
- *sizep = 0;
- return 0;
}
diff --git a/include/linux/iscsi_ibft.h b/include/linux/iscsi_ibft.h
index b7b45ca82bea..790e7fcfc1a6 100644
--- a/include/linux/iscsi_ibft.h
+++ b/include/linux/iscsi_ibft.h
@@ -13,26 +13,22 @@
#ifndef ISCSI_IBFT_H
#define ISCSI_IBFT_H
-#include <linux/acpi.h>
+#include <linux/types.h>
/*
- * Logical location of iSCSI Boot Format Table.
- * If the value is NULL there is no iBFT on the machine.
+ * Physical location of iSCSI Boot Format Table.
+ * If the value is 0 there is no iBFT on the machine.
*/
-extern struct acpi_table_ibft *ibft_addr;
+extern phys_addr_t ibft_phys_addr;
/*
* Routine used to find and reserve the iSCSI Boot Format Table. The
- * mapped address is set in the ibft_addr variable.
+ * physical address is set in the ibft_phys_addr variable.
*/
#ifdef CONFIG_ISCSI_IBFT_FIND
-unsigned long find_ibft_region(unsigned long *sizep);
+void reserve_ibft_region(void);
#else
-static inline unsigned long find_ibft_region(unsigned long *sizep)
-{
- *sizep = 0;
- return 0;
-}
+static inline void reserve_ibft_region(void) {}
#endif
#endif /* ISCSI_IBFT_H */
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3
Culprit:
<cut>
commit 4cd8dd3fe05e099792e1494dedd074eb5ba289b6
Author: Amy Kwan <amy.kwan1(a)ibm.com>
Date: Sun Aug 22 13:46:52 2021 -0500
[scudo][standalone] Link tests against libatomic if libatomic exists
It is possible that libatomic does not exist on some systems. This patch updates
the scudo standalone tests to link against libatomic if the library exists.
This is an update to the original patch: https://reviews.llvm.org/D64134 and
aims to resolve https://bugs.llvm.org/show_bug.cgi?id=51431.
Differential Revision: https://reviews.llvm.org/D108503
</cut>
Results regressed to (for first_bad == 4cd8dd3fe05e099792e1494dedd074eb5ba289b6)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-4cd8dd3fe05e099792e1494dedd074eb5ba289b6/results_id:
1
# 447.dealII,dealII_base.default regressed by 103
from (for last_good == d8d84c9df82fc114f2b22a533a8183065ca1a2e0)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-d8d84c9df82fc114f2b22a533a8183065ca1a2e0/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/4515
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/4510
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-4cd8dd3fe05e099792e1494dedd074eb5ba289b6
cd investigate-llvm-4cd8dd3fe05e099792e1494dedd074eb5ba289b6
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 4cd8dd3fe05e099792e1494dedd074eb5ba289b6
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach d8d84c9df82fc114f2b22a533a8183065ca1a2e0
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 4cd8dd3fe05e099792e1494dedd074eb5ba289b6
Author: Amy Kwan <amy.kwan1(a)ibm.com>
Date: Sun Aug 22 13:46:52 2021 -0500
[scudo][standalone] Link tests against libatomic if libatomic exists
It is possible that libatomic does not exist on some systems. This patch updates
the scudo standalone tests to link against libatomic if the library exists.
This is an update to the original patch: https://reviews.llvm.org/D64134 and
aims to resolve https://bugs.llvm.org/show_bug.cgi?id=51431.
Differential Revision: https://reviews.llvm.org/D108503
---
compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt b/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
index f4186eba1688..eaa47a04a179 100644
--- a/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
+++ b/compiler-rt/lib/scudo/standalone/tests/CMakeLists.txt
@@ -39,7 +39,10 @@ foreach(lib ${SANITIZER_TEST_CXX_LIBRARIES})
endforeach()
list(APPEND LINK_FLAGS -pthread)
# Linking against libatomic is required with some compilers
-list(APPEND LINK_FLAGS -latomic)
+check_library_exists(atomic __atomic_load_8 "" COMPILER_RT_HAS_LIBATOMIC)
+if (COMPILER_RT_HAS_LIBATOMIC)
+ list(APPEND LINK_FLAGS -latomic)
+endif()
set(SCUDO_TEST_HEADERS
scudo_unit_test.h
</cut>
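The check_library_exists(atomic __atomic_load_8 ...) probe asks whether a separate libatomic provides the 8-byte atomic primitives. On targets without native lock-free 64-bit atomics, code along the lines of the sketch below (my illustration; the function name read_counter is hypothetical) is lowered to a call such as __atomic_load_8 and therefore needs -latomic at link time, which is why the tests only add the flag when the library is actually present.
// std::atomic on a 64-bit value; without native 8-byte lock-free atomics,
// the load below becomes a libatomic call (__atomic_load_8).
#include <atomic>
#include <cstdint>
std::atomic<std::uint64_t> counter{0};
std::uint64_t read_counter() {
  return counter.load(std::memory_order_relaxed);
}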
Progress (short week, 2 days)
* UM-2 [QEMU upstream maintainership]
+ Lots of code review and getting things upstream after trunk reopened
+ Wrote up a first draft of how to handle merging pullreqs, so that
other people can share this job with me
+ Sent a patchset that allows board models to mark some buses as
not suitable for plugging in user-created devices -- this avoids
problems with i2c devices appearing on buses that are supposed to
be for on-board devices only in the MPS2/MPS3 machines
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ MVE is now enabled upstream. (There are still some loose ends to
do under this JIRA task, though.)
+ Sent a patchset that makes some of the easier codegen optimizations
for the no-predication case. (Code review spotted an issue which
might be painful to sort out -- we'll see next week...)
-- PMM
Successfully identified regression in *binutils* in CI configuration tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz
Culprit:
<cut>
commit 590d3faada8a12bf0937bbf68413956dc6a339a9
Author: Tom de Vries <tdevries(a)suse.de>
Date: Mon Aug 30 10:30:26 2021 +0200
[gdb/testsuite] Improve argument syntax of proc arange
The current syntax of proc arange is:
...
proc arange { arange_start arange_length {comment ""} {seg_sel ""} } {
...
and a typical call looks like:
...
arange $start $len
...
This style is somewhat annoying because if you want to specify the last
parameter, you need to give the default values of all the other optional ones
before as well:
...
arange $start $len "" $seg_sel
...
Update the syntax to:
...
proc arange { options arange_start arange_length } {
parse_options {
{ comment "" }
{ seg_sel "" }
}
...
such that a typical call looks like:
...
arange {} $start $len
...
and a call using seg_sel looks like:
...
arange {
seg_sel $seg_sel
} $start $len
...
Also update proc aranges, which already has an options argument, to use the
new proc parse_options.
Tested on x86_64-linux.
Co-Authored-By: Simon Marchi <simon.marchi(a)polymtl.ca>
</cut>
Results regressed to (for first_bad == 590d3faada8a12bf0937bbf68413956dc6a339a9)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_mthumb artifacts/build-590d3faada8a12bf0937bbf68413956dc6a339a9/results_id:
1
# 447.dealII,[.] contract<3> regressed by 200
from (for last_good == cb03dd22b36b7bd21a81137005ec42dab8355b62)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_mthumb artifacts/build-cb03dd22b36b7bd21a81137005ec42dab8355b62/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of last_good: apm_32/tcwg_bmk_llvm_apm/bisect-llvm-master-arm-spec2k6-Oz/4418
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of first_bad: apm_32/tcwg_bmk_llvm_apm/bisect-llvm-master-arm-spec2k6-Oz/4431
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-binutils-590d3faada8a12bf0937bbf68413956dc6a339a9
cd investigate-binutils-590d3faada8a12bf0937bbf68413956dc6a339a9
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/
cd binutils
# Reproduce first_bad build
git checkout --detach 590d3faada8a12bf0937bbf68413956dc6a339a9
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach cb03dd22b36b7bd21a81137005ec42dab8355b62
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 590d3faada8a12bf0937bbf68413956dc6a339a9
Author: Tom de Vries <tdevries(a)suse.de>
Date: Mon Aug 30 10:30:26 2021 +0200
[gdb/testsuite] Improve argument syntax of proc arange
The current syntax of proc arange is:
...
proc arange { arange_start arange_length {comment ""} {seg_sel ""} } {
...
and a typical call looks like:
...
arange $start $len
...
This style is somewhat annoying because if you want to specify the last
parameter, you need to give the default values of all the other optional ones
before as well:
...
arange $start $len "" $seg_sel
...
Update the syntax to:
...
proc arange { options arange_start arange_length } {
parse_options {
{ comment "" }
{ seg_sel "" }
}
...
such that a typical call looks like:
...
arange {} $start $len
...
and a call using seg_sel looks like:
...
arange {
seg_sel $seg_sel
} $start $len
...
Also update proc aranges, which already has an options argument, to use the
new proc parse_options.
Tested on x86_64-linux.
Co-Authored-By: Simon Marchi <simon.marchi(a)polymtl.ca>
---
gdb/testsuite/gdb.dlang/watch-loc.exp | 2 +-
gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp | 6 +-
.../gdb.dwarf2/frame-inlined-in-outer-frame.exp | 2 +-
.../template-specification-full-name.exp | 2 +-
gdb/testsuite/gdb.testsuite/parse_options_args.exp | 59 ++++++++++++
gdb/testsuite/lib/dwarf.exp | 31 +++---
gdb/testsuite/lib/gdb.exp | 104 ++++++++++++++-------
7 files changed, 150 insertions(+), 56 deletions(-)
diff --git a/gdb/testsuite/gdb.dlang/watch-loc.exp b/gdb/testsuite/gdb.dlang/watch-loc.exp
index 6e8b26e3109..e13400ed479 100644
--- a/gdb/testsuite/gdb.dlang/watch-loc.exp
+++ b/gdb/testsuite/gdb.dlang/watch-loc.exp
@@ -68,7 +68,7 @@ Dwarf::assemble $asm_file {
}
aranges {} cu_start {
- arange $dmain_start $dmain_length
+ arange {} $dmain_start $dmain_length
}
}
diff --git a/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp b/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp
index e65b4c8610a..d55b7fd150e 100644
--- a/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp
+++ b/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp
@@ -125,9 +125,9 @@ Dwarf::assemble $asm_file {
}
aranges {} cu_label {
- arange [lindex $main_func 0] [lindex $main_func 1]
- arange [lindex $frame2_func 0] [lindex $frame2_func 1]
- arange [lindex $frame3_func 0] [lindex $frame3_func 1]
+ arange {} [lindex $main_func 0] [lindex $main_func 1]
+ arange {} [lindex $frame2_func 0] [lindex $frame2_func 1]
+ arange {} [lindex $frame3_func 0] [lindex $frame3_func 1]
}
}
diff --git a/gdb/testsuite/gdb.dwarf2/frame-inlined-in-outer-frame.exp b/gdb/testsuite/gdb.dwarf2/frame-inlined-in-outer-frame.exp
index ff12cd79f19..f95558dffef 100644
--- a/gdb/testsuite/gdb.dwarf2/frame-inlined-in-outer-frame.exp
+++ b/gdb/testsuite/gdb.dwarf2/frame-inlined-in-outer-frame.exp
@@ -95,7 +95,7 @@ Dwarf::assemble $dwarf_asm {
}
aranges {} cu_label {
- arange __cu_low_pc __cu_high_pc
+ arange {} __cu_low_pc __cu_high_pc
}
}
diff --git a/gdb/testsuite/gdb.dwarf2/template-specification-full-name.exp b/gdb/testsuite/gdb.dwarf2/template-specification-full-name.exp
index 5c59777e1b6..6e736f2c8ef 100644
--- a/gdb/testsuite/gdb.dwarf2/template-specification-full-name.exp
+++ b/gdb/testsuite/gdb.dwarf2/template-specification-full-name.exp
@@ -69,7 +69,7 @@ Dwarf::assemble $asm_file {
}
aranges {} cu_start {
- arange "$main_start" "$main_length"
+ arange {} "$main_start" "$main_length"
}
}
diff --git a/gdb/testsuite/gdb.testsuite/parse_options_args.exp b/gdb/testsuite/gdb.testsuite/parse_options_args.exp
new file mode 100644
index 00000000000..ce14fc3cd7c
--- /dev/null
+++ b/gdb/testsuite/gdb.testsuite/parse_options_args.exp
@@ -0,0 +1,59 @@
+# Copyright 2021 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+# Testsuite self-tests for parse_options and parse_args.
+
+with_test_prefix parse_options {
+ proc test1 { options a b } {
+ set v2 "defval2"
+ parse_options {
+ { opt1 defval1 }
+ { opt2 $v2 }
+ { opt3 }
+ { opt4 }
+ }
+
+ gdb_assert { [string equal $a "vala"] }
+ gdb_assert { [string equal $b "valb"] }
+ gdb_assert { [string equal $opt1 "val1"] }
+ gdb_assert { [string equal $opt2 "defval2"] }
+ gdb_assert { $opt3 == 1 }
+ gdb_assert { $opt4 == 0 }
+ }
+
+ set v1 "val1"
+ test1 { opt1 $v1 opt3 } "vala" "valb"
+}
+
+with_test_prefix parse_args {
+ proc test2 { args } {
+ parse_args {
+ { opt1 defval1 }
+ { opt2 defval2 }
+ { opt3 }
+ { opt4 }
+ }
+ gdb_assert { [llength $args] == 2 }
+ lassign $args a b
+ gdb_assert { [string equal $a "vala"] }
+ gdb_assert { [string equal $b "valb"] }
+ gdb_assert { [string equal $opt1 "val1"] }
+ gdb_assert { [string equal $opt2 "defval2"] }
+ gdb_assert { $opt3 == 1 }
+ gdb_assert { $opt4 == 0 }
+ }
+
+ set v1 "val1"
+ test2 -opt1 $v1 -opt3 "vala" "valb"
+}
diff --git a/gdb/testsuite/lib/dwarf.exp b/gdb/testsuite/lib/dwarf.exp
index 120fa418201..7fb3561a443 100644
--- a/gdb/testsuite/lib/dwarf.exp
+++ b/gdb/testsuite/lib/dwarf.exp
@@ -2212,7 +2212,12 @@ namespace eval Dwarf {
# Emit a DWARF .debug_aranges entry.
- proc arange { arange_start arange_length {comment ""} {seg_sel ""} } {
+ proc arange { options arange_start arange_length } {
+ parse_options {
+ { comment "" }
+ { seg_sel "" }
+ }
+
if { $comment != "" } {
# Wrap
set comment " ($comment)"
@@ -2270,22 +2275,14 @@ namespace eval Dwarf {
variable _addr_size
variable _seg_size
- # Establish the defaults.
- set is_64 0
- set cu_is_64 0
- set section_version 2
- set _seg_size 0
-
# Handle options.
- foreach { name value } $options {
- switch -exact -- $name {
- is_64 { set is_64 $value }
- cu_is_64 { set cu_is_64 $value }
- section_version {set section_version $value }
- seg_size { set _seg_size $value }
- default { error "unknown option $name" }
- }
+ parse_options {
+ { is_64 0 }
+ { cu_is_64 0 }
+ { section_version 2 }
+ { seg_size 0 }
}
+ set _seg_size $seg_size
if { [is_64_target] } {
set _addr_size 8
@@ -2354,9 +2351,9 @@ namespace eval Dwarf {
# Terminator tuple.
set comment "Terminator"
if { $_seg_size == 0 } {
- arange 0 0 $comment
+ arange {comment $comment} 0 0
} else {
- arange 0 0 $comment 0
+ arange {comment $comment seg_sel 0} 0 0
}
# End label.
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index 093392709b4..3aea7baaab0 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -7293,8 +7293,8 @@ proc using_fission { } {
return [regexp -- "-gsplit-dwarf" $debug_flags]
}
-# Search the caller's ARGS list and set variables according to the list of
-# valid options described by ARGSET.
+# Search LISTNAME in uplevel LEVEL caller and set variables according to the
+# list of valid options with prefix PREFIX described by ARGSET.
#
# The first member of each one- or two-element list in ARGSET defines the
# name of a variable that will be added to the caller's scope.
@@ -7305,13 +7305,15 @@ proc using_fission { } {
#
# If two elements are given, the second element is the default value of
# the variable. This is then overwritten if the option exists in ARGS.
+# If EVAL, then subst is called on the value, which allows variables
+# to be used.
#
# Any parse_args elements in (the caller's) ARGS will be removed, leaving
# any optional components.
-
+#
# Example:
# proc myproc {foo args} {
-# parse_args {{bar} {baz "abc"} {qux}}
+# parse_list args 1 {{bar} {baz "abc"} {qux}} "-" false
# # ...
# }
# myproc ABC -bar -baz DEF peanut butter
@@ -7319,43 +7321,79 @@ proc using_fission { } {
# foo (=ABC), bar (=1), baz (=DEF), and qux (=0)
# args will be the list {peanut butter}
-proc parse_args { argset } {
- upvar args args
+proc parse_list { level listname argset prefix eval } {
+ upvar $level $listname args
foreach argument $argset {
- if {[llength $argument] == 1} {
- # No default specified, so we assume that we should set
- # the value to 1 if the arg is present and 0 if it's not.
- # It is assumed that no value is given with the argument.
- set result [lsearch -exact $args "-$argument"]
- if {$result != -1} then {
- uplevel 1 [list set $argument 1]
- set args [lreplace $args $result $result]
- } else {
- uplevel 1 [list set $argument 0]
- }
- } elseif {[llength $argument] == 2} {
- # There are two items in the argument. The second is a
- # default value to use if the item is not present.
- # Otherwise, the variable is set to whatever is provided
- # after the item in the args.
- set arg [lindex $argument 0]
- set result [lsearch -exact $args "-[lindex $arg 0]"]
- if {$result != -1} then {
- uplevel 1 [list set $arg [lindex $args [expr $result+1]]]
- set args [lreplace $args $result [expr $result+1]]
- } else {
- uplevel 1 [list set $arg [lindex $argument 1]]
- }
- } else {
- error "Badly formatted argument \"$argument\" in argument set"
- }
+ if {[llength $argument] == 1} {
+ # Normalize argument, strip leading/trailing whitespace.
+ # Allows us to treat {foo} and { foo } the same.
+ set argument [string trim $argument]
+
+ # No default specified, so we assume that we should set
+ # the value to 1 if the arg is present and 0 if it's not.
+ # It is assumed that no value is given with the argument.
+ set pattern "$prefix$argument"
+ set result [lsearch -exact $args $pattern]
+
+ if {$result != -1} then {
+ set value 1
+ set args [lreplace $args $result $result]
+ } else {
+ set value 0
+ }
+ uplevel $level [list set $argument $value]
+ } elseif {[llength $argument] == 2} {
+ # There are two items in the argument. The second is a
+ # default value to use if the item is not present.
+ # Otherwise, the variable is set to whatever is provided
+ # after the item in the args.
+ set arg [lindex $argument 0]
+ set pattern "$prefix[lindex $arg 0]"
+ set result [lsearch -exact $args $pattern]
+
+ if {$result != -1} then {
+ set value [lindex $args [expr $result+1]]
+ if { $eval } {
+ set value [uplevel [expr $level + 1] [list subst $value]]
+ }
+ set args [lreplace $args $result [expr $result+1]]
+ } else {
+ set value [lindex $argument 1]
+ if { $eval } {
+ set value [uplevel $level [list subst $value]]
+ }
+ }
+ uplevel $level [list set $arg $value]
+ } else {
+ error "Badly formatted argument \"$argument\" in argument set"
+ }
}
+}
+
+# Search the caller's args variable and set variables according to the list of
+# valid options described by ARGSET.
+
+proc parse_args { argset } {
+ parse_list 2 args $argset "-" false
# The remaining args should be checked to see that they match the
# number of items expected to be passed into the procedure...
}
+# Process the caller's options variable and set variables according
+# to the list of valid options described by OPTIONSET.
+
+proc parse_options { optionset } {
+ parse_list 2 options $optionset "" true
+
+ # Require no remaining options.
+ upvar 1 options options
+ if { [llength $options] != 0 } {
+ error "Options left unparsed: $options"
+ }
+}
+
# Capture the output of COMMAND in a string ignoring PREFIX (a regexp);
# return that string.
</cut>
Successfully identified regression in *gcc* in CI configuration tcwg_gcc_bootstrap/master-arm-bootstrap_O3. So far, this commit has regressed CI configurations:
- tcwg_gcc_bootstrap/master-arm-bootstrap_O3
Culprit:
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Tue Aug 31 11:45:07 2021 +0100
Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register. For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.
Whilst there I also added another optimization. If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.
</cut>
Results regressed to (for first_bad == cad36f38576a6a781e3c62ab061c68f5b8dab13a)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
from (for last_good == 0960d937d9bee3c831d0b64a9c828c263a58ff89)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap_O3:
2
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3…
Build top page/logs: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
cd investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach cad36f38576a6a781e3c62ab061c68f5b8dab13a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 0960d937d9bee3c831d0b64a9c828c263a58ff89
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3…
Build log: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_O3…
Full commit (up to 1000 lines):
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Tue Aug 31 11:45:07 2021 +0100
Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register. For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.
Whilst there I also added another optimization. If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.
---
gcc/expr.c | 19 ++++++++++++++++++-
gcc/simplify-rtx.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
2 files changed, 60 insertions(+), 11 deletions(-)
diff --git a/gcc/expr.c b/gcc/expr.c
index 096c0315ecc..5dd98a9bccc 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -688,7 +688,24 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp)
&& (GET_MODE_PRECISION (subreg_promoted_mode (x))
>= GET_MODE_PRECISION (int_mode))
&& SUBREG_CHECK_PROMOTED_SIGN (x, unsignedp))
- x = gen_lowpart (int_mode, SUBREG_REG (x));
+ {
+ scalar_int_mode int_orig_mode;
+ machine_mode orig_mode = GET_MODE (x);
+ x = gen_lowpart (int_mode, SUBREG_REG (x));
+
+ /* Preserve SUBREG_PROMOTED_VAR_P if the new mode is wider than
+ the original mode, but narrower than the inner mode. */
+ if (GET_CODE (x) == SUBREG
+ && GET_MODE_PRECISION (subreg_promoted_mode (x))
+ > GET_MODE_PRECISION (int_mode)
+ && is_a <scalar_int_mode> (orig_mode, &int_orig_mode)
+ && GET_MODE_PRECISION (int_mode)
+ > GET_MODE_PRECISION (int_orig_mode))
+ {
+ SUBREG_PROMOTED_VAR_P (x) = 1;
+ SUBREG_PROMOTED_SET (x, unsignedp);
+ }
+ }
if (GET_MODE (x) != VOIDmode)
oldmode = GET_MODE (x);
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index e431e0c19d7..ebad5cb5a79 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1512,12 +1512,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
target mode is the same as the variable's promotion. */
if (GET_CODE (op) == SUBREG
&& SUBREG_PROMOTED_VAR_P (op)
- && SUBREG_PROMOTED_SIGNED_P (op)
- && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+ && SUBREG_PROMOTED_SIGNED_P (op))
{
- temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
- if (temp)
- return temp;
+ rtx subreg = SUBREG_REG (op);
+ machine_mode subreg_mode = GET_MODE (subreg);
+ if (!paradoxical_subreg_p (mode, subreg_mode))
+ {
+ temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+ if (temp)
+ {
+ /* Preserve SUBREG_PROMOTED_VAR_P. */
+ if (partial_subreg_p (temp))
+ {
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, 1);
+ }
+ return temp;
+ }
+ }
+ else
+ /* Sign-extending a sign-extended subreg. */
+ return simplify_gen_unary (SIGN_EXTEND, mode,
+ subreg, subreg_mode);
}
/* (sign_extend:M (sign_extend:N <X>)) is (sign_extend:M <X>).
@@ -1631,12 +1647,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
target mode is the same as the variable's promotion. */
if (GET_CODE (op) == SUBREG
&& SUBREG_PROMOTED_VAR_P (op)
- && SUBREG_PROMOTED_UNSIGNED_P (op)
- && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+ && SUBREG_PROMOTED_UNSIGNED_P (op))
{
- temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
- if (temp)
- return temp;
+ rtx subreg = SUBREG_REG (op);
+ machine_mode subreg_mode = GET_MODE (subreg);
+ if (!paradoxical_subreg_p (mode, subreg_mode))
+ {
+ temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+ if (temp)
+ {
+ /* Preserve SUBREG_PROMOTED_VAR_P. */
+ if (partial_subreg_p (temp))
+ {
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, 0);
+ }
+ return temp;
+ }
+ }
+ else
+ /* Zero-extending a zero-extended subreg. */
+ return simplify_gen_unary (ZERO_EXTEND, mode,
+ subreg, subreg_mode);
}
/* Extending a widening multiplication should be canonicalized to
</cut>
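As an aside, here is a small self-contained sketch of the kind of extension chain the commit above targets. This is a hypothetical illustration only -- widen16 and widen64 are made-up names, and the snippet is not taken from the failing libgcc code or from GCC's testsuite; whether the described RTL actually arises depends on the target's promotion rules.
#include <stdio.h>

/* A QImode-sized value is typically promoted and kept zero-extended in a
   full (SImode) register.  Widening it to HImode can then reuse a wider
   subreg once SUBREG_PROMOTED_VAR_P is preserved.  */
static unsigned short widen16 (unsigned char c)
{
  return (unsigned short) c;
}

/* Widening past the promoted SImode register (here to a DImode-sized type)
   is the new case described in the commit: extend the SImode register
   directly instead of going through the subreg.  */
static unsigned long long widen64 (unsigned char c)
{
  return (unsigned long long) c;
}

int main (void)
{
  printf ("%u %llu\n", (unsigned) widen16 (0xAB), widen64 (0xCD));
  return 0;
}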
Successfully identified regression in *gcc* in CI configuration tcwg_gnu_native_build/master-aarch64. So far, this commit has regressed CI configurations:
- tcwg_gnu_native_build/master-aarch64
Culprit:
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Tue Aug 31 11:45:07 2021 +0100
Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register. For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.
Whilst there I also added another optimization. If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.
</cut>
Results regressed to (for first_bad == cad36f38576a6a781e3c62ab061c68f5b8dab13a)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:05:59 make[2]: *** [/home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/libgcc/shared-object.mk:14: trunctfhf2.o] Error 1
from (for last_good == 0960d937d9bee3c831d0b64a9c828c263a58ff89)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe gcc:
2
# build_abe linux:
4
# build_abe glibc:
5
# build_abe gdb:
6
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art…
Build top page/logs: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
cd investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach cad36f38576a6a781e3c62ab061c68f5b8dab13a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 0960d937d9bee3c831d0b64a9c828c263a58ff89
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/art…
Build log: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-aarch64/1/con…
Full commit (up to 1000 lines):
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Tue Aug 31 11:45:07 2021 +0100
Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register. For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.
Whilst there I also added another optimization. If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.
---
gcc/expr.c | 19 ++++++++++++++++++-
gcc/simplify-rtx.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
2 files changed, 60 insertions(+), 11 deletions(-)
diff --git a/gcc/expr.c b/gcc/expr.c
index 096c0315ecc..5dd98a9bccc 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -688,7 +688,24 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp)
&& (GET_MODE_PRECISION (subreg_promoted_mode (x))
>= GET_MODE_PRECISION (int_mode))
&& SUBREG_CHECK_PROMOTED_SIGN (x, unsignedp))
- x = gen_lowpart (int_mode, SUBREG_REG (x));
+ {
+ scalar_int_mode int_orig_mode;
+ machine_mode orig_mode = GET_MODE (x);
+ x = gen_lowpart (int_mode, SUBREG_REG (x));
+
+ /* Preserve SUBREG_PROMOTED_VAR_P if the new mode is wider than
+ the original mode, but narrower than the inner mode. */
+ if (GET_CODE (x) == SUBREG
+ && GET_MODE_PRECISION (subreg_promoted_mode (x))
+ > GET_MODE_PRECISION (int_mode)
+ && is_a <scalar_int_mode> (orig_mode, &int_orig_mode)
+ && GET_MODE_PRECISION (int_mode)
+ > GET_MODE_PRECISION (int_orig_mode))
+ {
+ SUBREG_PROMOTED_VAR_P (x) = 1;
+ SUBREG_PROMOTED_SET (x, unsignedp);
+ }
+ }
if (GET_MODE (x) != VOIDmode)
oldmode = GET_MODE (x);
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index e431e0c19d7..ebad5cb5a79 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1512,12 +1512,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
target mode is the same as the variable's promotion. */
if (GET_CODE (op) == SUBREG
&& SUBREG_PROMOTED_VAR_P (op)
- && SUBREG_PROMOTED_SIGNED_P (op)
- && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+ && SUBREG_PROMOTED_SIGNED_P (op))
{
- temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
- if (temp)
- return temp;
+ rtx subreg = SUBREG_REG (op);
+ machine_mode subreg_mode = GET_MODE (subreg);
+ if (!paradoxical_subreg_p (mode, subreg_mode))
+ {
+ temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+ if (temp)
+ {
+ /* Preserve SUBREG_PROMOTED_VAR_P. */
+ if (partial_subreg_p (temp))
+ {
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, 1);
+ }
+ return temp;
+ }
+ }
+ else
+ /* Sign-extending a sign-extended subreg. */
+ return simplify_gen_unary (SIGN_EXTEND, mode,
+ subreg, subreg_mode);
}
/* (sign_extend:M (sign_extend:N <X>)) is (sign_extend:M <X>).
@@ -1631,12 +1647,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
target mode is the same as the variable's promotion. */
if (GET_CODE (op) == SUBREG
&& SUBREG_PROMOTED_VAR_P (op)
- && SUBREG_PROMOTED_UNSIGNED_P (op)
- && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+ && SUBREG_PROMOTED_UNSIGNED_P (op))
{
- temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
- if (temp)
- return temp;
+ rtx subreg = SUBREG_REG (op);
+ machine_mode subreg_mode = GET_MODE (subreg);
+ if (!paradoxical_subreg_p (mode, subreg_mode))
+ {
+ temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+ if (temp)
+ {
+ /* Preserve SUBREG_PROMOTED_VAR_P. */
+ if (partial_subreg_p (temp))
+ {
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, 0);
+ }
+ return temp;
+ }
+ }
+ else
+ /* Zero-extending a zero-extended subreg. */
+ return simplify_gen_unary (ZERO_EXTEND, mode,
+ subreg, subreg_mode);
}
/* Extending a widening multiplication should be canonicalized to
</cut>
Successfully identified regression in *gcc* in CI configuration tcwg_gnu_cross_build/master-aarch64. So far, this commit has regressed CI configurations:
- tcwg_gnu_cross_build/master-aarch64
Culprit:
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Tue Aug 31 11:45:07 2021 +0100
Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register. For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.
Whilst there I also added another optimization. If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.
</cut>
Results regressed to (for first_bad == cad36f38576a6a781e3c62ab061c68f5b8dab13a)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:04:40 cc1: error: no include path in which to search for stdc-predef.h
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-1.h:127:36: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
# 00:04:48 make[2]: *** [/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/static-object.mk:17: floatsitf.o] Error 1
# 00:04:48 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/soft-fp/op-2.h:249:37: internal compiler error: in subreg_promoted_mode, at rtl.h:3132
from (for last_good == 0960d937d9bee3c831d0b64a9c828c263a58ff89)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# build_abe stage2:
5
# build_abe gdb:
6
# build_abe qemu:
7
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti…
Build top page/logs: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
cd investigate-gcc-cad36f38576a6a781e3c62ab061c68f5b8dab13a
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach cad36f38576a6a781e3c62ab061c68f5b8dab13a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 0960d937d9bee3c831d0b64a9c828c263a58ff89
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/arti…
Build log: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/1/cons…
Full commit (up to 1000 lines):
<cut>
commit cad36f38576a6a781e3c62ab061c68f5b8dab13a
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Tue Aug 31 11:45:07 2021 +0100
Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))).
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg
is correctly zero-extended or sign-extended in the parent register. For
example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the
byte x is zero extended in reg:SI 23, which is useful for optimization.
An example is that zero extending the above QImode value to HImode can
simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0).
This patch addresses the oversight/missed optimization opportunity that
the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P
annotation as its value is guaranteed to be correctly extended in the
SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already
present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing
from one or two (precisely three) places that (accidentally) strip it.
Whilst there I also added another optimization. If we need to extend
the above QImode value beyond the SImode register holding it, say to
DImode, we can eliminate the SUBREG and simply extend from the SImode
register to DImode.
2021-08-31 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider)
partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate
SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
[ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg
would be paradoxical.
---
gcc/expr.c | 19 ++++++++++++++++++-
gcc/simplify-rtx.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
2 files changed, 60 insertions(+), 11 deletions(-)
diff --git a/gcc/expr.c b/gcc/expr.c
index 096c0315ecc..5dd98a9bccc 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -688,7 +688,24 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp)
&& (GET_MODE_PRECISION (subreg_promoted_mode (x))
>= GET_MODE_PRECISION (int_mode))
&& SUBREG_CHECK_PROMOTED_SIGN (x, unsignedp))
- x = gen_lowpart (int_mode, SUBREG_REG (x));
+ {
+ scalar_int_mode int_orig_mode;
+ machine_mode orig_mode = GET_MODE (x);
+ x = gen_lowpart (int_mode, SUBREG_REG (x));
+
+ /* Preserve SUBREG_PROMOTED_VAR_P if the new mode is wider than
+ the original mode, but narrower than the inner mode. */
+ if (GET_CODE (x) == SUBREG
+ && GET_MODE_PRECISION (subreg_promoted_mode (x))
+ > GET_MODE_PRECISION (int_mode)
+ && is_a <scalar_int_mode> (orig_mode, &int_orig_mode)
+ && GET_MODE_PRECISION (int_mode)
+ > GET_MODE_PRECISION (int_orig_mode))
+ {
+ SUBREG_PROMOTED_VAR_P (x) = 1;
+ SUBREG_PROMOTED_SET (x, unsignedp);
+ }
+ }
if (GET_MODE (x) != VOIDmode)
oldmode = GET_MODE (x);
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index e431e0c19d7..ebad5cb5a79 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1512,12 +1512,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
target mode is the same as the variable's promotion. */
if (GET_CODE (op) == SUBREG
&& SUBREG_PROMOTED_VAR_P (op)
- && SUBREG_PROMOTED_SIGNED_P (op)
- && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+ && SUBREG_PROMOTED_SIGNED_P (op))
{
- temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
- if (temp)
- return temp;
+ rtx subreg = SUBREG_REG (op);
+ machine_mode subreg_mode = GET_MODE (subreg);
+ if (!paradoxical_subreg_p (mode, subreg_mode))
+ {
+ temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+ if (temp)
+ {
+ /* Preserve SUBREG_PROMOTED_VAR_P. */
+ if (partial_subreg_p (temp))
+ {
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, 1);
+ }
+ return temp;
+ }
+ }
+ else
+ /* Sign-extending a sign-extended subreg. */
+ return simplify_gen_unary (SIGN_EXTEND, mode,
+ subreg, subreg_mode);
}
/* (sign_extend:M (sign_extend:N <X>)) is (sign_extend:M <X>).
@@ -1631,12 +1647,28 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
target mode is the same as the variable's promotion. */
if (GET_CODE (op) == SUBREG
&& SUBREG_PROMOTED_VAR_P (op)
- && SUBREG_PROMOTED_UNSIGNED_P (op)
- && !paradoxical_subreg_p (mode, GET_MODE (SUBREG_REG (op))))
+ && SUBREG_PROMOTED_UNSIGNED_P (op))
{
- temp = rtl_hooks.gen_lowpart_no_emit (mode, SUBREG_REG (op));
- if (temp)
- return temp;
+ rtx subreg = SUBREG_REG (op);
+ machine_mode subreg_mode = GET_MODE (subreg);
+ if (!paradoxical_subreg_p (mode, subreg_mode))
+ {
+ temp = rtl_hooks.gen_lowpart_no_emit (mode, subreg);
+ if (temp)
+ {
+ /* Preserve SUBREG_PROMOTED_VAR_P. */
+ if (partial_subreg_p (temp))
+ {
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, 0);
+ }
+ return temp;
+ }
+ }
+ else
+ /* Zero-extending a zero-extended subreg. */
+ return simplify_gen_unary (ZERO_EXTEND, mode,
+ subreg, subreg_mode);
}
/* Extending a widening multiplication should be canonicalized to
</cut>
Successfully identified regression in *gdb* in CI configuration tcwg_gnu_native_build/master-arm. So far, this commit has regressed CI configurations:
- tcwg_gnu_native_build/master-arm
Culprit:
<cut>
commit 282aa4f7d292eb4bc213d028465a3b96f5af2f22
Author: Tom Tromey <tom(a)tromey.com>
Date: Sat Aug 28 13:16:50 2021 -0600
Add some parallel_for_each tests
Tom de Vries noticed that a patch in the DWARF scanner rewrite series
caused a regression in parallel_for_each -- it started crashing in the
case where the number of threads is 0 (there was an unchecked use of
"n-1" that was used to size an array).
He also pointed out that there were no tests of parallel_for_each.
This adds a few tests of parallel_for_each, primarily testing that
different settings for the number of threads will work. This test
catches the bug that he found in that series.
</cut>
Results regressed to (for first_bad == 282aa4f7d292eb4bc213d028465a3b96f5af2f22)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe gcc:
2
# build_abe linux:
4
# build_abe glibc:
5
# First few build errors in logs:
# 00:03:45 ../../../../../../gdb/gdb/unittests/parallel-for-selftests.c:53:30: error: use of deleted function ‘std::atomic<int>::atomic(const std::atomic<int>&)’
# 00:03:45 make[1]: *** [unittests/parallel-for-selftests.o] Error 1
# 00:03:46 make: *** [all-gdb] Error 2
from (for last_good == ee8b88452c1cb1be97199942aee7a76bbca210ee)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe gcc:
2
# build_abe linux:
4
# build_abe glibc:
5
# build_abe gdb:
6
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac…
Build top page/logs: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gdb-282aa4f7d292eb4bc213d028465a3b96f5af2f22
cd investigate-gdb-282aa4f7d292eb4bc213d028465a3b96f5af2f22
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gdb/ ./ ./bisect/baseline/
cd gdb
# Reproduce first_bad build
git checkout --detach 282aa4f7d292eb4bc213d028465a3b96f5af2f22
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach ee8b88452c1cb1be97199942aee7a76bbca210ee
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/artifac…
Build log: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/1/console…
Full commit (up to 1000 lines):
<cut>
commit 282aa4f7d292eb4bc213d028465a3b96f5af2f22
Author: Tom Tromey <tom(a)tromey.com>
Date: Sat Aug 28 13:16:50 2021 -0600
Add some parallel_for_each tests
Tom de Vries noticed that a patch in the DWARF scanner rewrite series
caused a regression in parallel_for_each -- it started crashing in the
case where the number of threads is 0 (there was an unchecked use of
"n-1" that was used to size an array).
He also pointed out that there were no tests of parallel_for_each.
This adds a few tests of parallel_for_each, primarily testing that
different settings for the number of threads will work. This test
catches the bug that he found in that series.
---
gdb/Makefile.in | 1 +
gdb/unittests/parallel-for-selftests.c | 86 ++++++++++++++++++++++++++++++++++
2 files changed, 87 insertions(+)
diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index 73a1bf83c85..320d3326a81 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -456,6 +456,7 @@ SELFTESTS_SRCS = \
unittests/offset-type-selftests.c \
unittests/observable-selftests.c \
unittests/optional-selftests.c \
+ unittests/parallel-for-selftests.c \
unittests/parse-connection-spec-selftests.c \
unittests/ptid-selftests.c \
unittests/main-thread-selftests.c \
diff --git a/gdb/unittests/parallel-for-selftests.c b/gdb/unittests/parallel-for-selftests.c
new file mode 100644
index 00000000000..7f61b709fa7
--- /dev/null
+++ b/gdb/unittests/parallel-for-selftests.c
@@ -0,0 +1,86 @@
+/* Self tests for parallel_for_each
+
+ Copyright (C) 2021 Free Software Foundation, Inc.
+
+ This file is part of GDB.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include "defs.h"
+#include "gdbsupport/selftest.h"
+#include "gdbsupport/parallel-for.h"
+#include "gdbsupport/thread-pool.h"
+
+#if CXX_STD_THREAD
+
+namespace selftests {
+namespace parallel_for {
+
+struct save_restore_n_threads
+{
+ save_restore_n_threads ()
+ : n_threads (gdb::thread_pool::g_thread_pool->thread_count ())
+ {
+ }
+
+ ~save_restore_n_threads ()
+ {
+ gdb::thread_pool::g_thread_pool->set_thread_count (n_threads);
+ }
+
+ int n_threads;
+};
+
+static void
+test (int n_threads)
+{
+ save_restore_n_threads saver;
+ gdb::thread_pool::g_thread_pool->set_thread_count (n_threads);
+
+#define NUMBER 10000
+
+ std::atomic<int> counter = 0;
+ gdb::parallel_for_each (0, NUMBER,
+ [&] (int start, int end)
+ {
+ counter += end - start;
+ });
+
+ SELF_CHECK (counter == NUMBER);
+
+#undef NUMBER
+}
+
+static void
+test_n_threads ()
+{
+ test (0);
+ test (1);
+ test (3);
+}
+
+}
+}
+
+#endif /* CXX_STD_THREAD */
+
+void _initialize_parallel_for_selftests ();
+void
+_initialize_parallel_for_selftests ()
+{
+#ifdef CXX_STD_THREAD
+ selftests::register_test ("parallel_for",
+ selftests::parallel_for::test_n_threads);
+#endif /* CXX_STD_THREAD */
+}
</cut>
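For reference, the "use of deleted function" error quoted above comes from the new test's "std::atomic<int> counter = 0;" initialization: before C++17 that copy-initialization needs std::atomic's deleted copy constructor, while direct-initialization is fine. A minimal stand-alone reduction of the issue (not part of the gdb sources):
#include <atomic>
#include <cstdio>

// Direct-initialization works with every C++11-and-later compiler; the
// copy-initialized form "std::atomic<int> counter = 0;" is only valid
// from C++17 on, where guaranteed copy elision removes the copy.
static std::atomic<int> counter (0);

int main ()
{
  counter += 5;
  printf ("%d\n", counter.load ());
  return counter.load () == 5 ? 0 : 1;
}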
Successfully identified regression in *gcc* in CI configuration tcwg_gnu_cross_build/master-arm. So far, this commit has regressed CI configurations:
- tcwg_gnu_cross_build/master-arm
Culprit:
<cut>
commit caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
Author: Sebastian Huber <sebastian.huber(a)embedded-brains.de>
Date: Tue Aug 17 09:53:43 2021 +0200
Use __builtin_trap() for abort() if inhibit_libc
abort() is used in gcc_assert() and gcc_unreachable() which is used by target
libraries such as libgcov.a. This patch changes the abort() definition under
certain conditions. If inhibit_libc is defined and abort is not already
defined, then abort() is defined to __builtin_trap().
The inhibit_libc define is usually defined if GCC is built for targets running
in embedded systems which may optionally use a C standard library. If
inhibit_libc is defined, then there may be still a full featured abort()
available. abort() is a heavy weight function which depends on signals and
file streams. For statically linked applications, this means that a dependency
on gcc_assert() pulls in the support for signals and file streams. This could
prevent using gcov to test low end targets for example. Using __builtin_trap()
avoids these dependencies if the target implements a "trap" instruction. The
application or operating system could use a trap handler to react to failed GCC
runtime checks which caused a trap.
gcc/
* tsystem.h (abort): Define abort() if inhibit_libc is defined and it
is not already defined.
</cut>
Results regressed to (for first_bad == caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:01:44 cc1: error: no include path in which to search for stdc-predef.h
# 00:02:05 /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/unwind-arm-common.inc:55:24: error: macro passed 1 arguments, but takes just 0
# 00:02:05 make[2]: *** [/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libgcc/static-object.mk:17: unwind-arm.o] Error 1
# 00:02:06 make[1]: *** [Makefile:12484: all-target-libgcc] Error 2
# 00:02:06 make: *** [Makefile:953: all] Error 2
from (for last_good == d7e56b084d0b230ae5ee280f569d679fa0f09f4d)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# build_abe stage2:
5
# build_abe gdb:
6
# build_abe qemu:
7
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact…
Build top page/logs: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
cd investigate-gcc-caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach d7e56b084d0b230ae5ee280f569d679fa0f09f4d
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/artifact…
Build log: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-arm/1/consoleT…
Full commit (up to 1000 lines):
<cut>
commit caf81d3b57501b1f58dcd9b1ef9d7b4bc76f4ab1
Author: Sebastian Huber <sebastian.huber(a)embedded-brains.de>
Date: Tue Aug 17 09:53:43 2021 +0200
Use __builtin_trap() for abort() if inhibit_libc
abort() is used in gcc_assert() and gcc_unreachable() which is used by target
libraries such as libgcov.a. This patch changes the abort() definition under
certain conditions. If inhibit_libc is defined and abort is not already
defined, then abort() is defined to __builtin_trap().
The inhibit_libc define is usually defined if GCC is built for targets running
in embedded systems which may optionally use a C standard library. If
inhibit_libc is defined, then there may be still a full featured abort()
available. abort() is a heavy weight function which depends on signals and
file streams. For statically linked applications, this means that a dependency
on gcc_assert() pulls in the support for signals and file streams. This could
prevent using gcov to test low end targets for example. Using __builtin_trap()
avoids these dependencies if the target implements a "trap" instruction. The
application or operating system could use a trap handler to react to failed GCC
runtime checks which caused a trap.
gcc/
* tsystem.h (abort): Define abort() if inhibit_libc is defined and it
is not already defined.
---
gcc/tsystem.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/tsystem.h b/gcc/tsystem.h
index e1e6a96a4f4..5c72c69ff3e 100644
--- a/gcc/tsystem.h
+++ b/gcc/tsystem.h
@@ -59,7 +59,7 @@ extern int atexit (void (*)(void));
#endif
#ifndef abort
-extern void abort (void) __attribute__ ((__noreturn__));
+#define abort() __builtin_trap ()
#endif
#ifndef strlen
</cut>
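To illustrate the practical effect of the change above (a hypothetical stand-alone sketch, not GCC's actual tsystem.h/system.h machinery): mapping abort() to __builtin_trap() lets a failed runtime check compile down to a single trap instruction instead of pulling the C library's abort(), signal and stdio support into a statically linked image.
<cut>
/* Hypothetical stand-alone sketch.  INHIBIT_LIBC_DEMO stands in for GCC's
   internal inhibit_libc, and check() stands in for a gcc_assert()-style
   runtime check; this is not GCC's real header code. */
#ifdef INHIBIT_LIBC_DEMO
# ifndef abort
#  define abort() __builtin_trap ()   /* expands to a trap instruction */
# endif
#else
# include <stdlib.h>                  /* full C-library abort() */
#endif

static void check (int ok)
{
  if (!ok)
    abort ();   /* with INHIBIT_LIBC_DEMO: trap; otherwise: libc abort() */
}

int main (void)
{
  check (1);
  return 0;
}
</cut>
Compiling this with -DINHIBIT_LIBC_DEMO -Os and inspecting the assembly should show a bare trap/breakpoint instruction on the failed-check path rather than a call into the C library.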
Progress (short week, 3 days):
* UM-2 [QEMU upstream maintainership]
+ QEMU 6.1.0 has now been released
+ Sent out the first arm pullreq for the 6.2 cycle, including
another slice of the MVE patches
+ Tried to work through some of the code review backlog
-- PMM
Successfully identified regression in *gcc* in CI configuration tcwg_gcc_bootstrap/master-arm-bootstrap_debug. So far, this commit has regressed CI configurations:
- tcwg_gcc_bootstrap/master-arm-bootstrap_debug
Culprit:
<cut>
commit 1d244020246cb155e4de62ca3b302b920a1f513f
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Mon Aug 23 12:37:04 2021 +0100
Fold sign of LSHIFT_EXPR to eliminate no-op conversions.
This short patch teaches fold that it is "safe" to change the sign
of a left shift, to reduce the number of type conversions in gimple.
As an example:
unsigned int foo(unsigned int i) {
return (int)i << 8;
}
is currently optimized to:
unsigned int foo (unsigned int i)
{
int i.0_1;
int _2;
unsigned int _4;
<bb 2> [local count: 1073741824]:
i.0_1 = (int) i_3(D);
_2 = i.0_1 << 8;
_4 = (unsigned int) _2;
return _4;
}
with this patch, this now becomes:
unsigned int foo (unsigned int i)
{
unsigned int _2;
<bb 2> [local count: 1073741824]:
_2 = i_1(D) << 8;
return _2;
}
which generates exactly the same assembly language. Aside from the
reduced memory usage, the real benefit is that no-op conversions tend
to interfere with many folding optimizations. For example,
unsigned int bar(unsigned char i) {
return (i ^ (i<<16)) | (i<<8);
}
currently gets (tangled in conversions and) optimized to:
unsigned int bar (unsigned char i)
{
unsigned int _1;
unsigned int _2;
int _3;
int _4;
unsigned int _6;
unsigned int _8;
<bb 2> [local count: 1073741824]:
_1 = (unsigned int) i_5(D);
_2 = _1 * 65537;
_3 = (int) i_5(D);
_4 = _3 << 8;
_8 = (unsigned int) _4;
_6 = _2 | _8;
return _6;
}
but with this patch, bar now optimizes down to:
unsigned int bar(unsigned char i)
{
unsigned int _1;
unsigned int _4;
<bb 2> [local count: 1073741824]:
_1 = (unsigned int) i_3(D);
_4 = _1 * 65793;
return _4;
}
2021-08-23 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* match.pd (shift transformations): Change the sign of an
LSHIFT_EXPR if it reduces the number of explicit conversions.
gcc/testsuite/ChangeLog
* gcc.dg/fold-convlshift-1.c: New test case.
* gcc.dg/fold-convlshift-2.c: New test case.
</cut>
Results regressed to (for first_bad == 1d244020246cb155e4de62ca3b302b920a1f513f)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:06:26 make[3]: [armv8l-unknown-linux-gnueabihf/bits/largefile-config.h] Error 1 (ignored)
# 00:25:39 make[3]: [armv8l-unknown-linux-gnueabihf/bits/largefile-config.h] Error 1 (ignored)
# 00:29:38 /home/tcwg-buildslave/workspace/tcwg_gnu_8/abe/snapshots/gcc.git~master/gcc/bitmap.h:357:13: error: type mismatch in ‘lshift_expr’
# 00:29:38 /home/tcwg-buildslave/workspace/tcwg_gnu_8/abe/snapshots/gcc.git~master/gcc/bitmap.h:357:13: internal compiler error: ‘verify_gimple’ failed
# 00:29:38 make[3]: *** [bitmap.o] Error 1
# 00:34:06 make[2]: *** [all-stage3-gcc] Error 2
# 00:34:06 make[1]: *** [stage3-bubble] Error 2
# 00:34:07 make: *** [all] Error 2
from (for last_good == b320edc0c29c838b0090c3c9be14187d132f73f2)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap_debug:
2
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Build top page/logs: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-1d244020246cb155e4de62ca3b302b920a1f513f
cd investigate-gcc-1d244020246cb155e4de62ca3b302b920a1f513f
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 1d244020246cb155e4de62ca3b302b920a1f513f
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b320edc0c29c838b0090c3c9be14187d132f73f2
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Build log: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap_de…
Full commit (up to 1000 lines):
<cut>
commit 1d244020246cb155e4de62ca3b302b920a1f513f
Author: Roger Sayle <roger(a)nextmovesoftware.com>
Date: Mon Aug 23 12:37:04 2021 +0100
Fold sign of LSHIFT_EXPR to eliminate no-op conversions.
This short patch teaches fold that it is "safe" to change the sign
of a left shift, to reduce the number of type conversions in gimple.
As an example:
unsigned int foo(unsigned int i) {
return (int)i << 8;
}
is currently optimized to:
unsigned int foo (unsigned int i)
{
int i.0_1;
int _2;
unsigned int _4;
<bb 2> [local count: 1073741824]:
i.0_1 = (int) i_3(D);
_2 = i.0_1 << 8;
_4 = (unsigned int) _2;
return _4;
}
with this patch, this now becomes:
unsigned int foo (unsigned int i)
{
unsigned int _2;
<bb 2> [local count: 1073741824]:
_2 = i_1(D) << 8;
return _2;
}
which generates exactly the same assembly language. Aside from the
reduced memory usage, the real benefit is that no-op conversions tend
to interfere with many folding optimizations. For example,
unsigned int bar(unsigned char i) {
return (i ^ (i<<16)) | (i<<8);
}
currently gets (tangled in conversions and) optimized to:
unsigned int bar (unsigned char i)
{
unsigned int _1;
unsigned int _2;
int _3;
int _4;
unsigned int _6;
unsigned int _8;
<bb 2> [local count: 1073741824]:
_1 = (unsigned int) i_5(D);
_2 = _1 * 65537;
_3 = (int) i_5(D);
_4 = _3 << 8;
_8 = (unsigned int) _4;
_6 = _2 | _8;
return _6;
}
but with this patch, bar now optimizes down to:
unsigned int bar(unsigned char i)
{
unsigned int _1;
unsigned int _4;
<bb 2> [local count: 1073741824]:
_1 = (unsigned int) i_3(D);
_4 = _1 * 65793;
return _4;
}
2021-08-23 Roger Sayle <roger(a)nextmovesoftware.com>
gcc/ChangeLog
* match.pd (shift transformations): Change the sign of an
LSHIFT_EXPR if it reduces the number of explicit conversions.
gcc/testsuite/ChangeLog
* gcc.dg/fold-convlshift-1.c: New test case.
* gcc.dg/fold-convlshift-2.c: New test case.
---
gcc/match.pd | 9 +++++++++
gcc/testsuite/gcc.dg/fold-convlshift-1.c | 20 ++++++++++++++++++++
gcc/testsuite/gcc.dg/fold-convlshift-2.c | 20 ++++++++++++++++++++
3 files changed, 49 insertions(+)
diff --git a/gcc/match.pd b/gcc/match.pd
index 0fcfd0ea62c..978a1b0172e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3385,6 +3385,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if (integer_zerop (@2) || integer_all_onesp (@2))
(cmp @0 @2)))))
+/* Both signed and unsigned lshift produce the same result, so use
+ the form that minimizes the number of conversions. */
+(simplify
+ (convert (lshift:s@0 (convert:s@1 @2) INTEGER_CST@3))
+ (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
+ && INTEGRAL_TYPE_P (TREE_TYPE (@2))
+ && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type))
+ (lshift (convert @2) @3)))
+
/* Simplifications of conversions. */
/* Basic strip-useless-type-conversions / strip_nops. */
diff --git a/gcc/testsuite/gcc.dg/fold-convlshift-1.c b/gcc/testsuite/gcc.dg/fold-convlshift-1.c
new file mode 100644
index 00000000000..b6f57f81e72
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-convlshift-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo(unsigned int i)
+{
+ int t1 = i;
+ int t2 = t1 << 8;
+ return t2;
+}
+
+int bar(int i)
+{
+ unsigned int t1 = i;
+ unsigned int t2 = t1 << 8;
+ return t2;
+}
+
+/* { dg-final { scan-tree-dump-not "\\(int\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(unsigned int\\)" "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-convlshift-2.c b/gcc/testsuite/gcc.dg/fold-convlshift-2.c
new file mode 100644
index 00000000000..f21358c4584
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-convlshift-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int foo(unsigned char c)
+{
+ int t1 = c;
+ int t2 = t1 << 8;
+ return t2;
+}
+
+int bar(unsigned char c)
+{
+ unsigned int t1 = c;
+ unsigned int t2 = t1 << 8;
+ return t2;
+}
+
+/* { dg-final { scan-tree-dump-times "\\(int\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\(unsigned int\\)" 1 "optimized" } } */
+
</cut>
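A quick stand-alone check of the property the fold relies on (illustration only, not part of the commit): when the conversions are no-ops, shifting the value as signed or as unsigned yields the same bit pattern, so the sign of the LSHIFT_EXPR can be changed freely.
<cut>
/* Illustration only: for values where the shift stays in range, a signed
   and an unsigned left shift of the same bits produce identical results,
   which is why changing the sign of an LSHIFT_EXPR is safe once the
   surrounding conversions are no-ops. */
#include <assert.h>
#include <stdio.h>

int main (void)
{
  unsigned int i = 0xABCDu;
  unsigned int via_signed   = (unsigned int) ((int) i << 8);  /* as in foo() above */
  unsigned int via_unsigned = i << 8;
  assert (via_signed == via_unsigned);
  printf ("0x%08x == 0x%08x\n", via_signed, via_unsigned);
  return 0;
}
</cut>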
Hi everyone,
We are shifting the ClangBuiltLinux mailing list from
clang-built-linux(a)googlegroups.com to llvm(a)lists.linux.dev. Google
Groups has served us well but moving to lists.linux.dev allows for
easier archival (as we will be on lore.kernel.org automatically) and
allows for people to subscribe to us easier, as they only need an email
address, rather than a Google account.
Please follow these directions to subscribe to the new mailing list:
https://subspace.kernel.org/index.html#subscribing
Some more information about lists.linux.dev:
https://www.kernel.org/lists-linux-dev.html
https://subspace.kernel.org/lists.linux.dev.html
I have added CI maintainers/mailing lists that send us regular reports
to this announcement. Please continue to send us emails about build
results, just switch the email from clang-built-linux(a)googlegroups.com
to llvm(a)lists.linux.dev so that they get archived as a part of lore and
can be easily searched, especially with the upcoming
https://x-lore.kernel.org/all/.
I will send a patch shortly to update MAINTAINERS.
Cheers,
Nathan
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO
Culprit:
<cut>
commit 880822255e21179e9706ebaf77fff9111d9d3844
Author: Tobias Gysi <gysit(a)google.com>
Date: Wed Mar 24 14:22:17 2021 +0000
[mlir][linalg] Do not call region builder during vectorization.
All linalg operations having a region builder shall call it during op creation. Calling it during vectorization is obsolete.
Differential Revision: https://reviews.llvm.org/D99168
</cut>
Results regressed to (for first_bad == 880822255e21179e9706ebaf77fff9111d9d3844)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2_LTO_marm artifacts/build-880822255e21179e9706ebaf77fff9111d9d3844/results_id:
1
# 456.hmmer,hmmer_base.default regressed by 104
from (for last_good == 92417ebbd10382436136ed5e755be567304ac139)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2_LTO_marm artifacts/build-92417ebbd10382436136ed5e755be567304ac139/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/4267
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/4268
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-880822255e21179e9706ebaf77fff9111d9d3844
cd investigate-llvm-880822255e21179e9706ebaf77fff9111d9d3844
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 880822255e21179e9706ebaf77fff9111d9d3844
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 92417ebbd10382436136ed5e755be567304ac139
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 880822255e21179e9706ebaf77fff9111d9d3844
Author: Tobias Gysi <gysit(a)google.com>
Date: Wed Mar 24 14:22:17 2021 +0000
[mlir][linalg] Do not call region builder during vectorization.
All linalg operations having a region builder shall call it during op creation. Calling it during vectorization is obsolete.
Differential Revision: https://reviews.llvm.org/D99168
---
.../Dialect/Linalg/Transforms/Vectorization.cpp | 31 ++++++----------------
1 file changed, 8 insertions(+), 23 deletions(-)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index d4581013ae69..10562d68a9e0 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -288,7 +288,7 @@ static AffineMap getTransferReadMap(LinalgOp linalgOp, unsigned argIndex) {
/// Generic vectorization function that rewrites the body of a `linalgOp` into
/// vector form. Generic vectorization proceeds as follows:
-/// 1. The region for the linalg op is created if necessary.
+/// 1. Verify the `linalgOp` has one non-empty region.
/// 2. Values defined above the region are mapped to themselves and will be
/// broadcasted on a per-need basis by their consumers.
/// 3. Each region argument is vectorized into a vector.transfer_read (or 0-d
@@ -299,36 +299,21 @@ static AffineMap getTransferReadMap(LinalgOp linalgOp, unsigned argIndex) {
LogicalResult vectorizeAsLinalgGeneric(
OpBuilder &builder, LinalgOp linalgOp, SmallVectorImpl<Value> &newResults,
ArrayRef<CustomVectorizationHook> customVectorizationHooks = {}) {
- // 1. Certain Linalg ops do not have a region but only a region builder.
- // If so, build the region so we can vectorize.
- std::unique_ptr<Region> owningRegion;
- Region *region;
- if (linalgOp->getNumRegions() > 0) {
- region = &linalgOp->getRegion(0);
- } else {
- // RAII avoid remaining in block.
- OpBuilder::InsertionGuard g(builder);
- owningRegion = std::make_unique<Region>();
- region = owningRegion.get();
- Block *block = builder.createBlock(region);
- auto elementTypes = llvm::to_vector<4>(
- llvm::map_range(linalgOp.getShapedOperandTypes(),
- [](ShapedType t) { return t.getElementType(); }));
- block->addArguments(elementTypes);
- linalgOp.getRegionBuilder()(*block, /*captures=*/{});
- }
- Block *block = &region->front();
+ // 1. Fail to vectorize if the operation does not have one non-empty region.
+ if (linalgOp->getNumRegions() != 1 || linalgOp->getRegion(0).empty())
+ return failure();
+ auto &block = linalgOp->getRegion(0).front();
BlockAndValueMapping bvm;
// 2. Values defined above the region can only be broadcast for now. Make them
// map to themselves.
llvm::SetVector<Value> valuesSet;
- mlir::getUsedValuesDefinedAbove(*region, valuesSet);
+ mlir::getUsedValuesDefinedAbove(linalgOp->getRegion(0), valuesSet);
bvm.map(valuesSet.getArrayRef(), valuesSet.getArrayRef());
// 3. Turn all BBArgs into vector.transfer_read / load.
SmallVector<AffineMap> indexings;
- for (auto bbarg : block->getArguments()) {
+ for (auto bbarg : block.getArguments()) {
Value vectorArg = linalgOp.getShapedOperand(bbarg.getArgNumber());
AffineMap map;
VectorType vectorType = extractVectorTypeFromShapedValue(vectorArg);
@@ -360,7 +345,7 @@ LogicalResult vectorizeAsLinalgGeneric(
hooks.push_back(vectorizeYield);
// 5. Iteratively call `vectorizeOneOp` to each op in the slice.
- for (Operation &op : block->getOperations()) {
+ for (Operation &op : block.getOperations()) {
VectorizationResult result = vectorizeOneOp(builder, &op, bvm, hooks);
if (result.status == VectorizationStatus::Failure) {
LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: failed to vectorize: " << op);
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
Culprit:
<cut>
commit 643ce61fb3c2c730b7ecead4a489eaeef3f053ea
Author: Akira Hatanaka <ahatanaka(a)apple.com>
Date: Wed Aug 11 12:55:28 2021 -0700
[ObjC][ARC] Don't form a StoreStrong call if it is unsafe to move the
release call
findSafeStoreForStoreStrongContraction checks whether it's safe to move
the release call to the store by inspecting all instructions between the
two, but was ignoring retain instructions. This was causing objects to
be released and deallocated before they were retained.
rdar://81668577
</cut>
Results regressed to (for first_bad == 643ce61fb3c2c730b7ecead4a489eaeef3f053ea)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2_LTO artifacts/build-643ce61fb3c2c730b7ecead4a489eaeef3f053ea/results_id:
1
# 433.milc,milc_base.default regressed by 103
from (for last_good == 767496d19cb9a1fbba57ff08095faa161998ee36)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2_LTO artifacts/build-767496d19cb9a1fbba57ff08095faa161998ee36/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/4172
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/4171
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-643ce61fb3c2c730b7ecead4a489eaeef3f053ea
cd investigate-llvm-643ce61fb3c2c730b7ecead4a489eaeef3f053ea
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 643ce61fb3c2c730b7ecead4a489eaeef3f053ea
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 767496d19cb9a1fbba57ff08095faa161998ee36
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 643ce61fb3c2c730b7ecead4a489eaeef3f053ea
Author: Akira Hatanaka <ahatanaka(a)apple.com>
Date: Wed Aug 11 12:55:28 2021 -0700
[ObjC][ARC] Don't form a StoreStrong call if it is unsafe to move the
release call
findSafeStoreForStoreStrongContraction checks whether it's safe to move
the release call to the store by inspecting all instructions between the
two, but was ignoring retain instructions. This was causing objects to
be released and deallocated before they were retained.
rdar://81668577
---
llvm/lib/Transforms/ObjCARC/ObjCARCContract.cpp | 21 ++++++++++++---------
.../test/Transforms/ObjCARC/contract-storestrong.ll | 19 +++++++++++++++++++
2 files changed, 31 insertions(+), 9 deletions(-)
diff --git a/llvm/lib/Transforms/ObjCARC/ObjCARCContract.cpp b/llvm/lib/Transforms/ObjCARC/ObjCARCContract.cpp
index 62161b5b6b40..577973c80601 100644
--- a/llvm/lib/Transforms/ObjCARC/ObjCARCContract.cpp
+++ b/llvm/lib/Transforms/ObjCARC/ObjCARCContract.cpp
@@ -226,13 +226,6 @@ static StoreInst *findSafeStoreForStoreStrongContraction(LoadInst *Load,
// of Inst.
ARCInstKind Class = GetBasicARCInstKind(Inst);
- // If Inst is an unrelated retain, we don't care about it.
- //
- // TODO: This is one area where the optimization could be made more
- // aggressive.
- if (IsRetain(Class))
- continue;
-
// If we have seen the store, but not the release...
if (Store) {
// We need to make sure that it is safe to move the release from its
@@ -248,8 +241,18 @@ static StoreInst *findSafeStoreForStoreStrongContraction(LoadInst *Load,
return nullptr;
}
- // Ok, now we know we have not seen a store yet. See if Inst can write to
- // our load location, if it can not, just ignore the instruction.
+ // Ok, now we know we have not seen a store yet.
+
+ // If Inst is a retain, we don't care about it as it doesn't prevent moving
+ // the load to the store.
+ //
+ // TODO: This is one area where the optimization could be made more
+ // aggressive.
+ if (IsRetain(Class))
+ continue;
+
+ // See if Inst can write to our load location, if it can not, just ignore
+ // the instruction.
if (!isModSet(AA->getModRefInfo(Inst, Loc)))
continue;
diff --git a/llvm/test/Transforms/ObjCARC/contract-storestrong.ll b/llvm/test/Transforms/ObjCARC/contract-storestrong.ll
index eff0a6fdf900..9c45e3334d83 100644
--- a/llvm/test/Transforms/ObjCARC/contract-storestrong.ll
+++ b/llvm/test/Transforms/ObjCARC/contract-storestrong.ll
@@ -256,6 +256,25 @@ define i8* @test13(i8* %a0, i8* %a1, i8** %addr, i8* %new) {
ret i8* %retained
}
+; Cannot form a storeStrong call because it's unsafe to move the release call to
+; the store.
+
+; CHECK-LABEL: define void @test14(
+; CHECK: %[[V0:.*]] = load i8*, i8** %a
+; CHECK: %[[V1:.*]] = call i8* @llvm.objc.retain(i8* %p)
+; CHECK: store i8* %[[V1]], i8** %a
+; CHECK: %[[V2:.*]] = call i8* @llvm.objc.retain(i8* %[[V0]])
+; CHECK: call void @llvm.objc.release(i8* %[[V2]])
+
+define void @test14(i8** %a, i8* %p) {
+ %v0 = load i8*, i8** %a, align 8
+ %v1 = call i8* @llvm.objc.retain(i8* %p)
+ store i8* %p, i8** %a, align 8
+ %v2 = call i8* @llvm.objc.retain(i8* %v0)
+ call void @llvm.objc.release(i8* %v0)
+ ret void
+}
+
!0 = !{}
; CHECK: attributes [[NUW]] = { nounwind }
</cut>
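To see why the release cannot be hoisted up to the store past a retain of the old value, here is a hypothetical plain-C refcount analogy (not ObjC ARC itself): in the original order the old object's count never reaches zero before it is retained again, whereas a storeStrong-style contraction would run the release at the store and deallocate it too early.
<cut>
/* Hypothetical refcount analogy (not ObjC ARC).  Mirrors the shape of
   test14 above: load old value, retain/store new value, retain old value,
   release old value. */
#include <assert.h>
#include <stdio.h>

struct obj { int rc; };

static void retain (struct obj *o) { o->rc++; }

static void release (struct obj *o)
{
  if (--o->rc == 0)
    printf ("object %p deallocated\n", (void *) o);
}

int main (void)
{
  struct obj old_val = { 1 };   /* value currently stored, one reference */
  struct obj new_val = { 1 };
  struct obj *slot = &old_val;

  struct obj *v0 = slot;        /* load old value   */
  retain (&new_val);            /* retain new value */
  slot = &new_val;              /* store new value  */
  retain (v0);                  /* retain old value */
  release (v0);                 /* release old value */
  (void) slot;

  /* Original order: rc of old_val goes 1 -> 2 -> 1, so it stays alive. */
  assert (old_val.rc == 1);

  /* A storeStrong contraction would release the old value at the store,
     i.e. before the retain: rc would go 1 -> 0 ("deallocated") and the
     later retain would touch a dead object.  That is the bug fixed here. */
  return 0;
}
</cut>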
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO
Culprit:
<cut>
commit 176379e0c8f9dbde2b357fb3b6a6802b83282e71
Author: Alex Zinenko <zinenko(a)google.com>
Date: Fri Feb 12 12:53:27 2021 +0100
[mlir] Use the interface-based translation for LLVM "intrinsic" dialects
Port the translation of five dialects that define LLVM IR intrinsics
(LLVMAVX512, LLVMArmNeon, LLVMArmSVE, NVVM, ROCDL) to the new dialect
interface-based mechanism. This allows us to remove individual translations
that were created for each of these dialects and just use one common
MLIR-to-LLVM-IR translation that potentially supports all dialects instead,
based on what is registered and including any combination of translatable
dialects. This removal was one of the main goals of the refactoring.
To support the addition of GPU-related metadata, the translation interface is
extended with the `amendOperation` function that allows the interface
implementation to post-process any translated operation with dialect attributes
from the dialect for which the interface is implemented regardless of the
operation's dialect. This is currently applied to "kernel" functions, but can
be used to construct other metadata in dialect-specific ways without
necessarily affecting operations.
Depends On D96591, D96504
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96592
</cut>
Results regressed to (for first_bad == 176379e0c8f9dbde2b357fb3b6a6802b83282e71)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2_LTO_marm artifacts/build-176379e0c8f9dbde2b357fb3b6a6802b83282e71/results_id:
1
# 482.sphinx3,sphinx_livepretend_base.default regressed by 109
from (for last_good == 2d728bbff5c688284b8b8306ecfd3000b0ab8bb1)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2_LTO_marm artifacts/build-2d728bbff5c688284b8b8306ecfd3000b0ab8bb1/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/4128
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/4125
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-176379e0c8f9dbde2b357fb3b6a6802b83282e71
cd investigate-llvm-176379e0c8f9dbde2b357fb3b6a6802b83282e71
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 176379e0c8f9dbde2b357fb3b6a6802b83282e71
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 2d728bbff5c688284b8b8306ecfd3000b0ab8bb1
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 176379e0c8f9dbde2b357fb3b6a6802b83282e71
Author: Alex Zinenko <zinenko(a)google.com>
Date: Fri Feb 12 12:53:27 2021 +0100
[mlir] Use the interface-based translation for LLVM "intrinsic" dialects
Port the translation of five dialects that define LLVM IR intrinsics
(LLVMAVX512, LLVMArmNeon, LLVMArmSVE, NVVM, ROCDL) to the new dialect
interface-based mechanism. This allows us to remove individual translations
that were created for each of these dialects and just use one common
MLIR-to-LLVM-IR translation that potentially supports all dialects instead,
based on what is registered and including any combination of translatable
dialects. This removal was one of the main goals of the refactoring.
To support the addition of GPU-related metadata, the translation interface is
extended with the `amendOperation` function that allows the interface
implementation to post-process any translated operation with dialect attributes
from the dialect for which the interface is implemented regardless of the
operation's dialect. This is currently applied to "kernel" functions, but can
be used to construct other metadata in dialect-specific ways without
necessarily affecting operations.
Depends On D96591, D96504
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96592
---
.../test/Standalone/standalone-translate.mlir | 3 -
mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td | 13 ++-
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td | 3 +-
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | 2 +-
mlir/include/mlir/InitAllTranslations.h | 10 --
mlir/include/mlir/Target/LLVMIR.h | 4 +-
.../LLVMAVX512/LLVMAVX512ToLLVMIRTranslation.h | 37 ++++++
.../LLVMArmNeon/LLVMArmNeonToLLVMIRTranslation.h | 37 ++++++
.../LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.h | 37 ++++++
.../Dialect/LLVMIR/LLVMToLLVMIRTranslation.h | 4 +-
.../LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h | 42 +++++++
.../Dialect/OpenMP/OpenMPToLLVMIRTranslation.h | 4 +-
.../Dialect/ROCDL/ROCDLToLLVMIRTranslation.h | 42 +++++++
.../mlir/Target/LLVMIR/LLVMTranslationInterface.h | 28 ++++-
.../include/mlir/Target/LLVMIR/ModuleTranslation.h | 32 ++++--
mlir/include/mlir/Target/NVVMIR.h | 39 -------
mlir/include/mlir/Target/ROCDLIR.h | 40 -------
mlir/lib/Target/CMakeLists.txt | 107 +----------------
mlir/lib/Target/LLVMIR/ConvertToLLVMIR.cpp | 35 +++++-
mlir/lib/Target/LLVMIR/ConvertToNVVMIR.cpp | 124 --------------------
mlir/lib/Target/LLVMIR/ConvertToROCDLIR.cpp | 127 ---------------------
mlir/lib/Target/LLVMIR/Dialect/CMakeLists.txt | 5 +
.../LLVMIR/Dialect/LLVMAVX512/CMakeLists.txt | 16 +++
.../LLVMAVX512/LLVMAVX512ToLLVMIRTranslation.cpp | 33 ++++++
.../LLVMIR/Dialect/LLVMArmNeon/CMakeLists.txt | 16 +++
.../LLVMArmNeon/LLVMArmNeonToLLVMIRTranslation.cpp | 33 ++++++
.../LLVMIR/Dialect/LLVMArmSVE/CMakeLists.txt | 16 +++
.../LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.cpp | 33 ++++++
.../Dialect/LLVMIR/LLVMToLLVMIRTranslation.cpp | 71 ++++--------
mlir/lib/Target/LLVMIR/Dialect/NVVM/CMakeLists.txt | 16 +++
.../Dialect/NVVM/NVVMToLLVMIRTranslation.cpp | 67 +++++++++++
.../lib/Target/LLVMIR/Dialect/ROCDL/CMakeLists.txt | 16 +++
.../Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp | 69 +++++++++++
mlir/lib/Target/LLVMIR/LLVMAVX512Intr.cpp | 65 -----------
mlir/lib/Target/LLVMIR/LLVMArmNeonIntr.cpp | 65 -----------
mlir/lib/Target/LLVMIR/LLVMArmSVEIntr.cpp | 65 -----------
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | 65 +++++++----
mlir/test/Target/arm-neon.mlir | 2 +-
mlir/test/Target/arm-sve.mlir | 2 +-
mlir/test/Target/avx512.mlir | 2 +-
mlir/test/Target/nvvmir.mlir | 2 +-
mlir/test/Target/rocdl.mlir | 2 +-
mlir/test/lib/Transforms/CMakeLists.txt | 13 ++-
.../lib/Transforms/TestConvertGPUKernelToCubin.cpp | 25 +++-
.../lib/Transforms/TestConvertGPUKernelToHsaco.cpp | 22 +++-
mlir/tools/mlir-cuda-runner/mlir-cuda-runner.cpp | 6 +-
mlir/tools/mlir-rocm-runner/mlir-rocm-runner.cpp | 3 +
mlir/tools/mlir-tblgen/LLVMIRConversionGen.cpp | 10 +-
48 files changed, 750 insertions(+), 760 deletions(-)
diff --git a/mlir/examples/standalone/test/Standalone/standalone-translate.mlir b/mlir/examples/standalone/test/Standalone/standalone-translate.mlir
index 2a096c38e128..16d49785ee16 100644
--- a/mlir/examples/standalone/test/Standalone/standalone-translate.mlir
+++ b/mlir/examples/standalone/test/Standalone/standalone-translate.mlir
@@ -1,8 +1,5 @@
// RUN: standalone-translate --help | FileCheck %s
-// CHECK: --avx512-mlir-to-llvmir
// CHECK: --deserialize-spirv
// CHECK: --import-llvm
// CHECK: --mlir-to-llvmir
-// CHECK: --mlir-to-nvvmir
-// CHECK: --mlir-to-rocdlir
// CHECK: --serialize-spirv
diff --git a/mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td b/mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td
index ad886e55b4e6..541f7ebfadfa 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td
@@ -219,12 +219,13 @@ class ListIntSubst<string pattern, list<int> values> {
// or result in the operation.
def LLVM_IntrPatterns {
string operand =
- [{convertType(opInst.getOperand($0).getType())}];
+ [{moduleTranslation.convertType(opInst.getOperand($0).getType())}];
string result =
- [{convertType(opInst.getResult($0).getType())}];
+ [{moduleTranslation.convertType(opInst.getResult($0).getType())}];
string structResult =
- [{convertType(opInst.getResult(0).getType().cast<LLVM::LLVMStructType>()
- .getBody()[$0])}];
+ [{moduleTranslation.convertType(
+ opInst.getResult(0).getType().cast<LLVM::LLVMStructType>()
+ .getBody()[$0])}];
}
@@ -259,7 +260,7 @@ class LLVM_IntrOpBase<Dialect dialect, string opName, string enumName,
ListIntSubst<LLVM_IntrPatterns.operand,
overloadedOperands>.lst), ", ") # [{
});
- auto operands = lookupValues(opInst.getOperands());
+ auto operands = moduleTranslation.lookupValues(opInst.getOperands());
}] # !if(!gt(numResults, 0), "$res = ", "")
# [{builder.CreateCall(fn, operands);
}];
@@ -325,7 +326,7 @@ class LLVM_VectorReductionAcc<string mnem>
{ }] # !interleave(ListIntSubst<LLVM_IntrPatterns.operand, [1]>.lst,
", ") # [{
});
- auto operands = lookupValues(opInst.getOperands());
+ auto operands = moduleTranslation.lookupValues(opInst.getOperands());
llvm::FastMathFlags origFM = builder.getFastMathFlags();
llvm::FastMathFlags tempFM = origFM;
tempFM.setAllowReassoc($reassoc);
diff --git a/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
index 7a2152b9a481..7dca08a43f7a 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
@@ -1083,7 +1083,8 @@ def LLVM_UndefOp : LLVM_Op<"mlir.undef", [NoSideEffect]>,
def LLVM_ConstantOp
: LLVM_Op<"mlir.constant", [NoSideEffect]>,
- LLVM_Builder<"$res = getLLVMConstant($_resultType, $value, $_location);">
+ LLVM_Builder<[{$res = getLLVMConstant($_resultType, $value, $_location,
+ moduleTranslation);}]>
{
let summary = "Defines a constant of LLVM type.";
let description = [{
diff --git a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
index d5eec3cebb4a..cfb08ff465a2 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
@@ -175,7 +175,7 @@ def ROCDL_MubufStoreOp :
LLVM_Type:$glc,
LLVM_Type:$slc)>{
string llvmBuilder = [{
- auto vdataType = convertType(op.vdata().getType());
+ auto vdataType = moduleTranslation.convertType(op.vdata().getType());
createIntrinsicCall(builder,
llvm::Intrinsic::amdgcn_buffer_store, {$vdata, $rsrc, $vindex,
$offset, $glc, $slc}, {vdataType});
diff --git a/mlir/include/mlir/InitAllTranslations.h b/mlir/include/mlir/InitAllTranslations.h
index 16dd113d14cd..fc319c09a8c8 100644
--- a/mlir/include/mlir/InitAllTranslations.h
+++ b/mlir/include/mlir/InitAllTranslations.h
@@ -20,11 +20,6 @@ void registerFromLLVMIRTranslation();
void registerFromSPIRVTranslation();
void registerToLLVMIRTranslation();
void registerToSPIRVTranslation();
-void registerToNVVMIRTranslation();
-void registerToROCDLIRTranslation();
-void registerArmNeonToLLVMIRTranslation();
-void registerAVX512ToLLVMIRTranslation();
-void registerArmSVEToLLVMIRTranslation();
// This function should be called before creating any MLIRContext if one
// expects all the possible translations to be made available to the context
@@ -35,11 +30,6 @@ inline void registerAllTranslations() {
registerFromSPIRVTranslation();
registerToLLVMIRTranslation();
registerToSPIRVTranslation();
- registerToNVVMIRTranslation();
- registerToROCDLIRTranslation();
- registerArmNeonToLLVMIRTranslation();
- registerAVX512ToLLVMIRTranslation();
- registerArmSVEToLLVMIRTranslation();
return true;
}();
(void)initOnce;
diff --git a/mlir/include/mlir/Target/LLVMIR.h b/mlir/include/mlir/Target/LLVMIR.h
index 2050c63df73f..10bec79f3506 100644
--- a/mlir/include/mlir/Target/LLVMIR.h
+++ b/mlir/include/mlir/Target/LLVMIR.h
@@ -28,14 +28,14 @@ namespace mlir {
class DialectRegistry;
class OwningModuleRef;
class MLIRContext;
-class ModuleOp;
+class Operation;
/// Convert the given MLIR module into LLVM IR. The LLVM context is extracted
/// from the registered LLVM IR dialect. In case of error, report it
/// to the error handler registered with the MLIR context, if any (obtained from
/// the MLIR module), and return `nullptr`.
std::unique_ptr<llvm::Module>
-translateModuleToLLVMIR(ModuleOp m, llvm::LLVMContext &llvmContext,
+translateModuleToLLVMIR(Operation *op, llvm::LLVMContext &llvmContext,
StringRef name = "LLVMDialectModule");
/// Convert the given LLVM module into MLIR's LLVM dialect. The LLVM context is
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMAVX512/LLVMAVX512ToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMAVX512/LLVMAVX512ToLLVMIRTranslation.h
new file mode 100644
index 000000000000..e591a95f0357
--- /dev/null
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMAVX512/LLVMAVX512ToLLVMIRTranslation.h
@@ -0,0 +1,37 @@
+//===- LLVMAVX512ToLLVMIRTranslation.h - LLVMAVX512 to LLVM IR --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the dialect interface for translating the LLVMAVX512
+// dialect to LLVM IR.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_TARGET_LLVMIR_DIALECT_LLVMAVX512_LLVMAVX512TOLLVMIRTRANSLATION_H
+#define MLIR_TARGET_LLVMIR_DIALECT_LLVMAVX512_LLVMAVX512TOLLVMIRTRANSLATION_H
+
+#include "mlir/Target/LLVMIR/LLVMTranslationInterface.h"
+
+namespace mlir {
+
+/// Implementation of the dialect interface that converts operations belonging
+/// to the LLVMAVX512 dialect to LLVM IR.
+class LLVMAVX512DialectLLVMIRTranslationInterface
+ : public LLVMTranslationDialectInterface {
+public:
+ using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;
+
+ /// Translates the given operation to LLVM IR using the provided IR builder
+ /// and saving the state in `moduleTranslation`.
+ LogicalResult
+ convertOperation(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+};
+
+} // namespace mlir
+
+#endif // MLIR_TARGET_LLVMIR_DIALECT_LLVMAVX512_LLVMAVX512TOLLVMIRTRANSLATION_H
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMArmNeon/LLVMArmNeonToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMArmNeon/LLVMArmNeonToLLVMIRTranslation.h
new file mode 100644
index 000000000000..7d268d155083
--- /dev/null
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMArmNeon/LLVMArmNeonToLLVMIRTranslation.h
@@ -0,0 +1,37 @@
+//===- LLVMArmNeonToLLVMIRTranslation.h - LLVMArmNeon to LLVMIR -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the dialect interface for translating the LLVMArmNeon
+// dialect to LLVM IR.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_TARGET_LLVMIR_DIALECT_LLVMARMNEON_LLVMARMNEONTOLLVMIRTRANSLATION_H
+#define MLIR_TARGET_LLVMIR_DIALECT_LLVMARMNEON_LLVMARMNEONTOLLVMIRTRANSLATION_H
+
+#include "mlir/Target/LLVMIR/LLVMTranslationInterface.h"
+
+namespace mlir {
+
+/// Implementation of the dialect interface that converts operations belonging
+/// to the LLVMArmNeon dialect to LLVM IR.
+class LLVMArmNeonDialectLLVMIRTranslationInterface
+ : public LLVMTranslationDialectInterface {
+public:
+ using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;
+
+ /// Translates the given operation to LLVM IR using the provided IR builder
+ /// and saving the state in `moduleTranslation`.
+ LogicalResult
+ convertOperation(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+};
+
+} // namespace mlir
+
+#endif // MLIR_TARGET_LLVMIR_DIALECT_LLVMARMNEON_LLVMARMNEONTOLLVMIRTRANSLATION_H
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.h
new file mode 100644
index 000000000000..9d4d05b9b9bd
--- /dev/null
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.h
@@ -0,0 +1,37 @@
+//===- LLVMArmSVEToLLVMIRTranslation.h - LLVMArmSVE to LLVM IR --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the dialect interface for translating the LLVMArmSVE
+// dialect to LLVM IR.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_TARGET_LLVMIR_DIALECT_LLVMARMSVE_LLVMARMSVETOLLVMIRTRANSLATION_H
+#define MLIR_TARGET_LLVMIR_DIALECT_LLVMARMSVE_LLVMARMSVETOLLVMIRTRANSLATION_H
+
+#include "mlir/Target/LLVMIR/LLVMTranslationInterface.h"
+
+namespace mlir {
+
+/// Implementation of the dialect interface that converts operations belonging
+/// to the LLVMArmSVE dialect to LLVM IR.
+class LLVMArmSVEDialectLLVMIRTranslationInterface
+ : public LLVMTranslationDialectInterface {
+public:
+ using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;
+
+ /// Translates the given operation to LLVM IR using the provided IR builder
+ /// and saving the state in `moduleTranslation`.
+ LogicalResult
+ convertOperation(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+};
+
+} // namespace mlir
+
+#endif // MLIR_TARGET_LLVMIR_DIALECT_LLVMARMSVE_LLVMARMSVETOLLVMIRTRANSLATION_H
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h
index 8b72cedf1ff2..2af76e092917 100644
--- a/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h
@@ -18,8 +18,8 @@
namespace mlir {
-/// Implementation of the dialect interface that converts operations beloning to
-/// the LLVM dialect to LLVM IR.
+/// Implementation of the dialect interface that converts operations belonging
+/// to the LLVM dialect to LLVM IR.
class LLVMDialectLLVMIRTranslationInterface
: public LLVMTranslationDialectInterface {
public:
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h
new file mode 100644
index 000000000000..3a8a01df84b0
--- /dev/null
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h
@@ -0,0 +1,42 @@
+//===- NVVMToLLVMIRTranslation.h - NVVM to LLVM IR --------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the dialect interface for translating the NVVM
+// dialect to LLVM IR.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_TARGET_LLVMIR_DIALECT_NVVM_NVVMTOLLVMIRTRANSLATION_H
+#define MLIR_TARGET_LLVMIR_DIALECT_NVVM_NVVMTOLLVMIRTRANSLATION_H
+
+#include "mlir/Target/LLVMIR/LLVMTranslationInterface.h"
+
+namespace mlir {
+
+/// Implementation of the dialect interface that converts operations belonging
+/// to the NVVM dialect to LLVM IR.
+class NVVMDialectLLVMIRTranslationInterface
+ : public LLVMTranslationDialectInterface {
+public:
+ using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;
+
+ /// Translates the given operation to LLVM IR using the provided IR builder
+ /// and saving the state in `moduleTranslation`.
+ LogicalResult
+ convertOperation(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+
+ /// Attaches module-level metadata for functions marked as kernels.
+ LogicalResult
+ amendOperation(Operation *op, NamedAttribute attribute,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+};
+
+} // namespace mlir
+
+#endif // MLIR_TARGET_LLVMIR_DIALECT_NVVM_NVVMTOLLVMIRTRANSLATION_H
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h
index 7d9eeea9462e..07721d089689 100644
--- a/mlir/include/mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h
@@ -18,8 +18,8 @@
namespace mlir {
-/// Implementation of the dialect interface that converts operations beloning to
-/// the OpenMP dialect to LLVM IR.
+/// Implementation of the dialect interface that converts operations belonging
+/// to the OpenMP dialect to LLVM IR.
class OpenMPDialectLLVMIRTranslationInterface
: public LLVMTranslationDialectInterface {
public:
diff --git a/mlir/include/mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h b/mlir/include/mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h
new file mode 100644
index 000000000000..e2211a59098f
--- /dev/null
+++ b/mlir/include/mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h
@@ -0,0 +1,42 @@
+//===- ROCDLToLLVMIRTranslation.h - ROCDL to LLVM IR ------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the dialect interface for translating the ROCDL
+// dialect to LLVM IR.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_TARGET_LLVMIR_DIALECT_ROCDL_ROCDLTOLLVMIRTRANSLATION_H
+#define MLIR_TARGET_LLVMIR_DIALECT_ROCDL_ROCDLTOLLVMIRTRANSLATION_H
+
+#include "mlir/Target/LLVMIR/LLVMTranslationInterface.h"
+
+namespace mlir {
+
+/// Implementation of the dialect interface that converts operations belonging
+/// to the ROCDL dialect to LLVM IR.
+class ROCDLDialectLLVMIRTranslationInterface
+ : public LLVMTranslationDialectInterface {
+public:
+ using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;
+
+ /// Translates the given operation to LLVM IR using the provided IR builder
+ /// and saving the state in `moduleTranslation`.
+ LogicalResult
+ convertOperation(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+
+ /// Attaches module-level metadata for functions marked as kernels.
+ LogicalResult
+ amendOperation(Operation *op, NamedAttribute attribute,
+ LLVM::ModuleTranslation &moduleTranslation) const final;
+};
+
+} // namespace mlir
+
+#endif // MLIR_TARGET_LLVMIR_DIALECT_ROCDL_ROCDLTOLLVMIRTRANSLATION_H
diff --git a/mlir/include/mlir/Target/LLVMIR/LLVMTranslationInterface.h b/mlir/include/mlir/Target/LLVMIR/LLVMTranslationInterface.h
index 0063beac2977..0c563e6e7d39 100644
--- a/mlir/include/mlir/Target/LLVMIR/LLVMTranslationInterface.h
+++ b/mlir/include/mlir/Target/LLVMIR/LLVMTranslationInterface.h
@@ -13,12 +13,14 @@
#ifndef MLIR_TARGET_LLVMIR_LLVMTRANSLATIONINTERFACE_H
#define MLIR_TARGET_LLVMIR_LLVMTRANSLATIONINTERFACE_H
+#include "mlir/IR/Attributes.h"
#include "mlir/IR/DialectInterface.h"
+#include "mlir/IR/Identifier.h"
#include "mlir/Support/LogicalResult.h"
namespace llvm {
class IRBuilderBase;
-}
+} // namespace llvm
namespace mlir {
namespace LLVM {
@@ -43,6 +45,18 @@ public:
LLVM::ModuleTranslation &moduleTranslation) const {
return failure();
}
+
+ /// Hook for derived dialect interface to act on an operation that has dialect
+ /// attributes from the derived dialect (the operation itself may be from a
+ /// different dialect). This gets called after the operation has been
+ /// translated. The hook is expected to use moduleTranslation to look up the
+ /// translation results and amend the corresponding IR constructs. Does
+ /// nothing and succeeds by default.
+ virtual LogicalResult
+ amendOperation(Operation *op, NamedAttribute attribute,
+ LLVM::ModuleTranslation &moduleTranslation) const {
+ return success();
+ }
};
/// Interface collection for translation to LLVM IR, dispatches to a concrete
@@ -61,6 +75,18 @@ public:
return iface->convertOperation(op, builder, moduleTranslation);
return failure();
}
+
+ /// Acts on the given operation using the interface implemented by the dialect
+ /// of one of the operation's dialect attributes.
+ virtual LogicalResult
+ amendOperation(Operation *op, NamedAttribute attribute,
+ LLVM::ModuleTranslation &moduleTranslation) const {
+ if (const LLVMTranslationDialectInterface *iface =
+ getInterfaceFor(attribute.first.getDialect())) {
+ return iface->amendOperation(op, attribute, moduleTranslation);
+ }
+ return success();
+ }
};
} // namespace mlir
diff --git a/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h b/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
index 03b7f5336461..004524f33fa4 100644
--- a/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
+++ b/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
@@ -142,18 +142,11 @@ public:
/// Looks up remapped a list of remapped values.
SmallVector<llvm::Value *, 8> lookupValues(ValueRange values);
- /// Create an LLVM IR constant of `llvmType` from the MLIR attribute `attr`.
- /// This currently supports integer, floating point, splat and dense element
- /// attributes and combinations thereof. In case of error, report it to `loc`
- /// and return nullptr.
- llvm::Constant *getLLVMConstant(llvm::Type *llvmType, Attribute attr,
- Location loc);
-
/// Returns the MLIR context of the module being translated.
MLIRContext &getContext() { return *mlirModule->getContext(); }
/// Returns the LLVM context in which the IR is being constructed.
- llvm::LLVMContext &getLLVMContext() { return llvmModule->getContext(); }
+ llvm::LLVMContext &getLLVMContext() const { return llvmModule->getContext(); }
/// Finds an LLVM IR global value that corresponds to the given MLIR operation
/// defining a global value.
@@ -184,6 +177,10 @@ public:
LogicalResult convertBlock(Block &bb, bool ignoreArguments,
llvm::IRBuilder<> &builder);
+ /// Gets the named metadata in the LLVM IR module being constructed, creating
+ /// it if it does not exist.
+ llvm::NamedMDNode *getOrInsertNamedModuleMetadata(StringRef name);
+
protected:
/// Translate the given MLIR module expressed in MLIR LLVM IR dialect into an
/// LLVM IR module. The MLIR LLVM IR dialect holds a pointer to an
@@ -208,6 +205,9 @@ private:
LogicalResult convertGlobals();
LogicalResult convertOneFunction(LLVMFuncOp func);
+ /// Translates dialect attributes attached to the given operation.
+ LogicalResult convertDialectAttributes(Operation *op);
+
/// Original and translated module.
Operation *mlirModule;
std::unique_ptr<llvm::Module> llvmModule;
@@ -228,6 +228,8 @@ private:
/// A stateful object used to translate types.
TypeToLLVMIRTranslator typeTranslator;
+ /// A dialect interface collection used for dispatching the translation to
+ /// specific dialects.
LLVMTranslationInterface iface;
/// Mappings between original and translated values, used for lookups.
void connectPHINodes(Region &region, const ModuleTranslation &state);
/// Get a topologically sorted list of blocks of the given region.
llvm::SetVector<Block *> getTopologicallySortedBlocks(Region &region);
+
+/// Create an LLVM IR constant of `llvmType` from the MLIR attribute `attr`.
+/// This currently supports integer, floating point, splat and dense element
+/// attributes and combinations thereof. In case of error, report it to `loc`
+/// and return nullptr.
+llvm::Constant *getLLVMConstant(llvm::Type *llvmType, Attribute attr,
+ Location loc,
+ const ModuleTranslation &moduleTranslation);
+
+/// Creates a call to an LLVM IR intrinsic function with the given arguments.
+llvm::Value *createIntrinsicCall(llvm::IRBuilderBase &builder,
+ llvm::Intrinsic::ID intrinsic,
+ ArrayRef<llvm::Value *> args = {},
+ ArrayRef<llvm::Type *> tys = {});
} // namespace detail
} // namespace LLVM
diff --git a/mlir/include/mlir/Target/NVVMIR.h b/mlir/include/mlir/Target/NVVMIR.h
deleted file mode 100644
index 0cd7688e275b..000000000000
--- a/mlir/include/mlir/Target/NVVMIR.h
+++ /dev/null
@@ -1,39 +0,0 @@
-//===- NVVMIR.h - MLIR to LLVM + NVVM IR conversion -------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file declares the entry point for the MLIR to LLVM + NVVM IR conversion.
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef MLIR_TARGET_NVVMIR_H
-#define MLIR_TARGET_NVVMIR_H
-
-#include "llvm/ADT/StringRef.h"
-#include <memory>
-
-// Forward-declare LLVM classes.
-namespace llvm {
-class LLVMContext;
-class Module;
-} // namespace llvm
-
-namespace mlir {
-class Operation;
-
-/// Convert the given LLVM-module-like operation into NVVM IR. This conversion
-/// requires the registration of the LLVM IR dialect and will extract the LLVM
-/// context from the registered LLVM IR dialect. In case of error, report it to
-/// the error handler registered with the MLIR context, if any (obtained from
-/// the MLIR module), and return `nullptr`.
-std::unique_ptr<llvm::Module>
-translateModuleToNVVMIR(Operation *m, llvm::LLVMContext &llvmContext,
- llvm::StringRef name = "LLVMDialectModule");
-
-} // namespace mlir
-
-#endif // MLIR_TARGET_NVVMIR_H
diff --git a/mlir/include/mlir/Target/ROCDLIR.h b/mlir/include/mlir/Target/ROCDLIR.h
deleted file mode 100644
index e2cb812a173d..000000000000
--- a/mlir/include/mlir/Target/ROCDLIR.h
+++ /dev/null
@@ -1,40 +0,0 @@
-//===- ROCDLIR.h - MLIR to LLVM + ROCDL IR conversion -----------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file declares the entry point for the MLIR to LLVM + ROCDL IR
-// conversion.
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef MLIR_TARGET_ROCDLIR_H
-#define MLIR_TARGET_ROCDLIR_H
-
-#include "llvm/ADT/StringRef.h"
-#include <memory>
-
-// Forward-declare LLVM classes.
-namespace llvm {
-class LLVMContext;
-class Module;
-} // namespace llvm
-
-namespace mlir {
-class Operation;
-
-/// Convert the given LLVM-module-like operation into ROCDL IR. This conversion
-/// requires the registration of the LLVM IR dialect and will extract the LLVM
-/// context from the registered LLVM IR dialect. In case of error, report it to
-/// the error handler registered with the MLIR context, if any (obtained from
-/// the MLIR module), and return `nullptr`.
-std::unique_ptr<llvm::Module>
-translateModuleToROCDLIR(Operation *m, llvm::LLVMContext &llvmContext,
- llvm::StringRef name = "LLVMDialectModule");
-
-} // namespace mlir
-
-#endif // MLIR_TARGET_ROCDLIR_H
diff --git a/mlir/lib/Target/CMakeLists.txt b/mlir/lib/Target/CMakeLists.txt
index e951ffade6aa..a23222d37ede 100644
--- a/mlir/lib/Target/CMakeLists.txt
+++ b/mlir/lib/Target/CMakeLists.txt
@@ -24,26 +24,6 @@ add_mlir_translation_library(MLIRTargetLLVMIRModuleTranslation
MLIRTranslation
)
-add_mlir_translation_library(MLIRTargetAVX512
- LLVMIR/LLVMAVX512Intr.cpp
-
- ADDITIONAL_HEADER_DIRS
- ${MLIR_MAIN_INCLUDE_DIR}/mlir/Target/LLVMIR
-
- DEPENDS
- MLIRLLVMAVX512ConversionsIncGen
-
- LINK_COMPONENTS
- Core
-
- LINK_LIBS PUBLIC
- MLIRIR
- MLIRLLVMAVX512
- MLIRLLVMIR
- MLIRTargetLLVMIR
- MLIRTargetLLVMIRModuleTranslation
- )
-
add_mlir_translation_library(MLIRTargetLLVMIR
LLVMIR/ConvertFromLLVMIR.cpp
LLVMIR/ConvertToLLVMIR.cpp
@@ -56,89 +36,12 @@ add_mlir_translation_library(MLIRTargetLLVMIR
IRReader
LINK_LIBS PUBLIC
+ MLIRLLVMArmNeonToLLVMIRTranslation
+ MLIRLLVMArmSVEToLLVMIRTranslation
+ MLIRLLVMAVX512ToLLVMIRTranslation
MLIRLLVMToLLVMIRTranslation
+ MLIRNVVMToLLVMIRTranslation
MLIROpenMPToLLVMIRTranslation
- MLIRTargetLLVMIRModuleTranslation
- )
-
-add_mlir_translation_library(MLIRTargetArmNeon
- LLVMIR/LLVMArmNeonIntr.cpp
-
- ADDITIONAL_HEADER_DIRS
- ${MLIR_MAIN_INCLUDE_DIR}/mlir/Target/LLVMIR
-
- DEPENDS
- MLIRLLVMArmNeonConversionsIncGen
-
- LINK_COMPONENTS
- Core
-
- LINK_LIBS PUBLIC
- MLIRIR
- MLIRLLVMArmNeon
- MLIRLLVMIR
- MLIRTargetLLVMIR
- MLIRTargetLLVMIRModuleTranslation
- )
-
-add_mlir_translation_library(MLIRTargetArmSVE
- LLVMIR/LLVMArmSVEIntr.cpp
-
- ADDITIONAL_HEADER_DIRS
- ${MLIR_MAIN_INCLUDE_DIR}/mlir/Target/LLVMIR
-
- DEPENDS
- MLIRLLVMArmSVEConversionsIncGen
-
- LINK_COMPONENTS
- Core
-
- LINK_LIBS PUBLIC
- MLIRIR
- MLIRLLVMArmSVE
- MLIRLLVMIR
- MLIRTargetLLVMIR
- MLIRTargetLLVMIRModuleTranslation
- )
-
-add_mlir_translation_library(MLIRTargetNVVMIR
- LLVMIR/ConvertToNVVMIR.cpp
-
- ADDITIONAL_HEADER_DIRS
- ${MLIR_MAIN_INCLUDE_DIR}/mlir/Target/LLVMIR
-
- DEPENDS
- intrinsics_gen
-
- LINK_COMPONENTS
- Core
-
- LINK_LIBS PUBLIC
- MLIRGPU
- MLIRIR
- MLIRLLVMIR
- MLIRNVVMIR
- MLIRTargetLLVMIR
- MLIRTargetLLVMIRModuleTranslation
- )
-
-add_mlir_translation_library(MLIRTargetROCDLIR
- LLVMIR/ConvertToROCDLIR.cpp
-
- ADDITIONAL_HEADER_DIRS
- ${MLIR_MAIN_INCLUDE_DIR}/mlir/Target/LLVMIR
-
- DEPENDS
- intrinsics_gen
-
- LINK_COMPONENTS
- Core
-
- LINK_LIBS PUBLIC
- MLIRGPU
- MLIRIR
- MLIRLLVMIR
- MLIRROCDLIR
- MLIRTargetLLVMIR
+ MLIRROCDLToLLVMIRTranslation
MLIRTargetLLVMIRModuleTranslation
)
diff --git a/mlir/lib/Target/LLVMIR/ConvertToLLVMIR.cpp b/mlir/lib/Target/LLVMIR/ConvertToLLVMIR.cpp
index 6b30748cc79b..42391513bacf 100644
--- a/mlir/lib/Target/LLVMIR/ConvertToLLVMIR.cpp
+++ b/mlir/lib/Target/LLVMIR/ConvertToLLVMIR.cpp
@@ -12,9 +12,19 @@
#include "mlir/Target/LLVMIR.h"
+#include "mlir/Dialect/LLVMIR/LLVMAVX512Dialect.h"
+#include "mlir/Dialect/LLVMIR/LLVMArmNeonDialect.h"
+#include "mlir/Dialect/LLVMIR/LLVMArmSVEDialect.h"
+#include "mlir/Dialect/LLVMIR/NVVMDialect.h"
+#include "mlir/Dialect/LLVMIR/ROCDLDialect.h"
#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
+#include "mlir/Target/LLVMIR/Dialect/LLVMAVX512/LLVMAVX512ToLLVMIRTranslation.h"
+#include "mlir/Target/LLVMIR/Dialect/LLVMArmNeon/LLVMArmNeonToLLVMIRTranslation.h"
+#include "mlir/Target/LLVMIR/Dialect/LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.h"
#include "mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h"
+#include "mlir/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h"
#include "mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h"
+#include "mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h"
#include "mlir/Target/LLVMIR/ModuleTranslation.h"
#include "mlir/Translation.h"
@@ -26,14 +36,14 @@
using namespace mlir;
std::unique_ptr<llvm::Module>
-mlir::translateModuleToLLVMIR(ModuleOp m, llvm::LLVMContext &llvmContext,
+mlir::translateModuleToLLVMIR(Operation *op, llvm::LLVMContext &llvmContext,
StringRef name) {
auto llvmModule =
- LLVM::ModuleTranslation::translateModule<>(m, llvmContext, name);
+ LLVM::ModuleTranslation::translateModule<>(op, llvmContext, name);
if (!llvmModule)
- emitError(m.getLoc(), "Fail to convert MLIR to LLVM IR");
+ emitError(op->getLoc(), "Fail to convert MLIR to LLVM IR");
else if (verifyModule(*llvmModule))
- emitError(m.getLoc(), "LLVM IR fails to verify");
+ emitError(op->getLoc(), "LLVM IR fails to verify");
return llvmModule;
}
@@ -70,9 +80,24 @@ void registerToLLVMIRTranslation() {
return success();
},
[](DialectRegistry ®istry) {
- registry.insert<omp::OpenMPDialect>();
+ registry.insert<omp::OpenMPDialect, LLVM::LLVMAVX512Dialect,
+ LLVM::LLVMArmSVEDialect, LLVM::LLVMArmNeonDialect,
+ NVVM::NVVMDialect, ROCDL::ROCDLDialect>();
registry.addDialectInterface<omp::OpenMPDialect,
OpenMPDialectLLVMIRTranslationInterface>();
+ registry
+ .addDialectInterface<LLVM::LLVMAVX512Dialect,
+ LLVMAVX512DialectLLVMIRTranslationInterface>();
+ registry.addDialectInterface<
+ LLVM::LLVMArmNeonDialect,
+ LLVMArmNeonDialectLLVMIRTranslationInterface>();
+ registry
+ .addDialectInterface<LLVM::LLVMArmSVEDialect,
+ LLVMArmSVEDialectLLVMIRTranslationInterface>();
+ registry.addDialectInterface<NVVM::NVVMDialect,
+ NVVMDialectLLVMIRTranslationInterface>();
+ registry.addDialectInterface<ROCDL::ROCDLDialect,
+ ROCDLDialectLLVMIRTranslationInterface>();
registerLLVMDialectTranslation(registry);
});
}
diff --git a/mlir/lib/Target/LLVMIR/ConvertToNVVMIR.cpp b/mlir/lib/Target/LLVMIR/ConvertToNVVMIR.cpp
deleted file mode 100644
index 7aee913a27d7..000000000000
--- a/mlir/lib/Target/LLVMIR/ConvertToNVVMIR.cpp
+++ /dev/null
@@ -1,124 +0,0 @@
-//===- ConvertToNVVMIR.cpp - MLIR to LLVM IR conversion -------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file implements a translation between the MLIR LLVM + NVVM dialects and
-// LLVM IR with NVVM intrinsics and metadata.
-//
-//===----------------------------------------------------------------------===//
-
-#include "mlir/Target/NVVMIR.h"
-
-#include "mlir/Dialect/GPU/GPUDialect.h"
-#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
-#include "mlir/Dialect/LLVMIR/NVVMDialect.h"
-#include "mlir/IR/BuiltinOps.h"
-#include "mlir/Target/LLVMIR.h"
-#include "mlir/Target/LLVMIR/ModuleTranslation.h"
-#include "mlir/Translation.h"
-
-#include "llvm/ADT/StringRef.h"
-#include "llvm/IR/IntrinsicsNVPTX.h"
-#include "llvm/IR/Module.h"
-#include "llvm/Support/ToolOutputFile.h"
-
-using namespace mlir;
-
-static llvm::Value *createIntrinsicCall(llvm::IRBuilder<> &builder,
- llvm::Intrinsic::ID intrinsic,
- ArrayRef<llvm::Value *> args = {}) {
- llvm::Module *module = builder.GetInsertBlock()->getModule();
- llvm::Function *fn = llvm::Intrinsic::getDeclaration(module, intrinsic);
- return builder.CreateCall(fn, args);
-}
-
-static llvm::Intrinsic::ID getShflBflyIntrinsicId(llvm::Type *resultType,
- bool withPredicate) {
- if (withPredicate) {
- resultType = cast<llvm::StructType>(resultType)->getElementType(0);
- return resultType->isFloatTy() ? llvm::Intrinsic::nvvm_shfl_sync_bfly_f32p
- : llvm::Intrinsic::nvvm_shfl_sync_bfly_i32p;
- }
- return resultType->isFloatTy() ? llvm::Intrinsic::nvvm_shfl_sync_bfly_f32
- : llvm::Intrinsic::nvvm_shfl_sync_bfly_i32;
-}
-
-namespace {
-class ModuleTranslation : public LLVM::ModuleTranslation {
-public:
- using LLVM::ModuleTranslation::ModuleTranslation;
-
-protected:
- LogicalResult convertOperation(Operation &opInst,
- llvm::IRBuilder<> &builder) override {
-
-#include "mlir/Dialect/LLVMIR/NVVMConversions.inc"
-
- return LLVM::ModuleTranslation::convertOperation(opInst, builder);
- }
-
- /// Allow access to the constructor.
- friend LLVM::ModuleTranslation;
-};
-} // namespace
-
-std::unique_ptr<llvm::Module>
-mlir::translateModuleToNVVMIR(Operation *m, llvm::LLVMContext &llvmContext,
- StringRef name) {
- // Register the translation to LLVM IR if nobody else did before. This may
- // happen if this translation is called inside a pass pipeline that converts
- // GPU dialects to binary blobs without translating the rest of the code.
- registerLLVMDialectTranslation(*m->getContext());
-
- auto llvmModule = LLVM::ModuleTranslation::translateModule<ModuleTranslation>(
- m, llvmContext, name);
- if (!llvmModule)
- return llvmModule;
-
- // Insert the nvvm.annotations kernel so that the NVVM backend recognizes the
- // function as a kernel.
- for (auto func :
- ModuleTranslation::getModuleBody(m).getOps<LLVM::LLVMFuncOp>()) {
- if (!func->getAttrOfType<UnitAttr>(
- NVVM::NVVMDialect::getKernelFuncAttrName()))
- continue;
-
- auto *llvmFunc = llvmModule->getFunction(func.getName());
-
- llvm::Metadata *llvmMetadata[] = {
- llvm::ValueAsMetadata::get(llvmFunc),
- llvm::MDString::get(llvmModule->getContext(), "kernel"),
- llvm::ValueAsMetadata::get(llvm::ConstantInt::get(
- llvm::Type::getInt32Ty(llvmModule->getContext()), 1))};
- llvm::MDNode *llvmMetadataNode =
- llvm::MDNode::get(llvmModule->getContext(), llvmMetadata);
- llvmModule->getOrInsertNamedMetadata("nvvm.annotations")
- ->addOperand(llvmMetadataNode);
- }
-
- return llvmModule;
-}
-
-namespace mlir {
-void registerToNVVMIRTranslation() {
- TranslateFromMLIRRegistration registration(
- "mlir-to-nvvmir",
- [](ModuleOp module, raw_ostream &output) {
- llvm::LLVMContext llvmContext;
- auto llvmModule = mlir::translateModuleToNVVMIR(module, llvmContext);
- if (!llvmModule)
</cut>
Progress:
* UM-2 [QEMU upstream maintainership]
+ We needed an rc4 (as usual)
+ Tried to work through some of my code review patchlog, notably
some big alignment-related series from RTH
+ Sent patch series for some small things:
+ implement last few bits of HSTR trap-to-hypervisor functionality
+ actually take an exception if PSTATE.IL gets set
+ don't assert if user asks for both an EL3 guest CPU and KVM
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ worked through RTH's code review comments for the FP insn patches;
these are now ready to send out once QEMU makes its 6.1 release
and the previous slice of reviewed patches can get into the tree
-- PMM
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3_LTO
Culprit:
<cut>
commit 310b35304cdf5a230c042904655583c5532d3e91
Author: Rong Xu <xur(a)google.com>
Date: Tue Feb 16 10:53:38 2021 -0800
[SampleFDO][NFC] Refactor SampleProfile.cpp
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.
Differential Revision: https://reviews.llvm.org/D96455
</cut>
Results regressed to (for first_bad == 310b35304cdf5a230c042904655583c5532d3e91)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO_marm artifacts/build-310b35304cdf5a230c042904655583c5532d3e91/results_id:
1
# 482.sphinx3,sphinx_livepretend_base.default regressed by 106
from (for last_good == cddc53ef088b68586094c9841a76b41bee3994a4)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO_marm artifacts/build-cddc53ef088b68586094c9841a76b41bee3994a4/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3_LTO/3917
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3_LTO/3930
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-310b35304cdf5a230c042904655583c5532d3e91
cd investigate-llvm-310b35304cdf5a230c042904655583c5532d3e91
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 310b35304cdf5a230c042904655583c5532d3e91
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach cddc53ef088b68586094c9841a76b41bee3994a4
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 310b35304cdf5a230c042904655583c5532d3e91
Author: Rong Xu <xur(a)google.com>
Date: Tue Feb 16 10:53:38 2021 -0800
[SampleFDO][NFC] Refactor SampleProfile.cpp
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.
Differential Revision: https://reviews.llvm.org/D96455
---
.../llvm/ProfileData/SampleProfileLoaderBaseImpl.h | 862 +++++++++++++++++
.../llvm/ProfileData/SampleProfileLoaderBaseUtil.h | 97 ++
llvm/lib/ProfileData/CMakeLists.txt | 1 +
.../ProfileData/SampleProfileLoaderBaseUtil.cpp | 192 ++++
llvm/lib/Transforms/IPO/SampleProfile.cpp | 1001 +-------------------
5 files changed, 1161 insertions(+), 992 deletions(-)
diff --git a/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseImpl.h b/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseImpl.h
new file mode 100644
index 000000000000..f02bacb6edc3
--- /dev/null
+++ b/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseImpl.h
@@ -0,0 +1,862 @@
+////===- SampleProfileLoadBaseImpl.h - Profile loader base impl --*- C++-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// This file provides the interface for the sampled PGO profile loader base
+/// implementation.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERIMPL_H
+#define LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERIMPL_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/DenseSet.h"
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/Analysis/PostDominators.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/CFG.h"
+#include "llvm/IR/DebugInfoMetadata.h"
+#include "llvm/IR/DebugLoc.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Module.h"
+#include "llvm/ProfileData/SampleProf.h"
+#include "llvm/ProfileData/SampleProfReader.h"
+#include "llvm/ProfileData/SampleProfileLoaderBaseUtil.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/GenericDomTree.h"
+#include "llvm/Support/raw_ostream.h"
+
+namespace llvm {
+using namespace llvm;
+using namespace sampleprof;
+using ProfileCount = Function::ProfileCount;
+namespace sampleprofutil {
+bool callsiteIsHot(const SampleCoverageTracker *CT,
+ const FunctionSamples *CallsiteFS, ProfileSummaryInfo *PSI,
+ bool ProfAccForSymsInList);
+} // namespace sampleprofutil
+using namespace sampleprofutil;
+
+#define DEBUG_TYPE "sample-profile-impl"
+
+using BlockWeightMap = DenseMap<const BasicBlock *, uint64_t>;
+using EquivalenceClassMap = DenseMap<const BasicBlock *, const BasicBlock *>;
+using Edge = std::pair<const BasicBlock *, const BasicBlock *>;
+using EdgeWeightMap = DenseMap<Edge, uint64_t>;
+using BlockEdgeMap =
+ DenseMap<const BasicBlock *, SmallVector<const BasicBlock *, 8>>;
+
+extern cl::opt<unsigned> SampleProfileMaxPropagateIterations;
+extern cl::opt<unsigned> SampleProfileRecordCoverage;
+extern cl::opt<unsigned> SampleProfileSampleCoverage;
+extern cl::opt<bool> NoWarnSampleUnused;
+
+class SampleProfileLoaderBaseImpl {
+public:
+ SampleProfileLoaderBaseImpl(std::string Name) : Filename(Name) {}
+ void dump() { Reader->dump(); }
+
+protected:
+ friend class SampleCoverageTracker;
+
+ unsigned getFunctionLoc(Function &F);
+ virtual ErrorOr<uint64_t> getInstWeight(const Instruction &Inst);
+ ErrorOr<uint64_t> getInstWeightImpl(const Instruction &Inst);
+ ErrorOr<uint64_t> getBlockWeight(const BasicBlock *BB);
+ mutable DenseMap<const DILocation *, const FunctionSamples *>
+ DILocation2SampleMap;
+ virtual const FunctionSamples *
+ findFunctionSamples(const Instruction &I) const;
+ void printEdgeWeight(raw_ostream &OS, Edge E);
+ void printBlockWeight(raw_ostream &OS, const BasicBlock *BB) const;
+ void printBlockEquivalence(raw_ostream &OS, const BasicBlock *BB);
+ bool computeBlockWeights(Function &F);
+ void findEquivalenceClasses(Function &F);
+ template <bool IsPostDom>
+ void findEquivalencesFor(BasicBlock *BB1, ArrayRef<BasicBlock *> Descendants,
+ DominatorTreeBase<BasicBlock, IsPostDom> *DomTree);
+
+ void propagateWeights(Function &F);
+ uint64_t visitEdge(Edge E, unsigned *NumUnknownEdges, Edge *UnknownEdge);
+ void buildEdges(Function &F);
+ bool propagateThroughEdges(Function &F, bool UpdateBlockCount);
+ void clearFunctionData();
+ void computeDominanceAndLoopInfo(Function &F);
+ bool
+ computeAndPropagateWeights(Function &F,
+ const DenseSet<GlobalValue::GUID> &InlinedGUIDs);
+ void emitCoverageRemarks(Function &F);
+
+ /// Map basic blocks to their computed weights.
+ ///
+ /// The weight of a basic block is defined to be the maximum
+ /// of all the instruction weights in that block.
+ BlockWeightMap BlockWeights;
+
+ /// Map edges to their computed weights.
+ ///
+ /// Edge weights are computed by propagating basic block weights in
+ /// SampleProfile::propagateWeights.
+ EdgeWeightMap EdgeWeights;
+
+ /// Set of visited blocks during propagation.
+ SmallPtrSet<const BasicBlock *, 32> VisitedBlocks;
+
+ /// Set of visited edges during propagation.
+ SmallSet<Edge, 32> VisitedEdges;
+
+ /// Equivalence classes for block weights.
+ ///
+ /// Two blocks BB1 and BB2 are in the same equivalence class if they
+ /// dominate and post-dominate each other, and they are in the same loop
+ /// nest. When this happens, the two blocks are guaranteed to execute
+ /// the same number of times.
+ EquivalenceClassMap EquivalenceClass;
+
+ /// Dominance, post-dominance and loop information.
+ std::unique_ptr<DominatorTree> DT;
+ std::unique_ptr<PostDominatorTree> PDT;
+ std::unique_ptr<LoopInfo> LI;
+
+ /// Predecessors for each basic block in the CFG.
+ BlockEdgeMap Predecessors;
+
+ /// Successors for each basic block in the CFG.
+ BlockEdgeMap Successors;
+
+ /// Profile coverage tracker.
+ SampleCoverageTracker CoverageTracker;
+
+ /// Profile reader object.
+ std::unique_ptr<SampleProfileReader> Reader;
+
+ /// Samples collected for the body of this function.
+ FunctionSamples *Samples = nullptr;
+
+ /// Name of the profile file to load.
+ std::string Filename;
+
+ /// Profile Summary Info computed from sample profile.
+ ProfileSummaryInfo *PSI = nullptr;
+
+ /// Optimization Remark Emitter used to emit diagnostic remarks.
+ OptimizationRemarkEmitter *ORE = nullptr;
+};
+
+/// Clear all the per-function data used to load samples and propagate weights.
+void SampleProfileLoaderBaseImpl::clearFunctionData() {
+ BlockWeights.clear();
+ EdgeWeights.clear();
+ VisitedBlocks.clear();
+ VisitedEdges.clear();
+ EquivalenceClass.clear();
+ DT = nullptr;
+ PDT = nullptr;
+ LI = nullptr;
+ Predecessors.clear();
+ Successors.clear();
+ CoverageTracker.clear();
+}
+
+#ifndef NDEBUG
+/// Print the weight of edge \p E on stream \p OS.
+///
+/// \param OS Stream to emit the output to.
+/// \param E Edge to print.
+void SampleProfileLoaderBaseImpl::printEdgeWeight(raw_ostream &OS, Edge E) {
+ OS << "weight[" << E.first->getName() << "->" << E.second->getName()
+ << "]: " << EdgeWeights[E] << "\n";
+}
+
+/// Print the equivalence class of block \p BB on stream \p OS.
+///
+/// \param OS Stream to emit the output to.
+/// \param BB Block to print.
+void SampleProfileLoaderBaseImpl::printBlockEquivalence(raw_ostream &OS,
+ const BasicBlock *BB) {
+ const BasicBlock *Equiv = EquivalenceClass[BB];
+ OS << "equivalence[" << BB->getName()
+ << "]: " << ((Equiv) ? EquivalenceClass[BB]->getName() : "NONE") << "\n";
+}
+
+/// Print the weight of block \p BB on stream \p OS.
+///
+/// \param OS Stream to emit the output to.
+/// \param BB Block to print.
+void SampleProfileLoaderBaseImpl::printBlockWeight(raw_ostream &OS,
+ const BasicBlock *BB) const {
+ const auto &I = BlockWeights.find(BB);
+ uint64_t W = (I == BlockWeights.end() ? 0 : I->second);
+ OS << "weight[" << BB->getName() << "]: " << W << "\n";
+}
+#endif
+
+/// Get the weight for an instruction.
+///
+/// The "weight" of an instruction \p Inst is the number of samples
+/// collected on that instruction at runtime. To retrieve it, we
+/// need to compute the line number of \p Inst relative to the start of its
+/// function. We use HeaderLineno to compute the offset. We then
+/// look up the samples collected for \p Inst using BodySamples.
+///
+/// \param Inst Instruction to query.
+///
+/// \returns the weight of \p Inst.
+ErrorOr<uint64_t>
+SampleProfileLoaderBaseImpl::getInstWeight(const Instruction &Inst) {
+ return getInstWeightImpl(Inst);
+}
+
+ErrorOr<uint64_t>
+SampleProfileLoaderBaseImpl::getInstWeightImpl(const Instruction &Inst) {
+ const FunctionSamples *FS = findFunctionSamples(Inst);
+ if (!FS)
+ return std::error_code();
+
+ const DebugLoc &DLoc = Inst.getDebugLoc();
+ if (!DLoc)
+ return std::error_code();
+
+ const DILocation *DIL = DLoc;
+ uint32_t LineOffset = FunctionSamples::getOffset(DIL);
+ uint32_t Discriminator = DIL->getBaseDiscriminator();
+ ErrorOr<uint64_t> R = FS->findSamplesAt(LineOffset, Discriminator);
+ if (R) {
+ bool FirstMark =
+ CoverageTracker.markSamplesUsed(FS, LineOffset, Discriminator, R.get());
+ if (FirstMark) {
+ ORE->emit([&]() {
+ OptimizationRemarkAnalysis Remark(DEBUG_TYPE, "AppliedSamples", &Inst);
+ Remark << "Applied " << ore::NV("NumSamples", *R);
+ Remark << " samples from profile (offset: ";
+ Remark << ore::NV("LineOffset", LineOffset);
+ if (Discriminator) {
+ Remark << ".";
+ Remark << ore::NV("Discriminator", Discriminator);
+ }
+ Remark << ")";
+ return Remark;
+ });
+ }
+ LLVM_DEBUG(dbgs() << " " << DLoc.getLine() << "."
+ << DIL->getBaseDiscriminator() << ":" << Inst
+ << " (line offset: " << LineOffset << "."
+ << DIL->getBaseDiscriminator() << " - weight: " << R.get()
+ << ")\n");
+ }
+ return R;
+}
+
+/// Compute the weight of a basic block.
+///
+/// The weight of basic block \p BB is the maximum weight of all the
+/// instructions in BB.
+///
+/// \param BB The basic block to query.
+///
+/// \returns the weight for \p BB.
+ErrorOr<uint64_t>
+SampleProfileLoaderBaseImpl::getBlockWeight(const BasicBlock *BB) {
+ uint64_t Max = 0;
+ bool HasWeight = false;
+ for (auto &I : BB->getInstList()) {
+ const ErrorOr<uint64_t> &R = getInstWeight(I);
+ if (R) {
+ Max = std::max(Max, R.get());
+ HasWeight = true;
+ }
+ }
+ return HasWeight ? ErrorOr<uint64_t>(Max) : std::error_code();
+}
+
+/// Compute and store the weights of every basic block.
+///
+/// This populates the BlockWeights map by computing
+/// the weights of every basic block in the CFG.
+///
+/// \param F The function to query.
+bool SampleProfileLoaderBaseImpl::computeBlockWeights(Function &F) {
+ bool Changed = false;
+ LLVM_DEBUG(dbgs() << "Block weights\n");
+ for (const auto &BB : F) {
+ ErrorOr<uint64_t> Weight = getBlockWeight(&BB);
+ if (Weight) {
+ BlockWeights[&BB] = Weight.get();
+ VisitedBlocks.insert(&BB);
+ Changed = true;
+ }
+ LLVM_DEBUG(printBlockWeight(dbgs(), &BB));
+ }
+
+ return Changed;
+}
+
+/// Get the FunctionSamples for an instruction.
+///
+/// The FunctionSamples of an instruction \p Inst is the inlined instance
+/// in which that instruction is coming from. We traverse the inline stack
+/// of that instruction, and match it with the tree nodes in the profile.
+///
+/// \param Inst Instruction to query.
+///
+/// \returns the FunctionSamples pointer to the inlined instance.
+const FunctionSamples *SampleProfileLoaderBaseImpl::findFunctionSamples(
+ const Instruction &Inst) const {
+ const DILocation *DIL = Inst.getDebugLoc();
+ if (!DIL)
+ return Samples;
+
+ auto it = DILocation2SampleMap.try_emplace(DIL, nullptr);
+ if (it.second) {
+ it.first->second = Samples->findFunctionSamples(DIL, Reader->getRemapper());
+ }
+ return it.first->second;
+}
+
+/// Find equivalence classes for the given block.
+///
+/// This finds all the blocks that are guaranteed to execute the same
+/// number of times as \p BB1. To do this, it traverses all the
+/// descendants of \p BB1 in the dominator or post-dominator tree.
+///
+/// A block BB2 will be in the same equivalence class as \p BB1 if
+/// the following holds:
+///
+/// 1- \p BB1 is a descendant of BB2 in the opposite tree. So, if BB2
+/// is a descendant of \p BB1 in the dominator tree, then BB2 should
+/// dominate BB1 in the post-dominator tree.
+///
+/// 2- Both BB2 and \p BB1 must be in the same loop.
+///
+/// For every block BB2 that meets those two requirements, we set BB2's
+/// equivalence class to \p BB1.
+///
+/// \param BB1 Block to check.
+/// \param Descendants Descendants of \p BB1 in either the dom or pdom tree.
+/// \param DomTree Opposite dominator tree. If \p Descendants is filled
+/// with blocks from \p BB1's dominator tree, then
+/// this is the post-dominator tree, and vice versa.
+template <bool IsPostDom>
+void SampleProfileLoaderBaseImpl::findEquivalencesFor(
+ BasicBlock *BB1, ArrayRef<BasicBlock *> Descendants,
+ DominatorTreeBase<BasicBlock, IsPostDom> *DomTree) {
+ const BasicBlock *EC = EquivalenceClass[BB1];
+ uint64_t Weight = BlockWeights[EC];
+ for (const auto *BB2 : Descendants) {
+ bool IsDomParent = DomTree->dominates(BB2, BB1);
+ bool IsInSameLoop = LI->getLoopFor(BB1) == LI->getLoopFor(BB2);
+ if (BB1 != BB2 && IsDomParent && IsInSameLoop) {
+ EquivalenceClass[BB2] = EC;
+ // If BB2 is visited, then the entire EC should be marked as visited.
+ if (VisitedBlocks.count(BB2)) {
+ VisitedBlocks.insert(EC);
+ }
+
+ // If BB2 is heavier than BB1, make BB2 have the same weight
+ // as BB1.
+ //
+ // Note that we don't worry about the opposite situation here
+ // (when BB2 is lighter than BB1). We will deal with this
+ // during the propagation phase. Right now, we just want to
+ // make sure that BB1 has the largest weight of all the
+ // members of its equivalence set.
+ Weight = std::max(Weight, BlockWeights[BB2]);
+ }
+ }
+ if (EC == &EC->getParent()->getEntryBlock()) {
+ BlockWeights[EC] = Samples->getHeadSamples() + 1;
+ } else {
+ BlockWeights[EC] = Weight;
+ }
+}
+
+/// Find equivalence classes.
+///
+/// Since samples may be missing from blocks, we can fill in the gaps by setting
+/// the weights of all the blocks in the same equivalence class to the same
+/// weight. To compute the concept of equivalence, we use dominance and loop
+/// information. Two blocks B1 and B2 are in the same equivalence class if B1
+/// dominates B2, B2 post-dominates B1 and both are in the same loop.
+///
+/// \param F The function to query.
+void SampleProfileLoaderBaseImpl::findEquivalenceClasses(Function &F) {
+ SmallVector<BasicBlock *, 8> DominatedBBs;
+ LLVM_DEBUG(dbgs() << "\nBlock equivalence classes\n");
+ // Find equivalence sets based on dominance and post-dominance information.
+ for (auto &BB : F) {
+ BasicBlock *BB1 = &BB;
+
+ // Compute BB1's equivalence class once.
+ if (EquivalenceClass.count(BB1)) {
+ LLVM_DEBUG(printBlockEquivalence(dbgs(), BB1));
+ continue;
+ }
+
+ // By default, blocks are in their own equivalence class.
+ EquivalenceClass[BB1] = BB1;
+
+ // Traverse all the blocks dominated by BB1. We are looking for
+ // every basic block BB2 such that:
+ //
+ // 1- BB1 dominates BB2.
+ // 2- BB2 post-dominates BB1.
+ // 3- BB1 and BB2 are in the same loop nest.
+ //
+ // If all those conditions hold, it means that BB2 is executed
+ // as many times as BB1, so they are placed in the same equivalence
+ // class by making BB2's equivalence class be BB1.
+ DominatedBBs.clear();
+ DT->getDescendants(BB1, DominatedBBs);
+ findEquivalencesFor(BB1, DominatedBBs, PDT.get());
+
+ LLVM_DEBUG(printBlockEquivalence(dbgs(), BB1));
+ }
+
+ // Assign weights to equivalence classes.
+ //
+ // All the basic blocks in the same equivalence class will execute
+ // the same number of times. Since we know that the head block in
+ // each equivalence class has the largest weight, assign that weight
+ // to all the blocks in that equivalence class.
+ LLVM_DEBUG(
+ dbgs() << "\nAssign the same weight to all blocks in the same class\n");
+ for (auto &BI : F) {
+ const BasicBlock *BB = &BI;
+ const BasicBlock *EquivBB = EquivalenceClass[BB];
+ if (BB != EquivBB)
+ BlockWeights[BB] = BlockWeights[EquivBB];
+ LLVM_DEBUG(printBlockWeight(dbgs(), BB));
+ }
+}
+
+/// Visit the given edge to decide if it has a valid weight.
+///
+/// If \p E has not been visited before, we copy to \p UnknownEdge
+/// and increment the count of unknown edges.
+///
+/// \param E Edge to visit.
+/// \param NumUnknownEdges Current number of unknown edges.
+/// \param UnknownEdge Set if E has not been visited before.
+///
+/// \returns E's weight, if known. Otherwise, return 0.
+uint64_t SampleProfileLoaderBaseImpl::visitEdge(Edge E,
+ unsigned *NumUnknownEdges,
+ Edge *UnknownEdge) {
+ if (!VisitedEdges.count(E)) {
+ (*NumUnknownEdges)++;
+ *UnknownEdge = E;
+ return 0;
+ }
+
+ return EdgeWeights[E];
+}
+
+/// Propagate weights through incoming/outgoing edges.
+///
+/// If the weight of a basic block is known, and there is only one edge
+/// with an unknown weight, we can calculate the weight of that edge.
+///
+/// Similarly, if all the edges have a known count, we can calculate the
+/// count of the basic block, if needed.
+///
+/// \param F Function to process.
+/// \param UpdateBlockCount Whether we should update basic block counts that
+/// has already been annotated.
+///
+/// \returns True if new weights were assigned to edges or blocks.
+bool SampleProfileLoaderBaseImpl::propagateThroughEdges(Function &F,
+ bool UpdateBlockCount) {
+ bool Changed = false;
+ LLVM_DEBUG(dbgs() << "\nPropagation through edges\n");
+ for (const auto &BI : F) {
+ const BasicBlock *BB = &BI;
+ const BasicBlock *EC = EquivalenceClass[BB];
+
+ // Visit all the predecessor and successor edges to determine
+ // which ones have a weight assigned already. Note that it doesn't
+ // matter that we only keep track of a single unknown edge. The
+ // only case we are interested in handling is when only a single
+ // edge is unknown (see setEdgeOrBlockWeight).
+ for (unsigned i = 0; i < 2; i++) {
+ uint64_t TotalWeight = 0;
+ unsigned NumUnknownEdges = 0, NumTotalEdges = 0;
+ Edge UnknownEdge, SelfReferentialEdge, SingleEdge;
+
+ if (i == 0) {
+ // First, visit all predecessor edges.
+ NumTotalEdges = Predecessors[BB].size();
+ for (auto *Pred : Predecessors[BB]) {
+ Edge E = std::make_pair(Pred, BB);
+ TotalWeight += visitEdge(E, &NumUnknownEdges, &UnknownEdge);
+ if (E.first == E.second)
+ SelfReferentialEdge = E;
+ }
+ if (NumTotalEdges == 1) {
+ SingleEdge = std::make_pair(Predecessors[BB][0], BB);
+ }
+ } else {
+ // On the second round, visit all successor edges.
+ NumTotalEdges = Successors[BB].size();
+ for (auto *Succ : Successors[BB]) {
+ Edge E = std::make_pair(BB, Succ);
+ TotalWeight += visitEdge(E, &NumUnknownEdges, &UnknownEdge);
+ }
+ if (NumTotalEdges == 1) {
+ SingleEdge = std::make_pair(BB, Successors[BB][0]);
+ }
+ }
+
+ // After visiting all the edges, there are three cases that we
+ // can handle immediately:
+ //
+ // - All the edge weights are known (i.e., NumUnknownEdges == 0).
+ // In this case, we simply check that the sum of all the edges
+ // is the same as BB's weight. If not, we change BB's weight
+ // to match. Additionally, if BB had not been visited before,
+ // we mark it visited.
+ //
+ // - Only one edge is unknown and BB has already been visited.
+ // In this case, we can compute the weight of the edge by
+ // subtracting the total block weight from all the known
+ // edge weights. If the edges weight more than BB, then the
+ // edge of the last remaining edge is set to zero.
+ //
+ // - There exists a self-referential edge and the weight of BB is
+ // known. In this case, this edge can be based on BB's weight.
+ // We add up all the other known edges and set the weight on
+ // the self-referential edge as we did in the previous case.
+ //
+ // In any other case, we must continue iterating. Eventually,
+ // all edges will get a weight, or iteration will stop when
+ // it reaches SampleProfileMaxPropagateIterations.
+ if (NumUnknownEdges <= 1) {
+ uint64_t &BBWeight = BlockWeights[EC];
+ if (NumUnknownEdges == 0) {
+ if (!VisitedBlocks.count(EC)) {
+ // If we already know the weight of all edges, the weight of the
+ // basic block can be computed. It should be no larger than the sum
+ // of all edge weights.
+ if (TotalWeight > BBWeight) {
+ BBWeight = TotalWeight;
+ Changed = true;
+ LLVM_DEBUG(dbgs() << "All edge weights for " << BB->getName()
+ << " known. Set weight for block: ";
+ printBlockWeight(dbgs(), BB););
+ }
+ } else if (NumTotalEdges == 1 &&
+ EdgeWeights[SingleEdge] < BlockWeights[EC]) {
+ // If there is only one edge for the visited basic block, use the
+ // block weight to adjust edge weight if edge weight is smaller.
+ EdgeWeights[SingleEdge] = BlockWeights[EC];
+ Changed = true;
+ }
+ } else if (NumUnknownEdges == 1 && VisitedBlocks.count(EC)) {
+ // If there is a single unknown edge and the block has been
+ // visited, then we can compute E's weight.
+ if (BBWeight >= TotalWeight)
+ EdgeWeights[UnknownEdge] = BBWeight - TotalWeight;
+ else
+ EdgeWeights[UnknownEdge] = 0;
+ const BasicBlock *OtherEC;
+ if (i == 0)
+ OtherEC = EquivalenceClass[UnknownEdge.first];
+ else
+ OtherEC = EquivalenceClass[UnknownEdge.second];
+ // Edge weights should never exceed the BB weights it connects.
+ if (VisitedBlocks.count(OtherEC) &&
+ EdgeWeights[UnknownEdge] > BlockWeights[OtherEC])
+ EdgeWeights[UnknownEdge] = BlockWeights[OtherEC];
+ VisitedEdges.insert(UnknownEdge);
+ Changed = true;
+ LLVM_DEBUG(dbgs() << "Set weight for edge: ";
+ printEdgeWeight(dbgs(), UnknownEdge));
+ }
+ } else if (VisitedBlocks.count(EC) && BlockWeights[EC] == 0) {
+ // If a block Weights 0, all its in/out edges should weight 0.
+ if (i == 0) {
+ for (auto *Pred : Predecessors[BB]) {
+ Edge E = std::make_pair(Pred, BB);
+ EdgeWeights[E] = 0;
+ VisitedEdges.insert(E);
+ }
+ } else {
+ for (auto *Succ : Successors[BB]) {
+ Edge E = std::make_pair(BB, Succ);
+ EdgeWeights[E] = 0;
+ VisitedEdges.insert(E);
+ }
+ }
+ } else if (SelfReferentialEdge.first && VisitedBlocks.count(EC)) {
+ uint64_t &BBWeight = BlockWeights[BB];
+ // We have a self-referential edge and the weight of BB is known.
+ if (BBWeight >= TotalWeight)
+ EdgeWeights[SelfReferentialEdge] = BBWeight - TotalWeight;
+ else
+ EdgeWeights[SelfReferentialEdge] = 0;
+ VisitedEdges.insert(SelfReferentialEdge);
+ Changed = true;
+ LLVM_DEBUG(dbgs() << "Set self-referential edge weight to: ";
+ printEdgeWeight(dbgs(), SelfReferentialEdge));
+ }
+ if (UpdateBlockCount && !VisitedBlocks.count(EC) && TotalWeight > 0) {
+ BlockWeights[EC] = TotalWeight;
+ VisitedBlocks.insert(EC);
+ Changed = true;
+ }
+ }
+ }
+
+ return Changed;
+}
+
+/// Build in/out edge lists for each basic block in the CFG.
+///
+/// We are interested in unique edges. If a block B1 has multiple
+/// edges to another block B2, we only add a single B1->B2 edge.
+void SampleProfileLoaderBaseImpl::buildEdges(Function &F) {
+ for (auto &BI : F) {
+ BasicBlock *B1 = &BI;
+
+ // Add predecessors for B1.
+ SmallPtrSet<BasicBlock *, 16> Visited;
+ if (!Predecessors[B1].empty())
+ llvm_unreachable("Found a stale predecessors list in a basic block.");
+ for (BasicBlock *B2 : predecessors(B1))
+ if (Visited.insert(B2).second)
+ Predecessors[B1].push_back(B2);
+
+ // Add successors for B1.
+ Visited.clear();
+ if (!Successors[B1].empty())
+ llvm_unreachable("Found a stale successors list in a basic block.");
+ for (BasicBlock *B2 : successors(B1))
+ if (Visited.insert(B2).second)
+ Successors[B1].push_back(B2);
+ }
+}
+
+/// Propagate weights into edges
+///
+/// The following rules are applied to every block BB in the CFG:
+///
+/// - If BB has a single predecessor/successor, then the weight
+/// of that edge is the weight of the block.
+///
+/// - If all incoming or outgoing edges are known except one, and the
+/// weight of the block is already known, the weight of the unknown
+/// edge will be the weight of the block minus the sum of all the known
+/// edges. If the sum of all the known edges is larger than BB's weight,
+/// we set the unknown edge weight to zero.
+///
+/// - If there is a self-referential edge, and the weight of the block is
+/// known, the weight for that edge is set to the weight of the block
+/// minus the weight of the other incoming edges to that block (if
+/// known).
+void SampleProfileLoaderBaseImpl::propagateWeights(Function &F) {
+ bool Changed = true;
+ unsigned I = 0;
+
+ // If BB weight is larger than its corresponding loop's header BB weight,
+ // use the BB weight to replace the loop header BB weight.
+ for (auto &BI : F) {
+ BasicBlock *BB = &BI;
+ Loop *L = LI->getLoopFor(BB);
+ if (!L) {
+ continue;
+ }
+ BasicBlock *Header = L->getHeader();
+ if (Header && BlockWeights[BB] > BlockWeights[Header]) {
+ BlockWeights[Header] = BlockWeights[BB];
+ }
+ }
+
+ // Before propagation starts, build, for each block, a list of
+ // unique predecessors and successors. This is necessary to handle
+ // identical edges in multiway branches. Since we visit all blocks and all
+ // edges of the CFG, it is cleaner to build these lists once at the start
+ // of the pass.
+ buildEdges(F);
+
+ // Propagate until we converge or we go past the iteration limit.
+ while (Changed && I++ < SampleProfileMaxPropagateIterations) {
+ Changed = propagateThroughEdges(F, false);
+ }
+
+ // The first propagation propagates BB counts from annotated BBs to unknown
+ // BBs. The 2nd propagation pass resets edges weights, and use all BB weights
+ // to propagate edge weights.
+ VisitedEdges.clear();
+ Changed = true;
+ while (Changed && I++ < SampleProfileMaxPropagateIterations) {
+ Changed = propagateThroughEdges(F, false);
+ }
+
+ // The 3rd propagation pass allows adjust annotated BB weights that are
+ // obviously wrong.
+ Changed = true;
+ while (Changed && I++ < SampleProfileMaxPropagateIterations) {
+ Changed = propagateThroughEdges(F, true);
+ }
+}
+
+/// Generate branch weight metadata for all branches in \p F.
+///
+/// Branch weights are computed out of instruction samples using a
+/// propagation heuristic. Propagation proceeds in 3 phases:
+///
+/// 1- Assignment of block weights. All the basic blocks in the function
+/// are initial assigned the same weight as their most frequently
+/// executed instruction.
+///
+/// 2- Creation of equivalence classes. Since samples may be missing from
+/// blocks, we can fill in the gaps by setting the weights of all the
+/// blocks in the same equivalence class to the same weight. To compute
+/// the concept of equivalence, we use dominance and loop information.
+/// Two blocks B1 and B2 are in the same equivalence class if B1
+/// dominates B2, B2 post-dominates B1 and both are in the same loop.
+///
+/// 3- Propagation of block weights into edges. This uses a simple
+/// propagation heuristic. The following rules are applied to every
+/// block BB in the CFG:
+///
+/// - If BB has a single predecessor/successor, then the weight
+/// of that edge is the weight of the block.
+///
+/// - If all the edges are known except one, and the weight of the
+/// block is already known, the weight of the unknown edge will
+/// be the weight of the block minus the sum of all the known
+/// edges. If the sum of all the known edges is larger than BB's weight,
+/// we set the unknown edge weight to zero.
+///
+/// - If there is a self-referential edge, and the weight of the block is
+/// known, the weight for that edge is set to the weight of the block
+/// minus the weight of the other incoming edges to that block (if
+/// known).
+///
+/// Since this propagation is not guaranteed to finalize for every CFG, we
+/// only allow it to proceed for a limited number of iterations (controlled
+/// by -sample-profile-max-propagate-iterations).
+///
+/// FIXME: Try to replace this propagation heuristic with a scheme
+/// that is guaranteed to finalize. A work-list approach similar to
+/// the standard value propagation algorithm used by SSA-CCP might
+/// work here.
+///
+/// \param F The function to query.
+///
+/// \returns true if \p F was modified. Returns false, otherwise.
+bool SampleProfileLoaderBaseImpl::computeAndPropagateWeights(
+ Function &F, const DenseSet<GlobalValue::GUID> &InlinedGUIDs) {
+ bool Changed = (InlinedGUIDs.size() != 0);
+
+ // Compute basic block weights.
+ Changed |= computeBlockWeights(F);
+
+ if (Changed) {
+ // Add an entry count to the function using the samples gathered at the
+ // function entry.
+ // Sets the GUIDs that are inlined in the profiled binary. This is used
+ // for ThinLink to make correct liveness analysis, and also make the IR
+ // match the profiled binary before annotation.
+ F.setEntryCount(
+ ProfileCount(Samples->getHeadSamples() + 1, Function::PCT_Real),
+ &InlinedGUIDs);
+
+ // Compute dominance and loop info needed for propagation.
+ computeDominanceAndLoopInfo(F);
+
+ // Find equivalence classes.
+ findEquivalenceClasses(F);
+
+ // Propagate weights to all edges.
+ propagateWeights(F);
+ }
+
+ return Changed;
+}
+
+void SampleProfileLoaderBaseImpl::emitCoverageRemarks(Function &F) {
+ // If coverage checking was requested, compute it now.
+ if (SampleProfileRecordCoverage) {
+ unsigned Used = CoverageTracker.countUsedRecords(Samples, PSI);
+ unsigned Total = CoverageTracker.countBodyRecords(Samples, PSI);
+ unsigned Coverage = CoverageTracker.computeCoverage(Used, Total);
+ if (Coverage < SampleProfileRecordCoverage) {
+ F.getContext().diagnose(DiagnosticInfoSampleProfile(
+ F.getSubprogram()->getFilename(), getFunctionLoc(F),
+ Twine(Used) + " of " + Twine(Total) + " available profile records (" +
+ Twine(Coverage) + "%) were applied",
+ DS_Warning));
+ }
+ }
+
+ if (SampleProfileSampleCoverage) {
+ uint64_t Used = CoverageTracker.getTotalUsedSamples();
+ uint64_t Total = CoverageTracker.countBodySamples(Samples, PSI);
+ unsigned Coverage = CoverageTracker.computeCoverage(Used, Total);
+ if (Coverage < SampleProfileSampleCoverage) {
+ F.getContext().diagnose(DiagnosticInfoSampleProfile(
+ F.getSubprogram()->getFilename(), getFunctionLoc(F),
+ Twine(Used) + " of " + Twine(Total) + " available profile samples (" +
+ Twine(Coverage) + "%) were applied",
+ DS_Warning));
+ }
+ }
+}
+
+/// Get the line number for the function header.
+///
+/// This looks up function \p F in the current compilation unit and
+/// retrieves the line number where the function is defined. This is
+/// line 0 for all the samples read from the profile file. Every line
+/// number is relative to this line.
+///
+/// \param F Function object to query.
+///
+/// \returns the line number where \p F is defined. If it returns 0,
+/// it means that there is no debug information available for \p F.
+unsigned SampleProfileLoaderBaseImpl::getFunctionLoc(Function &F) {
+ if (DISubprogram *S = F.getSubprogram())
+ return S->getLine();
+
+ if (NoWarnSampleUnused)
+ return 0;
+
+ // If the start of \p F is missing, emit a diagnostic to inform the user
+ // about the missed opportunity.
+ F.getContext().diagnose(DiagnosticInfoSampleProfile(
+ "No debug information found in function " + F.getName() +
+ ": Function profile not used",
+ DS_Warning));
+ return 0;
+}
+
+void SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(Function &F) {
+ DT.reset(new DominatorTree);
+ DT->recalculate(F);
+
+ PDT.reset(new PostDominatorTree(F));
+
+ LI.reset(new LoopInfo);
+ LI->analyze(*DT);
+}
+
+#undef DEBUG_TYPE
+
+} // namespace llvm
+#endif // LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERIMPL_H
diff --git a/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseUtil.h b/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseUtil.h
new file mode 100644
index 000000000000..37dc8d8187d9
--- /dev/null
+++ b/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseUtil.h
@@ -0,0 +1,97 @@
+////===- SampleProfileLoadBaseUtil.h - Profile loader util func --*- C++-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// This file provides the utility functions for the sampled PGO loader base
+/// implementation.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERUTIL_H
+#define LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERUTIL_H
+
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/CFG.h"
+#include "llvm/IR/DebugLoc.h"
+#include "llvm/IR/Function.h"
+#include "llvm/ProfileData/SampleProf.h"
+#include "llvm/Support/CommandLine.h"
+
+namespace llvm {
+using namespace sampleprof;
+
+extern cl::opt<unsigned> SampleProfileMaxPropagateIterations;
+extern cl::opt<unsigned> SampleProfileRecordCoverage;
+extern cl::opt<unsigned> SampleProfileSampleCoverage;
+extern cl::opt<bool> NoWarnSampleUnused;
+
+namespace sampleprofutil {
+
+class SampleCoverageTracker {
+public:
+ bool markSamplesUsed(const FunctionSamples *FS, uint32_t LineOffset,
+ uint32_t Discriminator, uint64_t Samples);
+ unsigned computeCoverage(unsigned Used, unsigned Total) const;
+ unsigned countUsedRecords(const FunctionSamples *FS,
+ ProfileSummaryInfo *PSI) const;
+ unsigned countBodyRecords(const FunctionSamples *FS,
+ ProfileSummaryInfo *PSI) const;
+ uint64_t getTotalUsedSamples() const { return TotalUsedSamples; }
+ uint64_t countBodySamples(const FunctionSamples *FS,
+ ProfileSummaryInfo *PSI) const;
+
+ void clear() {
+ SampleCoverage.clear();
+ TotalUsedSamples = 0;
+ }
+ void setProfAccForSymsInList(bool V) { ProfAccForSymsInList = V; }
+
+private:
+ using BodySampleCoverageMap = std::map<LineLocation, unsigned>;
+ using FunctionSamplesCoverageMap =
+ DenseMap<const FunctionSamples *, BodySampleCoverageMap>;
+
+ /// Coverage map for sampling records.
+ ///
+ /// This map keeps a record of sampling records that have been matched to
+ /// an IR instruction. This is used to detect some form of staleness in
+ /// profiles (see flag -sample-profile-check-coverage).
+ ///
+ /// Each entry in the map corresponds to a FunctionSamples instance. This is
+ /// another map that counts how many times the sample record at the
+ /// given location has been used.
+ FunctionSamplesCoverageMap SampleCoverage;
+
+ /// Number of samples used from the profile.
+ ///
+ /// When a sampling record is used for the first time, the samples from
+ /// that record are added to this accumulator. Coverage is later computed
+ /// based on the total number of samples available in this function and
+ /// its callsites.
+ ///
+ /// Note that this accumulator tracks samples used from a single function
+ /// and all the inlined callsites. Strictly, we should have a map of counters
+ /// keyed by FunctionSamples pointers, but these stats are cleared after
+ /// every function, so we just need to keep a single counter.
+ uint64_t TotalUsedSamples = 0;
+
+ // For symbol in profile symbol list, whether to regard their profiles
+ // to be accurate. This is passed from the SampleLoader instance.
+ bool ProfAccForSymsInList = false;
+};
+
+/// Return true if the given callsite is hot wrt to hot cutoff threshold.
+bool callsiteIsHot(const FunctionSamples *CallsiteFS, ProfileSummaryInfo *PSI,
+ bool ProfAccForSymsInList);
+
+} // end of namespace sampleprofutil
+} // end of namespace llvm
+
+#endif // LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERUTIL_H
diff --git a/llvm/lib/ProfileData/CMakeLists.txt b/llvm/lib/ProfileData/CMakeLists.txt
index 2a377e4d74d3..4125fac918ab 100644
--- a/llvm/lib/ProfileData/CMakeLists.txt
+++ b/llvm/lib/ProfileData/CMakeLists.txt
@@ -5,6 +5,7 @@ add_llvm_component_library(LLVMProfileData
InstrProfWriter.cpp
ProfileSummaryBuilder.cpp
</cut>
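The header quoted above turns the loader core into a base class intended to be reused by concrete sample-profile passes. As an illustration only, a derived loader could look roughly like the sketch below; the class name and the annotate() wrapper are hypothetical, not part of the patch, and it assumes the concrete pass initializes Reader, Samples, PSI and ORE before computing weights (as SampleProfile.cpp does).

#include "llvm/ProfileData/SampleProfileLoaderBaseImpl.h"

namespace llvm {

// Hypothetical derived loader; SampleProfileLoader in SampleProfile.cpp is
// the real consumer of this base class after the refactoring.
class ToyProfileLoader : public SampleProfileLoaderBaseImpl {
public:
  ToyProfileLoader(std::string ProfileFile)
      : SampleProfileLoaderBaseImpl(std::move(ProfileFile)) {}

  // Precondition (as in the real pass): Reader, Samples, PSI and ORE have
  // been set up for this function before weights are computed.
  bool annotate(Function &F, const DenseSet<GlobalValue::GUID> &InlinedGUIDs) {
    clearFunctionData();
    // Shared machinery: compute block weights from samples, build
    // equivalence classes, then propagate weights to edges.
    return computeAndPropagateWeights(F, InlinedGUIDs);
  }

protected:
  // Hook point kept virtual by the base class; this override simply
  // forwards to the shared implementation.
  ErrorOr<uint64_t> getInstWeight(const Instruction &Inst) override {
    return getInstWeightImpl(Inst);
  }
};

} // namespace llvm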
Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-release-arm-next-allmodconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/llvm-release-arm-next-allmodconfig
Culprit:
<cut>
commit fad7cd3310db3099f95dd34312c77740fbc455e5
Author: Baokun Li <libaokun1(a)huawei.com>
Date: Wed Aug 4 10:12:12 2021 +0800
nbd: add the check to prevent overflow in __nbd_ioctl()
If user specify a large enough value of NBD blocks option, it may trigger
signed integer overflow which may lead to nbd->config->bytesize becomes a
large or small value, zero in particular.
UBSAN: Undefined behaviour in drivers/block/nbd.c:325:31
signed integer overflow:
1024 * 4611686155866341414 cannot be represented in type 'long long int'
[...]
Call trace:
[...]
handle_overflow+0x188/0x1dc lib/ubsan.c:192
__ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
nbd_size_set drivers/block/nbd.c:325 [inline]
__nbd_ioctl drivers/block/nbd.c:1342 [inline]
nbd_ioctl+0x998/0xa10 drivers/block/nbd.c:1395
__blkdev_driver_ioctl block/ioctl.c:311 [inline]
[...]
Although it is not a big deal, still silence the UBSAN by limit
the input value.
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Link: https://lore.kernel.org/r/20210804021212.990223-1-libaokun1@huawei.com
[axboe: dropped unlikely()]
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
</cut>
Results regressed to (for first_bad == fad7cd3310db3099f95dd34312c77740fbc455e5)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
21709
# First few build errors in logs:
# 00:07:12 make[1]: *** [modules-only.symvers] Error 1
# 00:07:12 make: *** [modules] Error 2
from (for last_good == da20b58d5bbbb0d23ae9530992a37d0f0d1787a4)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29751
# linux build successful:
all
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all…
Configuration details:
rr[linux_git]="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git#ecf9343…"
Reproduce builds:
<cut>
mkdir investigate-linux-fad7cd3310db3099f95dd34312c77740fbc455e5
cd investigate-linux-fad7cd3310db3099f95dd34312c77740fbc455e5
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach fad7cd3310db3099f95dd34312c77740fbc455e5
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach da20b58d5bbbb0d23ae9530992a37d0f0d1787a4
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all…
Full commit (up to 1000 lines):
<cut>
commit fad7cd3310db3099f95dd34312c77740fbc455e5
Author: Baokun Li <libaokun1(a)huawei.com>
Date: Wed Aug 4 10:12:12 2021 +0800
nbd: add the check to prevent overflow in __nbd_ioctl()
If the user specifies a large enough value for the NBD blocks option, it may trigger
a signed integer overflow, which may cause nbd->config->bytesize to become a
large or small value, zero in particular.
UBSAN: Undefined behaviour in drivers/block/nbd.c:325:31
signed integer overflow:
1024 * 4611686155866341414 cannot be represented in type 'long long int'
[...]
Call trace:
[...]
handle_overflow+0x188/0x1dc lib/ubsan.c:192
__ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
nbd_size_set drivers/block/nbd.c:325 [inline]
__nbd_ioctl drivers/block/nbd.c:1342 [inline]
nbd_ioctl+0x998/0xa10 drivers/block/nbd.c:1395
__blkdev_driver_ioctl block/ioctl.c:311 [inline]
[...]
Although it is not a big deal, still silence the UBSAN warning by limiting
the input value.
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Link: https://lore.kernel.org/r/20210804021212.990223-1-libaokun1@huawei.com
[axboe: dropped unlikely()]
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
drivers/block/nbd.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index c38317979f74..f82264835794 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1384,6 +1384,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
unsigned int cmd, unsigned long arg)
{
struct nbd_config *config = nbd->config;
+ loff_t bytesize;
switch (cmd) {
case NBD_DISCONNECT:
@@ -1398,8 +1399,9 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
case NBD_SET_SIZE:
return nbd_set_size(nbd, arg, config->blksize);
case NBD_SET_SIZE_BLOCKS:
- return nbd_set_size(nbd, arg * config->blksize,
- config->blksize);
+ if (check_mul_overflow((loff_t)arg, config->blksize, &bytesize))
+ return -EINVAL;
+ return nbd_set_size(nbd, bytesize, config->blksize);
case NBD_SET_TIMEOUT:
nbd_set_cmd_timeout(nbd, arg);
return 0;
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3
Culprit:
<cut>
commit 6998f8ae2d14e096aff33968f226587b5c1a193a
Author: David Sherwood <david.sherwood(a)arm.com>
Date: Wed Mar 10 08:34:19 2021 +0000
[LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.
unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N * TTI.getArithmeticInstrCost(...
After some investigation it seems that there are only these cases that occur
in practice:
1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.
I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.
Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.
I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.
Differential Revision: https://reviews.llvm.org/D99718
</cut>
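As a rough illustration of the change described above, here is a tiny standalone sketch of the old "N * cost" pattern versus the simplified one. It is not LLVM's actual cost-model API; all names and values are made up for illustration.
<cut>
// cost_sketch.cpp - toy model of the cost pattern the patch removes.
#include <iostream>

// Before the patch: if the instruction stays scalar after vectorization,
// its cost was multiplied by the vectorization factor VF.
unsigned oldCost(bool scalarAfterVectorization, unsigned VF,
                 unsigned scalarInstrCost) {
  unsigned N = scalarAfterVectorization ? VF : 1;
  return N * scalarInstrCost;
}

// After the patch: only one scalar copy of such an instruction survives in
// the vector loop, so N is effectively always 1.
unsigned newCost(unsigned scalarInstrCost) { return scalarInstrCost; }

int main() {
  // Mirrors the updated no_vector_instructions.ll test: the scalar induction
  // update is costed 1 instead of 2 for VF 2.
  std::cout << "old: " << oldCost(true, 2, 1)
            << ", new: " << newCost(1) << "\n";
  return 0;
}
</cut>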
Results regressed to (for first_bad == 6998f8ae2d14e096aff33968f226587b5c1a193a)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-6998f8ae2d14e096aff33968f226587b5c1a193a/results_id:
1
# 462.libquantum,libquantum_base.default regressed by 114
# 462.libquantum,[.] quantum_toffoli regressed by 123
# 462.libquantum,[.] quantum_cnot regressed by 115
from (for last_good == c835630c25a4f9925517949579f66a43b113fbc9)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-c835630c25a4f9925517949579f66a43b113fbc9/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3/3744
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3/3755
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-6998f8ae2d14e096aff33968f226587b5c1a193a
cd investigate-llvm-6998f8ae2d14e096aff33968f226587b5c1a193a
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 6998f8ae2d14e096aff33968f226587b5c1a193a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach c835630c25a4f9925517949579f66a43b113fbc9
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 6998f8ae2d14e096aff33968f226587b5c1a193a
Author: David Sherwood <david.sherwood(a)arm.com>
Date: Wed Mar 10 08:34:19 2021 +0000
[LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.
unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N * TTI.getArithmeticInstrCost(...
After some investigation it seems that there are only these cases that occur
in practice:
1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.
I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.
Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.
I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.
Differential Revision: https://reviews.llvm.org/D99718
---
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 73 +++++++++++++---------
.../AArch64/no_vector_instructions.ll | 2 +-
.../LoopVectorize/AArch64/predication_costs.ll | 35 +++++++++++
.../Transforms/LoopVectorize/scalarized-bitcast.ll | 40 ++++++++++++
4 files changed, 121 insertions(+), 29 deletions(-)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 2b413fc49505..f25af23c86c2 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7383,10 +7383,39 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Type *RetTy = I->getType();
if (canTruncateToMinimalBitwidth(I, VF))
RetTy = IntegerType::get(RetTy->getContext(), MinBWs[I]);
- VectorTy = isScalarAfterVectorization(I, VF) ? RetTy : ToVectorTy(RetTy, VF);
auto SE = PSE.getSE();
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
+ auto hasSingleCopyAfterVectorization = [this](Instruction *I,
+ ElementCount VF) -> bool {
+ if (VF.isScalar())
+ return true;
+
+ auto Scalarized = InstsToScalarize.find(VF);
+ assert(Scalarized != InstsToScalarize.end() &&
+ "VF not yet analyzed for scalarization profitability");
+ return !Scalarized->second.count(I) &&
+ llvm::all_of(I->users(), [&](User *U) {
+ auto *UI = cast<Instruction>(U);
+ return !Scalarized->second.count(UI);
+ });
+ };
+
+ if (isScalarAfterVectorization(I, VF)) {
+ // With the exception of GEPs and PHIs, after scalarization there should
+ // only be one copy of the instruction generated in the loop. This is
+ // because the VF is either 1, or any instructions that need scalarizing
+ // have already been dealt with by the the time we get here. As a result,
+ // it means we don't have to multiply the instruction cost by VF.
+ assert(I->getOpcode() == Instruction::GetElementPtr ||
+ I->getOpcode() == Instruction::PHI ||
+ (I->getOpcode() == Instruction::BitCast &&
+ I->getType()->isPointerTy()) ||
+ hasSingleCopyAfterVectorization(I, VF));
+ VectorTy = RetTy;
+ } else
+ VectorTy = ToVectorTy(RetTy, VF);
+
// TODO: We need to estimate the cost of intrinsic calls.
switch (I->getOpcode()) {
case Instruction::GetElementPtr:
@@ -7514,21 +7543,16 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Op2VK = TargetTransformInfo::OK_UniformValue;
SmallVector<const Value *, 4> Operands(I->operand_values());
- unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
- return N * TTI.getArithmeticInstrCost(
- I->getOpcode(), VectorTy, CostKind,
- TargetTransformInfo::OK_AnyValue,
- Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
+ return TTI.getArithmeticInstrCost(
+ I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
+ Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
}
case Instruction::FNeg: {
assert(!VF.isScalable() && "VF is assumed to be non scalable.");
- unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
- return N * TTI.getArithmeticInstrCost(
- I->getOpcode(), VectorTy, CostKind,
- TargetTransformInfo::OK_AnyValue,
- TargetTransformInfo::OK_AnyValue,
- TargetTransformInfo::OP_None, TargetTransformInfo::OP_None,
- I->getOperand(0), I);
+ return TTI.getArithmeticInstrCost(
+ I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
+ TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None,
+ TargetTransformInfo::OP_None, I->getOperand(0), I);
}
case Instruction::Select: {
SelectInst *SI = cast<SelectInst>(I);
@@ -7583,6 +7607,10 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
VectorTy = ToVectorTy(getMemInstValueType(I), Width);
return getMemoryInstructionCost(I, VF);
}
+ case Instruction::BitCast:
+ if (I->getType()->isPointerTy())
+ return 0;
+ LLVM_FALLTHROUGH;
case Instruction::ZExt:
case Instruction::SExt:
case Instruction::FPToUI:
@@ -7593,8 +7621,7 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
case Instruction::SIToFP:
case Instruction::UIToFP:
case Instruction::Trunc:
- case Instruction::FPTrunc:
- case Instruction::BitCast: {
+ case Instruction::FPTrunc: {
// Computes the CastContextHint from a Load/Store instruction.
auto ComputeCCH = [&](Instruction *I) -> TTI::CastContextHint {
assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
@@ -7672,14 +7699,7 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
}
}
- unsigned N;
- if (isScalarAfterVectorization(I, VF)) {
- assert(!VF.isScalable() && "VF is assumed to be non scalable");
- N = VF.getKnownMinValue();
- } else
- N = 1;
- return N *
- TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);
+ return TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);
}
case Instruction::Call: {
bool NeedToScalarize;
@@ -7694,11 +7714,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
case Instruction::ExtractValue:
return TTI.getInstructionCost(I, TTI::TCK_RecipThroughput);
default:
- // The cost of executing VF copies of the scalar instruction. This opcode
- // is unknown. Assume that it is the same as 'mul'.
- return VF.getKnownMinValue() * TTI.getArithmeticInstrCost(
- Instruction::Mul, VectorTy, CostKind) +
- getScalarizationOverhead(I, VF);
+ // This opcode is unknown. Assume that it is the same as 'mul'.
+ return TTI.getArithmeticInstrCost(Instruction::Mul, VectorTy, CostKind);
} // end of switch.
}
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll b/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll
index 247ea35ff5d0..3061998518ad 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll
@@ -6,7 +6,7 @@ target triple = "aarch64--linux-gnu"
; CHECK-LABEL: all_scalar
; CHECK: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
; CHECK: LV: Not considering vector loop of width 2 because it will not generate any vector instructions
;
define void @all_scalar(i64* %a, i64 %n) {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
index b0ebb4edf2ad..858b28ddd321 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
@@ -86,6 +86,41 @@ for.end:
ret void
}
+; CHECK-LABEL: predicated_store_phi
+;
+; Same as predicate_store except we use a pointer PHI to maintain the address
+;
+; CHECK: Found new scalar instruction: %addr = phi i32* [ %a, %entry ], [ %addr.next, %for.inc ]
+; CHECK: Found new scalar instruction: %addr.next = getelementptr inbounds i32, i32* %addr, i64 1
+; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %addr, align 4
+; CHECK: Found an estimated cost of 0 for VF 2 For instruction: %addr = phi i32* [ %a, %entry ], [ %addr.next, %for.inc ]
+; CHECK: Found an estimated cost of 3 for VF 2 For instruction: store i32 %tmp2, i32* %addr, align 4
+;
+define void @predicated_store_phi(i32* %a, i1 %c, i32 %x, i64 %n) {
+entry:
+ br label %for.body
+
+for.body:
+ %i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
+ %addr = phi i32 * [ %a, %entry ], [ %addr.next, %for.inc ]
+ %tmp1 = load i32, i32* %addr, align 4
+ %tmp2 = add nsw i32 %tmp1, %x
+ br i1 %c, label %if.then, label %for.inc
+
+if.then:
+ store i32 %tmp2, i32* %addr, align 4
+ br label %for.inc
+
+for.inc:
+ %i.next = add nuw nsw i64 %i, 1
+ %cond = icmp slt i64 %i.next, %n
+ %addr.next = getelementptr inbounds i32, i32* %addr, i64 1
+ br i1 %cond, label %for.body, label %for.end
+
+for.end:
+ ret void
+}
+
; CHECK-LABEL: predicated_udiv_scalarized_operand
;
; This test checks that we correctly compute the cost of the predicated udiv
diff --git a/llvm/test/Transforms/LoopVectorize/scalarized-bitcast.ll b/llvm/test/Transforms/LoopVectorize/scalarized-bitcast.ll
new file mode 100644
index 000000000000..0c97e6ac475e
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/scalarized-bitcast.ll
@@ -0,0 +1,40 @@
+; REQUIRES: asserts
+; RUN: opt -loop-vectorize -force-vector-width=2 -debug-only=loop-vectorize -S -o - < %s 2>&1 | FileCheck %s
+
+%struct.foo = type { i32, i64 }
+
+; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction: %0 = bitcast i64* %b to i32*
+
+; The bitcast below will be scalarized due to the predication in the loop. Bitcasts
+; between pointer types should be treated as free, despite the scalarization.
+define void @foo(%struct.foo* noalias nocapture %in, i32* noalias nocapture readnone %out, i64 %n) {
+entry:
+ br label %for.body
+
+for.body: ; preds = %entry, %if.end
+ %i.012 = phi i64 [ %inc, %if.end ], [ 0, %entry ]
+ %b = getelementptr inbounds %struct.foo, %struct.foo* %in, i64 %i.012, i32 1
+ %0 = bitcast i64* %b to i32*
+ %a = getelementptr inbounds %struct.foo, %struct.foo* %in, i64 %i.012, i32 0
+ %1 = load i32, i32* %a, align 8
+ %tobool.not = icmp eq i32 %1, 0
+ br i1 %tobool.not, label %if.end, label %land.lhs.true
+
+land.lhs.true: ; preds = %for.body
+ %2 = load i32, i32* %0, align 4
+ %cmp2 = icmp sgt i32 %2, 0
+ br i1 %cmp2, label %if.then, label %if.end
+
+if.then: ; preds = %land.lhs.true
+ %sub = add nsw i32 %2, -1
+ store i32 %sub, i32* %0, align 4
+ br label %if.end
+
+if.end: ; preds = %if.then, %land.lhs.true, %for.body
+ %inc = add nuw nsw i64 %i.012, 1
+ %exitcond.not = icmp eq i64 %inc, %n
+ br i1 %exitcond.not, label %for.end, label %for.body
+
+for.end: ; preds = %if.end
+ ret void
+}
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO
Culprit:
<cut>
commit 4aafd5f00c2a772337ec065d4542ef158453a343
Author: Jan Svoboda <jan_svoboda(a)apple.com>
Date: Fri Aug 6 14:46:41 2021 +0200
[clang] Remove misleading assertion in FullSourceLoc
D31709 added an assertion to `FullSourceLoc::hasManager()` that ensured a valid `SourceLocation` is always paired with a `SourceManager`, and that a missing `SourceManager` is always paired with an invalid `SourceLocation`.
This appears to be incorrect, since clients never cared about constructing `FullSourceLoc` to uphold that invariant, or always checking `isValid()` before calling `hasManager()`.
The assertion started failing when serializing diagnostics pointing into an explicit module. Explicit modules don't have valid `SourceLocation` for the `import` statement, since they are "imported" from the command-line argument `-fmodule-name=x.pcm`.
This patch removes the assertion, since `FullSourceLoc` was never intended to uphold any kind of invariants between the validity of `SourceLocation` and presence of `SourceManager`.
Reviewed By: arphaman
Differential Revision: https://reviews.llvm.org/D106862
</cut>
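To make the failure mode concrete, here is a simplified standalone sketch of the invariant the removed assertion enforced and of the explicit-module case that violated it. The types below are stand-ins for illustration, not Clang's real SourceLocation/SourceManager classes.
<cut>
// full_source_loc_sketch.cpp - simplified stand-ins, for illustration only.
#include <iostream>

struct SourceManager {};  // stand-in for clang::SourceManager

struct FullSourceLoc {
  bool Valid = false;                     // stand-in for SourceLocation::isValid()
  const SourceManager *SrcMgr = nullptr;

  bool isValid() const { return Valid; }

  bool hasManager() const {
    bool hasSrcMgr = SrcMgr != nullptr;
    // The removed assertion required: manager present if and only if the
    // location is valid.
    //   assert(hasSrcMgr == isValid() && "FullSourceLoc has location but no manager");
    return hasSrcMgr;
  }
};

int main() {
  SourceManager SM;
  FullSourceLoc Loc;
  Loc.SrcMgr = &SM;  // a SourceManager is present...
  // ...but the location stays invalid, as for a diagnostic that points into
  // an explicit module loaded from a .pcm file on the command line. With the
  // old assertion compiled in, this combination would abort.
  std::cout << "hasManager: " << Loc.hasManager()
            << ", isValid: " << Loc.isValid() << "\n";
  return 0;
}
</cut>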
Results regressed to (for first_bad == 4aafd5f00c2a772337ec065d4542ef158453a343)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_LTO artifacts/build-4aafd5f00c2a772337ec065d4542ef158453a343/results_id:
1
# 470.lbm,lbm_base.default regressed by 104
from (for last_good == 3709822d2602b8b7db2d9bcc0e856f676582f25d)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_LTO artifacts/build-3709822d2602b8b7db2d9bcc0e856f676582f25d/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of last_good: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-master-aarch64-spec2k6-Oz_LTO/3746
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of first_bad: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-master-aarch64-spec2k6-Oz_LTO/3725
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-4aafd5f00c2a772337ec065d4542ef158453a343
cd investigate-llvm-4aafd5f00c2a772337ec065d4542ef158453a343
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 4aafd5f00c2a772337ec065d4542ef158453a343
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 3709822d2602b8b7db2d9bcc0e856f676582f25d
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 4aafd5f00c2a772337ec065d4542ef158453a343
Author: Jan Svoboda <jan_svoboda(a)apple.com>
Date: Fri Aug 6 14:46:41 2021 +0200
[clang] Remove misleading assertion in FullSourceLoc
D31709 added an assertion to `FullSourceLoc::hasManager()` that ensured a valid `SourceLocation` is always paired with a `SourceManager`, and that a missing `SourceManager` is always paired with an invalid `SourceLocation`.
This appears to be incorrect, since clients never cared about constructing `FullSourceLoc` to uphold that invariant, or always checking `isValid()` before calling `hasManager()`.
The assertion started failing when serializing diagnostics pointing into an explicit module. Explicit modules don't have valid `SourceLocation` for the `import` statement, since they are "imported" from the command-line argument `-fmodule-name=x.pcm`.
This patch removes the assertion, since `FullSourceLoc` was never intended to uphold any kind of invariants between the validity of `SourceLocation` and presence of `SourceManager`.
Reviewed By: arphaman
Differential Revision: https://reviews.llvm.org/D106862
---
clang/include/clang/Basic/SourceLocation.h | 13 +++++++------
clang/test/Modules/Inputs/explicit-build-diags/a.h | 1 +
.../Modules/Inputs/explicit-build-diags/module.modulemap | 1 +
clang/test/Modules/explicit-build-diags.cpp | 8 ++++++++
4 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h
index 540de23b9f55..ba2e9156a2b1 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -363,6 +363,10 @@ class FileEntry;
/// A SourceLocation and its associated SourceManager.
///
/// This is useful for argument passing to functions that expect both objects.
+///
+/// This class does not guarantee the presence of either the SourceManager or
+/// a valid SourceLocation. Clients should use `isValid()` and `hasManager()`
+/// before calling the member functions.
class FullSourceLoc : public SourceLocation {
const SourceManager *SrcMgr = nullptr;
@@ -373,13 +377,10 @@ public:
explicit FullSourceLoc(SourceLocation Loc, const SourceManager &SM)
: SourceLocation(Loc), SrcMgr(&SM) {}
- bool hasManager() const {
- bool hasSrcMgr = SrcMgr != nullptr;
- assert(hasSrcMgr == isValid() && "FullSourceLoc has location but no manager");
- return hasSrcMgr;
- }
+ /// Checks whether the SourceManager is present.
+ bool hasManager() const { return SrcMgr != nullptr; }
- /// \pre This FullSourceLoc has an associated SourceManager.
+ /// \pre hasManager()
const SourceManager &getManager() const {
assert(SrcMgr && "SourceManager is NULL.");
return *SrcMgr;
diff --git a/clang/test/Modules/Inputs/explicit-build-diags/a.h b/clang/test/Modules/Inputs/explicit-build-diags/a.h
new file mode 100644
index 000000000000..486941dde83b
--- /dev/null
+++ b/clang/test/Modules/Inputs/explicit-build-diags/a.h
@@ -0,0 +1 @@
+void a() __attribute__((deprecated));
diff --git a/clang/test/Modules/Inputs/explicit-build-diags/module.modulemap b/clang/test/Modules/Inputs/explicit-build-diags/module.modulemap
new file mode 100644
index 000000000000..bb00c840ce39
--- /dev/null
+++ b/clang/test/Modules/Inputs/explicit-build-diags/module.modulemap
@@ -0,0 +1 @@
+module a { header "a.h" }
diff --git a/clang/test/Modules/explicit-build-diags.cpp b/clang/test/Modules/explicit-build-diags.cpp
new file mode 100644
index 000000000000..4a37dc108a68
--- /dev/null
+++ b/clang/test/Modules/explicit-build-diags.cpp
@@ -0,0 +1,8 @@
+// RUN: rm -rf %t && mkdir %t
+// RUN: %clang_cc1 -fmodules -x c++ %S/Inputs/explicit-build-diags/module.modulemap -fmodule-name=a -emit-module -o %t/a.pcm
+// RUN: %clang_cc1 -fmodules -Wdeprecated-declarations -fdiagnostics-show-note-include-stack -serialize-diagnostic-file %t/tu.dia \
+// RUN: -I %S/Inputs/explicit-build-diags -fmodule-file=%t/a.pcm -fsyntax-only %s
+
+#include "a.h"
+
+void foo() { a(); }
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3
Culprit:
<cut>
commit e771614bae0a05585f720812d5936a0b81dcddf0
Author: David Green <david.green(a)arm.com>
Date: Thu Feb 11 11:58:55 2021 +0000
[ARM] Change getScalarizationOverhead overload used in gather costs. NFC
This changes which of the getScalarizationOverhead overloads is used in
the gather/scatter cost to use the base variant directly, not relying on
the version using heuristics on the number of args with no args
provided. It should still produce the same costs for scalarized
gathers/scatters.
</cut>
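The following is a toy model of the accounting the patch makes explicit: a scalarized gather extracts each address lane from the pointer vector and inserts each loaded element into the result vector, requested via two explicit calls rather than the argument-count heuristic overload. This is not LLVM's TargetTransformInfo API; names and unit costs are made up for illustration.
<cut>
// scalarization_overhead_sketch.cpp - toy cost model, for illustration only.
#include <iostream>

// Charge one unit per lane for building a vector from scalars (Insert) and
// one unit per lane for pulling scalars out of a vector (Extract).
unsigned scalarizationOverhead(unsigned NumElems, bool Insert, bool Extract) {
  return (Insert ? NumElems : 0) + (Extract ? NumElems : 0);
}

int main() {
  unsigned NumElems = 4, LTFirst = 1;
  // Mirrors the shape of the new formula: per-element work plus explicit
  // insert overhead plus explicit extract overhead, in two separate calls.
  unsigned ScalarCost = NumElems * LTFirst +
                        scalarizationOverhead(NumElems, /*Insert=*/true, /*Extract=*/false) +
                        scalarizationOverhead(NumElems, /*Insert=*/false, /*Extract=*/true);
  std::cout << "scalarized gather cost (toy): " << ScalarCost << "\n";
  return 0;
}
</cut>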
Results regressed to (for first_bad == e771614bae0a05585f720812d5936a0b81dcddf0)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_marm artifacts/build-e771614bae0a05585f720812d5936a0b81dcddf0/results_id:
1
# 445.gobmk,[.] fastlib regressed by 115
from (for last_good == a31eae840525e9292a3a42c1fdac3fc594f42949)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_marm artifacts/build-a31eae840525e9292a3a42c1fdac3fc594f42949/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3/3644
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3/3642
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-e771614bae0a05585f720812d5936a0b81dcddf0
cd investigate-llvm-e771614bae0a05585f720812d5936a0b81dcddf0
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach e771614bae0a05585f720812d5936a0b81dcddf0
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach a31eae840525e9292a3a42c1fdac3fc594f42949
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit e771614bae0a05585f720812d5936a0b81dcddf0
Author: David Green <david.green(a)arm.com>
Date: Thu Feb 11 11:58:55 2021 +0000
[ARM] Change getScalarizationOverhead overload used in gather costs. NFC
This changes which of the getScalarizationOverhead overloads is used in
the gather/scatter cost to use the base variant directly, not relying on
the version using heuristics on the number of args with no args
provided. It should still produce the same costs for scalarized
gathers/scatters.
---
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index af67839c2d75..de2c0607d2ed 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -1416,8 +1416,9 @@ unsigned ARMTTIImpl::getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
unsigned VectorCost = NumElems * LT.first * ST->getMVEVectorCostFactor();
// The scalarization cost should be a lot higher. We use the number of vector
// elements plus the scalarization overhead.
- unsigned ScalarCost =
- NumElems * LT.first + BaseT::getScalarizationOverhead(VTy, {});
+ unsigned ScalarCost = NumElems * LT.first +
+ BaseT::getScalarizationOverhead(VTy, true, false) +
+ BaseT::getScalarizationOverhead(VTy, false, true);
if (EltSize < 8 || Alignment < EltSize / 8)
return ScalarCost;
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3_LTO
Culprit:
<cut>
commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Date: Mon Jan 25 11:00:56 2021 -0800
Turn on the new pass manager by default
This turns on the new pass manager by default for the optimization pipeline in
Clang and ThinLTO in various LLD backends. This also makes uses of `opt
-instcombine` use the new pass manager (unless specifically opted out).
This does not affect the backend target-dependent codegen pipeline.
If this causes regressions, you can opt out of the new pass manager
either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
while building LLVM, or via various compiler flags, e.g.
-flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
ELF LLD. Please file bugs for any regressions.
Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html
Reviewed By: asbirlea, ychen, MaskRay, echristo
Differential Revision: https://reviews.llvm.org/D95380
</cut>
Results regressed to (for first_bad == 669ddd1e9b1226432b003dbba05b99f8e992285b)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO artifacts/build-669ddd1e9b1226432b003dbba05b99f8e992285b/results_id:
1
# 473.astar,astar_base.default regressed by 106
from (for last_good == b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO artifacts/build-b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3_LTO/3543
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3_LTO/3539
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
cd investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 669ddd1e9b1226432b003dbba05b99f8e992285b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Date: Mon Jan 25 11:00:56 2021 -0800
Turn on the new pass manager by default
This turns on the new pass manager by default for the optimization pipeline in
Clang and ThinLTO in various LLD backends. This also makes uses of `opt
-instcombine` use the new pass manager (unless specifically opted out).
This does not affect the backend target-dependent codegen pipeline.
If this causes regressions, you can opt out of the new pass manager
either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
while building LLVM, or via various compiler flags, e.g.
-flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
ELF LLD. Please file bugs for any regressions.
Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html
Reviewed By: asbirlea, ychen, MaskRay, echristo
Differential Revision: https://reviews.llvm.org/D95380
---
llvm/CMakeLists.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 1affc289e64b..f5298de9f7ca 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -688,8 +688,8 @@ else()
endif()
option(LLVM_ENABLE_PLUGINS "Enable plugin support" ${LLVM_ENABLE_PLUGINS_default})
-set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER FALSE CACHE BOOL
- "Enable the experimental new pass manager by default.")
+set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER TRUE CACHE BOOL
+ "Enable the new pass manager by default.")
include(HandleLLVMOptions)
</cut>